Understanding Aeron Cluster Counters

In addition to the counters for the streams that carry Aeron Cluster communications, there are a number of cluster-specific counters:

| Counter | Description |
| --- | --- |
| Cluster Errors - clusterId=0 | The number of errors raised (for clusterId=0 in this case). Use Aeron Cluster Tool to view errors; also see Cluster Errors |
| Consensus Module state | The current state of the Consensus Module. See below |
| Cluster election state - clusterId=0 | The current election state. See Election States |
| Cluster node role - clusterId=0 | The current cluster node role. See below |
| Cluster commit-pos: - clusterId=0 | The current commit-position of the cluster |
| Cluster snapshot count - clusterId=0 | How many snapshots have been run on clusterId=0 |
| Cluster timed out client count - clusterId=0 | How many cluster clients have timed out on clusterId=0 |
| Cluster Container Errors - clusterId=0 serviceId=0 | How many cluster container errors have been raised for clusterId=0 and serviceId=0 |

Consensus Module State

| Value | Description |
| --- | --- |
| 0 | Initializing - Starting & Recovering |
| 1 | Active - cluster ingress and/or expired timers are appended to the log |
| 2 | Suspended - cluster ingress and expired timers are not being processed |
| 3 | Snapshot - cluster is busy taking a snapshot |
| 4 | Quitting - cluster quitting as soon as services acknowledge, without snapshot |
| 5 | Terminating - node is terminating |
| 6 | Closed - terminal state |

Cluster Node Role

| Value | Description |
| --- | --- |
| 0 | Follower - Node is a follower in the current leadership term |
| 1 | Candidate - Node is a candidate to become leader in the current leadership term |
| 2 | Leader - Node is the leader in the current leadership term |
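
The two value tables above can be turned into a small lookup when reading counters by hand or in a script. The following is a minimal sketch, not part of Aeron itself: the dictionaries simply copy the values listed above, and `describe_node` is a hypothetical helper name chosen for illustration.

```python
# Value-to-name mappings copied from the Consensus Module State and
# Cluster Node Role tables. These are illustrative, not an Aeron API.
CONSENSUS_MODULE_STATE = {
    0: "Initializing",
    1: "Active",
    2: "Suspended",
    3: "Snapshot",
    4: "Quitting",
    5: "Terminating",
    6: "Closed",
}

CLUSTER_NODE_ROLE = {
    0: "Follower",
    1: "Candidate",
    2: "Leader",
}


def describe_node(state_value, role_value):
    """Translate raw counter values into a readable summary (hypothetical helper)."""
    state = CONSENSUS_MODULE_STATE.get(state_value, f"Unknown({state_value})")
    role = CLUSTER_NODE_ROLE.get(role_value, f"Unknown({role_value})")
    return f"{role}, consensus module {state}"


# The cluster sample later on this page reports state=1 and role=2.
print(describe_node(1, 2))
```

Unknown values are reported rather than raising, on the assumption that other Aeron versions may define states not listed here.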

The full counter list can be overwhelming in raw AeronStat output, especially once many clients are connected to a multi-node cluster. AeronStat's filters help here, for example filtering by stream.
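
When the output has already been captured to a file, the same idea can be applied after the fact. The sketch below is a post-processing script, not AeronStat's own filter mechanism. It assumes the per-stream label layout seen in the sample output on this page (`<type>: <registration id> <session id> <stream id> <channel>`); that layout is inferred from the sample rather than taken from a specification, and the function names are hypothetical.

```python
import re

# "<counter id>: <value> - <label>", as printed by AeronStat in the samples.
COUNTER_LINE = re.compile(r"^\s*(\d+):\s+(-?[\d,]+) - (.*)$")

# Assumed layout of per-stream labels, inferred from the sample output:
# "<type>: <registration id> <session id> <stream id> <channel>"
STREAM_LABEL = re.compile(r"^[\w-]+(?: \(sampled\))?: -?\d+ -?\d+ (\d+) ")


def parse_counter(line):
    """Return (counter_id, value, label), or None for non-counter lines."""
    match = COUNTER_LINE.match(line)
    if match is None:
        return None
    counter_id, value, label = match.groups()
    return int(counter_id), int(value.replace(",", "")), label


def filter_by_stream(lines, stream_id):
    """Keep only the counters whose label carries the given stream id."""
    selected = []
    for line in lines:
        counter = parse_counter(line)
        if counter is None:
            continue
        match = STREAM_LABEL.match(counter[2])
        if match and int(match.group(1)) == stream_id:
            selected.append(counter)
    return selected


# A few lines taken from the cluster sample on this page.
sample = [
    "15:00:12 - Aeron Stat (CnC v0.2.0), pid 75364, heartbeat age 892ms",
    " 37:                3,552 - Cluster commit-pos: - clusterId=0",
    " 43:                  288 - pub-pos (sampled): 18 -74825454 104 aeron:ipc?term-length=128k",
    " 53:                  800 - pub-pos (sampled): 25 -74825451 105 aeron:ipc?term-length=128k",
    " 56:                  288 - sub-pos: 26 -74825454 104 aeron:ipc?term-length=128k @0",
]

for counter_id, value, label in filter_by_stream(sample, 104):
    print(counter_id, value, label)
```

Counters without a stream id in their label, such as the cluster-specific counters and the system counters, are skipped by this filter.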

Cluster Streams

Note

The defaults given below can be overridden.

The easiest way to understand which stream is doing what is to trace the stream ID in the AeronStat output.

| Stream Id | Description |
| --- | --- |
| 10 | Aeron Archive Control Request Channel |
| 20 | Aeron Archive Control Response Channel |
| 100 | Log Stream |
| 101 | Cluster Ingress (clients sending to the Cluster's Consensus Module) |
| 102 | Cluster Egress (cluster leader sending to cluster clients) |
| 103 | Snapshot Replay Stream |
| 104 | Clustered Service Control Stream |
| 105 | Consensus Module Stream |
| 106 | Clustered Service Snapshot Stream |
| 107 | Consensus Module Snapshot Stream |
| 108 | Consensus Stream (between cluster nodes) |

Each stream can be monitored independently via AeronStat.
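
When tracing stream IDs, it can help to keep the default table at hand as a lookup. This is a minimal sketch that copies the defaults listed above. `describe_stream` and its `overrides` parameter are hypothetical names, included because the defaults can be overridden in a given deployment.

```python
# Default stream ids copied from the table above; deployments may override them.
DEFAULT_CLUSTER_STREAMS = {
    10: "Aeron Archive Control Request Channel",
    20: "Aeron Archive Control Response Channel",
    100: "Log Stream",
    101: "Cluster Ingress (clients sending to the Cluster's Consensus Module)",
    102: "Cluster Egress (cluster leader sending to cluster clients)",
    103: "Snapshot Replay Stream",
    104: "Clustered Service Control Stream",
    105: "Consensus Module Stream",
    106: "Clustered Service Snapshot Stream",
    107: "Consensus Module Snapshot Stream",
    108: "Consensus Stream (between cluster nodes)",
}


def describe_stream(stream_id, overrides=None):
    """Name a stream id, letting deployment-specific entries replace the defaults."""
    names = dict(DEFAULT_CLUSTER_STREAMS)
    names.update(overrides or {})
    return names.get(stream_id, f"Unknown stream {stream_id}")


# Stream ids seen in the cluster sample on this page.
for stream_id in (10, 20, 100, 104, 105):
    print(stream_id, "-", describe_stream(stream_id))
```

An unrecognised ID is reported as unknown rather than guessed, since application streams may share the media driver with the cluster.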

Sample

A sample rendering of the AeronStat output from one of the Aeron Cluster samples in the cookbook, showing how the streams in a simple cluster and cluster client interact:

[Figure: Rendering of AeronStat output]

Cluster

The sample below, which is nearly 100 lines long, is for a single-node cluster with a single client.

15:00:12 - Aeron Stat (CnC v0.2.0), pid 75364, heartbeat age 892ms
======================================================================
  0:                2,112 - Bytes sent
  1:                  640 - Bytes received
  2:                    0 - Failed offers to ReceiverProxy
  3:                    0 - Failed offers to SenderProxy
  4:                    0 - Failed offers to DriverConductorProxy
  5:                    0 - NAKs sent
  6:                    0 - NAKs received
  7:                   51 - Status Messages sent
  8:                    1 - Status Messages received
  9:                   51 - Heartbeats sent
 10:                    1 - Heartbeats received
 11:                    0 - Retransmits sent
 12:                    0 - Flow control under runs
 13:                    0 - Flow control over runs
 14:                    0 - Invalid packets
 15:                    0 - Errors
 16:                    0 - Short sends
 17:                    0 - Failed attempts to free log buffers
 18:                    0 - Sender flow control limits, i.e. back-pressure events
 19:                    0 - Unblocked Publications
 20:                    0 - Unblocked Control Commands
 21:                    0 - Possible TTL Asymmetry
 22:                    0 - ControllableIdleStrategy status
 23:                    0 - Loss gap fills
 24:                    0 - Client liveness timeouts
 25:                    0 - Resolution changes: driverName= hostname=Hurricane.local
 26:            1,824,439 - Conductor max cycle time doing its work (ns)
 27:                    0 - Conductor work cycle exceeded threshold count
 28:    1,618,686,012,247 - client-heartbeat: 1
 29:                    1 - Archive Control Sessions
 30:                    1 - rcv-channel: aeron:udp?endpoint=localhost:9001|sparse=true 127.0.0.1:9001
 31:                    1 - rcv-local-sockaddr: 30 127.0.0.1:9001
 32:    1,618,686,011,806 - client-heartbeat: 7
 33:                    0 - Cluster Errors - clusterId=0
 34:                    1 - Consensus Module state - clusterId=0
 35:                   17 - Cluster election state - clusterId=0
 36:                    2 - Cluster node role - clusterId=0
 37:                3,552 - Cluster commit-pos: - clusterId=0
 38:                    1 - Cluster control toggle - clusterId=0
 39:                    0 - Cluster snapshot count - clusterId=0
 40:                    0 - Cluster timed out client count - clusterId=0
 41:                    1 - rcv-channel: aeron:udp?term-length=64k|endpoint=localhost:9003 127.0.0.1:9003
 42:                    1 - rcv-local-sockaddr: 41 127.0.0.1:9003
 43:                  288 - pub-pos (sampled): 18 -74825454 104 aeron:ipc?term-length=128k
 44:               65,536 - pub-lmt: 18 -74825454 104 aeron:ipc?term-length=128k
 45:                  640 - pub-pos (sampled): 20 -74825453 10 aeron:ipc?sparse=true|term-length=64k|mtu=1408
 46:               32,768 - pub-lmt: 20 -74825453 10 aeron:ipc?sparse=true|term-length=64k|mtu=1408
 47:                  640 - sub-pos: 6 -74825453 10 aeron:ipc?term-length=64k @0
 48:    1,618,686,011,897 - client-heartbeat: 21
 49:                    0 - Cluster Container Errors - clusterId=0 serviceId=0
 50:                1,056 - pub-pos (sampled): 24 -74825452 20 aeron:ipc?mtu=1408|term-length=65536|sparse=true
 51:               32,768 - pub-lmt: 24 -74825452 20 aeron:ipc?mtu=1408|term-length=65536|sparse=true
 52:                1,056 - sub-pos: 19 -74825452 20 aeron:ipc?sparse=true|term-length=64k|mtu=1408 @0
 53:                  800 - pub-pos (sampled): 25 -74825451 105 aeron:ipc?term-length=128k
 54:               65,536 - pub-lmt: 25 -74825451 105 aeron:ipc?term-length=128k
 55:                  800 - sub-pos: 17 -74825451 105 aeron:ipc?term-length=128k @0
 56:                  288 - sub-pos: 26 -74825454 104 aeron:ipc?term-length=128k @0
 58:                    1 - snd-channel: aeron:udp?init-term-id=-1570082996|fc=min,t:10s|term-id=-1570082996|tags=34,33|term-offset=2624|control-mode=manual|alias=log|mtu=1408|term-length=67108864|ssc=true 0.0.0.0:58337
 59:                    1 - snd-local-sockaddr: 58 0.0.0.0:58337
 60:                3,552 - pub-pos (sampled): 35 -74825450 100 aeron:udp?init-term-id=-1570082996|fc=min,t:10s|term-id=-1570082996|tags=34,33|term-offset=2624|control-mode=manual|alias=log|mtu=1408|term-length=67108864|ssc=true
 61:           33,557,984 - pub-lmt: 35 -74825450 100 aeron:udp?init-term-id=-1570082996|fc=min,t:10s|term-id=-1570082996|tags=34,33|term-offset=2624|control-mode=manual|alias=log|mtu=1408|term-length=67108864|ssc=true
 62:                3,552 - snd-pos: 35 -74825450 100 aeron:udp?init-term-id=-1570082996|fc=min,t:10s|term-id=-1570082996|tags=34,33|term-offset=2624|control-mode=manual|alias=log|mtu=1408|term-length=67108864|ssc=true
 63:                3,552 - snd-lmt: 35 -74825450 100 aeron:udp?init-term-id=-1570082996|fc=min,t:10s|term-id=-1570082996|tags=34,33|term-offset=2624|control-mode=manual|alias=log|mtu=1408|term-length=67108864|ssc=true
 64:                    0 - snd-bpe: 35 -74825450 100 aeron:udp?init-term-id=-1570082996|fc=min,t:10s|term-id=-1570082996|tags=34,33|term-offset=2624|control-mode=manual|alias=log|mtu=1408|term-length=67108864|ssc=true
 69:                3,552 - sub-pos: 43 -74825450 100 aeron-spy:aeron:udp?tags=34|session-id=-74825450|alias=log @2624
 70:                3,552 - rec-pos: 0 -74825450 100 aeron:udp?tags=34|session-id=-74825450|alias=log
 71:                3,552 - sub-pos: 46 -74825450 100 aeron-spy:aeron:udp?tags=34|session-id=-74825450|alias=log @2624
 72:                    1 - rcv-channel: aeron:udp?endpoint=localhost:9010 127.0.0.1:9010
 73:                    1 - rcv-local-sockaddr: 72 127.0.0.1:9010
--

Cluster Client

15:03:49 - Aeron Stat (CnC v0.2.0), pid 77779, heartbeat age 987ms
======================================================================
  0:                4,000 - Bytes sent
  1:                2,208 - Bytes received
  2:                    0 - Failed offers to ReceiverProxy
  3:                    0 - Failed offers to SenderProxy
  4:                    0 - Failed offers to DriverConductorProxy
  5:                    0 - NAKs sent
  6:                    0 - NAKs received
  7:                   27 - Status Messages sent
  8:                   30 - Status Messages received
  9:                  108 - Heartbeats sent
 10:                   51 - Heartbeats received
 11:                    0 - Retransmits sent
 12:                    0 - Flow control under runs
 13:                    0 - Flow control over runs
 14:                    0 - Invalid packets
 15:                    0 - Errors
 16:                    0 - Short sends
 17:                    0 - Failed attempts to free log buffers
 18:                    0 - Sender flow control limits, i.e. back-pressure events
 19:                    0 - Unblocked Publications
 20:                    0 - Unblocked Control Commands
 21:                    0 - Possible TTL Asymmetry
 22:                    0 - ControllableIdleStrategy status
 23:                    0 - Loss gap fills
 24:                    0 - Client liveness timeouts
 25:                    0 - Resolution changes: driverName= hostname=Hurricane.fios-router.home
 26:            4,186,683 - Conductor max cycle time doing its work (ns)
 27:                    0 - Conductor work cycle exceeded threshold count
 28:    1,609,475,953,178 - client-heartbeat: 1
 29:                    1 - snd-channel: aeron:udp?endpoint=localhost:9003 127.0.0.1:50835
 30:                    1 - snd-local-sockaddr: 29 127.0.0.1:50835
 31:            3,889,568 - pub-pos (sampled): 3 -1365430415 101 aeron:udp?endpoint=localhost:9003
 32:           12,294,560 - pub-lmt: 3 -1365430415 101 aeron:udp?endpoint=localhost:9003
 33:            3,905,952 - snd-pos: 3 -1365430415 101 aeron:udp?endpoint=localhost:9003
 34:            4,032,672 - snd-lmt: 3 -1365430415 101 aeron:udp?endpoint=localhost:9003
 35:                    0 - snd-bpe: 3 -1365430415 101 aeron:udp?endpoint=localhost:9003
 36:                9,792 - sub-pos: 2 729594557 102 aeron:udp?endpoint=localhost:19000 @0
 37:                9,792 - rcv-hwm: 5 729594557 102 aeron:udp?endpoint=localhost:19000
 38:                9,792 - rcv-pos: 5 729594557 102 aeron:udp?endpoint=localhost:19000
--