Understanding Aeron Cluster Counters
In addition to the counters for the individual streams that carry Aeron Cluster communications, there are a number of cluster-specific counters:
Counter | Description |
---|---|
Cluster Errors - clusterId=0 | The number of errors raised (for clusterId=0 in this case). Use Aeron Cluster Tool to view errors; also see Cluster Errors |
Consensus Module state | The current state of the Consensus Module. See below |
Cluster election state - clusterId=0 | The current election state. See Election States |
Cluster node role - clusterId=0 | The current cluster node role. See below |
Cluster commit-pos: - clusterId=0 | The current commit-position of the cluster |
Cluster snapshot count - clusterId=0 | How many snapshots have been run on clusterId=0 |
Cluster timed out client count - clusterId=0 | How many cluster clients have timed out on clusterId=0 |
Cluster Container Errors - clusterId=0 serviceId=0 | How many cluster container errors have been raised for clusterId=0 and serviceId=0 |
Consensus Module State
Value | Description |
---|---|
0 | Initializing - Starting & Recovering |
1 | Active - cluster ingress and/or expired timers are appended to the log |
2 | Suspended - cluster ingress and expired timers are not being processed |
3 | Snapshot - cluster is busy taking a snapshot |
4 | Quitting - cluster quitting as soon as services acknowledge, without snapshot |
5 | Terminating - node is terminating |
6 | Closed - terminal state |
Cluster Node Role
Value | Description |
---|---|
0 | Follower - Node is a follower in the current leadership term |
1 | Candidate - Node is a candidate to become leader in the current leadership term |
2 | Leader - Node is the leader in the current leadership term |
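When a monitoring script reads these counters, it can help to translate the raw numbers back into names. The following is a minimal sketch that encodes the two tables above as Python enums. The class and function names are illustrative assumptions, not part of any Aeron API; only the numeric values and their meanings come from the tables.

```python
from enum import IntEnum


class ConsensusModuleState(IntEnum):
    """Consensus Module state codes, copied from the table above."""
    INITIALIZING = 0
    ACTIVE = 1
    SUSPENDED = 2
    SNAPSHOT = 3
    QUITTING = 4
    TERMINATING = 5
    CLOSED = 6


class ClusterNodeRole(IntEnum):
    """Cluster node role codes, copied from the table above."""
    FOLLOWER = 0
    CANDIDATE = 1
    LEADER = 2


def describe(value, enum_type):
    """Return the symbolic name for a counter value, or a marker if unknown."""
    try:
        return enum_type(value).name
    except ValueError:
        # A value outside the table is reported rather than raised.
        return f"UNKNOWN({value})"


print(describe(1, ConsensusModuleState))
print(describe(2, ClusterNodeRole))
```

In the single-node sample further down, the Consensus Module state counter reads 1 and the node role counter reads 2, which this sketch reports as `ACTIVE` and `LEADER`.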
The full set of counters can be overwhelming in raw AeronStat output, especially once many clients are connected to a multi-node cluster. It helps to use the filters in AeronStat, for example filtering by stream.
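If the output has already been captured to a file or a pipe, the same idea can be applied after the fact. The sketch below keeps only the counter lines that carry a given stream ID. It assumes the label layout seen in the sample output on this page (`<type>: <registrationId> <sessionId> <streamId> <channel>`); that layout is inferred from the sample rather than taken from a specification, and the function name is hypothetical.

```python
import re

# Position-style counter lines in the sample output look like:
#   "<counterId>: <value> - <type>: <registrationId> <sessionId> <streamId> <channel>"
# This layout is an assumption inferred from the sample output.
POSITION_LINE = re.compile(
    r"^\s*(?P<counter_id>\d+):\s+(?P<value>[\d,]+)\s+-\s+"
    r"(?P<type>[a-z-]+(?: \(sampled\))?):\s+"
    r"(?P<registration_id>-?\d+)\s+(?P<session_id>-?\d+)\s+(?P<stream_id>\d+)\s+"
    r"(?P<channel>\S+)"
)


def filter_by_stream(lines, stream_id):
    """Yield only the counter lines whose label carries the given stream ID."""
    for line in lines:
        match = POSITION_LINE.match(line)
        if match and int(match.group("stream_id")) == stream_id:
            yield line


# A few lines taken from the cluster sample on this page.
sample = [
    "37: 3,552 - Cluster commit-pos: - clusterId=0",
    "43: 288 - pub-pos (sampled): 18 -74825454 104 aeron:ipc?term-length=128k",
    "47: 640 - sub-pos: 6 -74825453 10 aeron:ipc?term-length=64k @0",
    "56: 288 - sub-pos: 26 -74825454 104 aeron:ipc?term-length=128k @0",
]

for line in filter_by_stream(sample, 104):
    print(line)
```

Counters without a stream ID in their label, such as the cluster-specific counters, do not match the pattern and are dropped by this filter.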
Cluster Streams
Note
The defaults given below can be overridden.
The easiest way to understand what each stream is doing is to trace its stream ID in the AeronStat output.
Stream Id | Description |
---|---|
10 | Aeron Archive Control Request Channel |
20 | Aeron Archive Control Response Channel |
100 | Log Stream |
101 | Cluster Ingress (clients sending to the Cluster's Consensus Module) |
102 | Cluster Egress (cluster leader sending to cluster clients) |
103 | Snapshot Replay Stream |
104 | Clustered Service Control Stream |
105 | Consensus Module Stream |
106 | Clustered Service Snapshot Stream |
107 | Consensus Module Snapshot Stream |
108 | Consensus Stream (between cluster nodes) |
Each stream can be monitored independently via AeronStat.
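To make tracing easier, the table above can be kept as a small lookup in a monitoring script. This is a sketch only: the dictionary and function names are hypothetical, and the IDs are the defaults from the table, so a deployment that overrides them would need its own mapping.

```python
# Default stream IDs, copied from the table above. These defaults can be
# overridden, so an unknown ID is reported as "not a default", not as an error.
DEFAULT_CLUSTER_STREAMS = {
    10: "Aeron Archive Control Request Channel",
    20: "Aeron Archive Control Response Channel",
    100: "Log Stream",
    101: "Cluster Ingress",
    102: "Cluster Egress",
    103: "Snapshot Replay Stream",
    104: "Clustered Service Control Stream",
    105: "Consensus Module Stream",
    106: "Clustered Service Snapshot Stream",
    107: "Consensus Module Snapshot Stream",
    108: "Consensus Stream",
}


def describe_stream(stream_id):
    """Return the default description for a stream ID, if it has one."""
    return DEFAULT_CLUSTER_STREAMS.get(
        stream_id, f"stream {stream_id} is not a default cluster stream"
    )


for stream_id in (100, 101, 999):
    print(stream_id, "->", describe_stream(stream_id))
```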
Sample
The following is sample AeronStat output from one of the Aeron Cluster samples in the cookbook, showing how the streams in a simple cluster and cluster client interact:
Cluster
The sample below, which is roughly 70 lines long, is for a single-node cluster with a single client.
15:00:12 - Aeron Stat (CnC v0.2.0), pid 75364, heartbeat age 892ms
======================================================================
0: 2,112 - Bytes sent
1: 640 - Bytes received
2: 0 - Failed offers to ReceiverProxy
3: 0 - Failed offers to SenderProxy
4: 0 - Failed offers to DriverConductorProxy
5: 0 - NAKs sent
6: 0 - NAKs received
7: 51 - Status Messages sent
8: 1 - Status Messages received
9: 51 - Heartbeats sent
10: 1 - Heartbeats received
11: 0 - Retransmits sent
12: 0 - Flow control under runs
13: 0 - Flow control over runs
14: 0 - Invalid packets
15: 0 - Errors
16: 0 - Short sends
17: 0 - Failed attempts to free log buffers
18: 0 - Sender flow control limits, i.e. back-pressure events
19: 0 - Unblocked Publications
20: 0 - Unblocked Control Commands
21: 0 - Possible TTL Asymmetry
22: 0 - ControllableIdleStrategy status
23: 0 - Loss gap fills
24: 0 - Client liveness timeouts
25: 0 - Resolution changes: driverName= hostname=Hurricane.local
26: 1,824,439 - Conductor max cycle time doing its work (ns)
27: 0 - Conductor work cycle exceeded threshold count
28: 1,618,686,012,247 - client-heartbeat: 1
29: 1 - Archive Control Sessions
30: 1 - rcv-channel: aeron:udp?endpoint=localhost:9001|sparse=true 127.0.0.1:9001
31: 1 - rcv-local-sockaddr: 30 127.0.0.1:9001
32: 1,618,686,011,806 - client-heartbeat: 7
33: 0 - Cluster Errors - clusterId=0
34: 1 - Consensus Module state - clusterId=0
35: 17 - Cluster election state - clusterId=0
36: 2 - Cluster node role - clusterId=0
37: 3,552 - Cluster commit-pos: - clusterId=0
38: 1 - Cluster control toggle - clusterId=0
39: 0 - Cluster snapshot count - clusterId=0
40: 0 - Cluster timed out client count - clusterId=0
41: 1 - rcv-channel: aeron:udp?term-length=64k|endpoint=localhost:9003 127.0.0.1:9003
42: 1 - rcv-local-sockaddr: 41 127.0.0.1:9003
43: 288 - pub-pos (sampled): 18 -74825454 104 aeron:ipc?term-length=128k
44: 65,536 - pub-lmt: 18 -74825454 104 aeron:ipc?term-length=128k
45: 640 - pub-pos (sampled): 20 -74825453 10 aeron:ipc?sparse=true|term-length=64k|mtu=1408
46: 32,768 - pub-lmt: 20 -74825453 10 aeron:ipc?sparse=true|term-length=64k|mtu=1408
47: 640 - sub-pos: 6 -74825453 10 aeron:ipc?term-length=64k @0
48: 1,618,686,011,897 - client-heartbeat: 21
49: 0 - Cluster Container Errors - clusterId=0 serviceId=0
50: 1,056 - pub-pos (sampled): 24 -74825452 20 aeron:ipc?mtu=1408|term-length=65536|sparse=true
51: 32,768 - pub-lmt: 24 -74825452 20 aeron:ipc?mtu=1408|term-length=65536|sparse=true
52: 1,056 - sub-pos: 19 -74825452 20 aeron:ipc?sparse=true|term-length=64k|mtu=1408 @0
53: 800 - pub-pos (sampled): 25 -74825451 105 aeron:ipc?term-length=128k
54: 65,536 - pub-lmt: 25 -74825451 105 aeron:ipc?term-length=128k
55: 800 - sub-pos: 17 -74825451 105 aeron:ipc?term-length=128k @0
56: 288 - sub-pos: 26 -74825454 104 aeron:ipc?term-length=128k @0
58: 1 - snd-channel: aeron:udp?init-term-id=-1570082996|fc=min,t:10s|term-id=-1570082996|tags=34,33|term-offset=2624|control-mode=manual|alias=log|mtu=1408|term-length=67108864|ssc=true 0.0.0.0:58337
59: 1 - snd-local-sockaddr: 58 0.0.0.0:58337
60: 3,552 - pub-pos (sampled): 35 -74825450 100 aeron:udp?init-term-id=-1570082996|fc=min,t:10s|term-id=-1570082996|tags=34,33|term-offset=2624|control-mode=manual|alias=log|mtu=1408|term-length=67108864|ssc=true
61: 33,557,984 - pub-lmt: 35 -74825450 100 aeron:udp?init-term-id=-1570082996|fc=min,t:10s|term-id=-1570082996|tags=34,33|term-offset=2624|control-mode=manual|alias=log|mtu=1408|term-length=67108864|ssc=true
62: 3,552 - snd-pos: 35 -74825450 100 aeron:udp?init-term-id=-1570082996|fc=min,t:10s|term-id=-1570082996|tags=34,33|term-offset=2624|control-mode=manual|alias=log|mtu=1408|term-length=67108864|ssc=true
63: 3,552 - snd-lmt: 35 -74825450 100 aeron:udp?init-term-id=-1570082996|fc=min,t:10s|term-id=-1570082996|tags=34,33|term-offset=2624|control-mode=manual|alias=log|mtu=1408|term-length=67108864|ssc=true
64: 0 - snd-bpe: 35 -74825450 100 aeron:udp?init-term-id=-1570082996|fc=min,t:10s|term-id=-1570082996|tags=34,33|term-offset=2624|control-mode=manual|alias=log|mtu=1408|term-length=67108864|ssc=true
69: 3,552 - sub-pos: 43 -74825450 100 aeron-spy:aeron:udp?tags=34|session-id=-74825450|alias=log @2624
70: 3,552 - rec-pos: 0 -74825450 100 aeron:udp?tags=34|session-id=-74825450|alias=log
71: 3,552 - sub-pos: 46 -74825450 100 aeron-spy:aeron:udp?tags=34|session-id=-74825450|alias=log @2624
72: 1 - rcv-channel: aeron:udp?endpoint=localhost:9010 127.0.0.1:9010
73: 1 - rcv-local-sockaddr: 72 127.0.0.1:9010
--
Cluster Client
15:03:49 - Aeron Stat (CnC v0.2.0), pid 77779, heartbeat age 987ms
======================================================================
0: 4,000 - Bytes sent
1: 2,208 - Bytes received
2: 0 - Failed offers to ReceiverProxy
3: 0 - Failed offers to SenderProxy
4: 0 - Failed offers to DriverConductorProxy
5: 0 - NAKs sent
6: 0 - NAKs received
7: 27 - Status Messages sent
8: 30 - Status Messages received
9: 108 - Heartbeats sent
10: 51 - Heartbeats received
11: 0 - Retransmits sent
12: 0 - Flow control under runs
13: 0 - Flow control over runs
14: 0 - Invalid packets
15: 0 - Errors
16: 0 - Short sends
17: 0 - Failed attempts to free log buffers
18: 0 - Sender flow control limits, i.e. back-pressure events
19: 0 - Unblocked Publications
20: 0 - Unblocked Control Commands
21: 0 - Possible TTL Asymmetry
22: 0 - ControllableIdleStrategy status
23: 0 - Loss gap fills
24: 0 - Client liveness timeouts
25: 0 - Resolution changes: driverName= hostname=Hurricane.fios-router.home
26: 4,186,683 - Conductor max cycle time doing its work (ns)
27: 0 - Conductor work cycle exceeded threshold count
28: 1,609,475,953,178 - client-heartbeat: 1
29: 1 - snd-channel: aeron:udp?endpoint=localhost:9003 127.0.0.1:50835
30: 1 - snd-local-sockaddr: 29 127.0.0.1:50835
31: 3,889,568 - pub-pos (sampled): 3 -1365430415 101 aeron:udp?endpoint=localhost:9003
32: 12,294,560 - pub-lmt: 3 -1365430415 101 aeron:udp?endpoint=localhost:9003
33: 3,905,952 - snd-pos: 3 -1365430415 101 aeron:udp?endpoint=localhost:9003
34: 4,032,672 - snd-lmt: 3 -1365430415 101 aeron:udp?endpoint=localhost:9003
35: 0 - snd-bpe: 3 -1365430415 101 aeron:udp?endpoint=localhost:9003
36: 9,792 - sub-pos: 2 729594557 102 aeron:udp?endpoint=localhost:19000 @0
37: 9,792 - rcv-hwm: 5 729594557 102 aeron:udp?endpoint=localhost:19000
38: 9,792 - rcv-pos: 5 729594557 102 aeron:udp?endpoint=localhost:19000
--