Skip to content

Aeron Tooling

Aeron Stat

Aeron Stat outputs the key counters from Aeron plus the positions and key counters of all the active and recently active streams.

To use Aeron Stat, you must provide the folder for the Media Driver that you want to inspect, for example if you configured the Media Driver context to be:

final MediaDriver.Context mediaDriverCtx = new MediaDriver.Context()
    .aeronDirectoryName("/dev/shm/md");

then the path provided to AeronStat would be as follows:

java -cp aeron-all-*.jar -Daeron.dir=/dev/shm/md io.aeron.samples.AeronStat

Output (looking at a running RPC server)

09:17:40 - Aeron Stat (CnC v0.2.0), pid 43085, heartbeat age 231ms
======================================================================
  0:              113,632 - Bytes sent
  1:               62,304 - Bytes received
  2:                    0 - Failed offers to ReceiverProxy
  3:                    0 - Failed offers to SenderProxy
  4:                    0 - Failed offers to DriverConductorProxy
  5:                    0 - NAKs sent
  6:                    0 - NAKs received
  7:                1,021 - Status Messages sent
  8:                1,651 - Status Messages received
  9:                3,543 - Heartbeats sent
 10:                1,933 - Heartbeats received
 11:                    0 - Retransmits sent
 12:                    0 - Flow control under runs
 13:                    0 - Flow control over runs
 14:                    0 - Invalid packets
 15:                    0 - Errors
 16:                    0 - Short sends
 17:                    0 - Failed attempts to free log buffers
 18:                    0 - Sender flow control limits, i.e. back-pressure events
 19:                    0 - Unblocked Publications
 20:                    0 - Unblocked Control Commands
 21:                    0 - Possible TTL Asymmetry
 22:                    0 - ControllableIdleStrategy status
 23:                    0 - Loss gap fills
 24:                    0 - Client liveness timeouts
 25:                    0 - Resolution changes: driverName= hostname=Hurricane.local
 26:            3,277,361 - Conductor max cycle time doing its work (ns)
 27:                    0 - Conductor work cycle exceeded threshold count
 28:                    1 - rcv-channel: aeron:udp?endpoint=127.0.0.1:2000 127.0.0.1:2000
 29:                    1 - rcv-local-sockaddr: 28 127.0.0.1:2000
 30:    1,618,665,460,506 - client-heartbeat: 1
 31:                  224 - rcv-pos: 5 -722424329 1 aeron:udp?endpoint=127.0.0.1:2000
 32:                  224 - sub-pos: 2 -722424329 1 aeron:udp?endpoint=127.0.0.1:2000 @0
 33:                  224 - rcv-hwm: 5 -722424329 1 aeron:udp?endpoint=127.0.0.1:2000
 34:                    1 - snd-channel: aeron:udp?endpoint=127.0.0.1:2001 127.0.0.1:63818
 35:                    1 - snd-local-sockaddr: 34 127.0.0.1:63818
 36:                  128 - pub-pos (sampled): 4 -1383902031 1 aeron:udp?endpoint=127.0.0.1:2001
 37:            8,388,736 - pub-lmt: 4 -1383902031 1 aeron:udp?endpoint=127.0.0.1:2001
 38:                  128 - snd-pos: 4 -1383902031 1 aeron:udp?endpoint=127.0.0.1:2001
 39:              131,200 - snd-lmt: 4 -1383902031 1 aeron:udp?endpoint=127.0.0.1:2001
 40:                    0 - snd-bpe: 4 -1383902031 1 aeron:udp?endpoint=127.0.0.1:2001
 41:                  128 - pub-pos (sampled): 6 -1383902030 1 aeron:udp?endpoint=127.0.0.1:2001
 42:            8,388,736 - pub-lmt: 6 -1383902030 1 aeron:udp?endpoint=127.0.0.1:2001
 43:                  128 - snd-pos: 6 -1383902030 1 aeron:udp?endpoint=127.0.0.1:2001
 44:              131,072 - snd-lmt: 6 -1383902030 1 aeron:udp?endpoint=127.0.0.1:2001
 45:                    0 - snd-bpe: 6 -1383902030 1 aeron:udp?endpoint=127.0.0.1:2001
--

Core Counters

Row Description
Top Line Most important piece of data here is the heartbeat age - this is the elapsed time since the last Media Driver heartbeat in the cnc.dat. If this is a large number (over 1000ms), check if your media driver is still running
0 Total bytes sent by the current media driver over UDP, excluding IP headers. If this isn't increasing at the pace expected by your application, something is wrong.
1 Total bytes received by the current media driver over UDP, excluding IP headers. If this isn't increasing at the pace expected by your application, something is wrong.
2 Failed offers to the Media Driver's Receiver Proxy; this suggests back pressure
3 Failed offers to the Media Driver's Sender Proxy; this suggests back pressure
4 Failed offers to the Media Driver's Conductor Proxy; this suggests back pressure
5 Total number of NAKs sent. This is how many times this Media Driver had to send a NAK in order to request a missing packet
6 Total number of NAKs received. This is how many times this Media Driver received a NAK in order to replay a missing packet to a remote Media Driver
7 Status Messages sent. This is a running count of the number of status messages sent by this Media Driver for flow control. This should increase over time.
8 Status Messages received. This is a running count of the number of status messages received by this Media Driver for flow control. This should increase over time.
9 Heartbeats sent. This is how many heartbeats have been sent by this Media Driver to indicate liveness to another Media Driver when there is no data to send. This should increase over time.
10 Heartbeats received. This is how many heartbeats have been received by this Media Driver to indicate liveness from another Media Driver when there is no data to send. This should increase over time.
11 Retransmits sent. This is how many packet retransmits have been sent by this Media Driver as a result of a NAK message. This will typically stay zero or very low in a healthy network (and with well behaved processes).
12 Flow control under runs. This is the count of packets which under-run the current flow control window for Images
13 Flow control over runs. This is the count of packets which over-run the current flow control window for Images
14 Count of invalid packets received by this Media Driver
15 Count of errors observed by this Media Driver. ErrorStat (see below) will provide details.
16 Short send count. A short send happens when the Media Driver's Sender agent expects to send a given buffer of data over the network, but the socket did not take all the data from the buffer. Typically, Aeron will recover from this. When this increases beyond a low number, it can be a complex problem to solve with causes ranging from incorrect buffer sizing to network equipment failure. The first place to look is typically the settings for the network buffer sizes: aeron.socket.so_rcvbuf and aeron.socket.so_sndbuf. aeron.rcv.initial.window.length must be less than or equal to aeron.socket.so_rcvbuf. Correct sizing can be an art, and can be especially challenging in a network with a large RTT variance. See also Bandwidth Delay Product. Note: you may need to update maximum socket buffer sizes in your operating system.
17 The number of times the Media Driver could not free a log buffer
18 Total number of back-pressure events over all streams. See also Back pressure
19 Count of times a publication has been unblocked after a client failed to commit() or abort() a tryClaim within timeout (see Publication TryClaim and Log Buffer Unblocking)
20 Count of times a command has been unblocked after a client failed to complete an offer within a timeout
21 The number of times a channel endpoint detected a possible TTL asymmetry between its config and a connection
23 This is the number of times a loss gap has been filled when NAKs have been disabled
24 The number of Aeron clients that have timed out without a graceful close (as in Aeron clients of this Media Driver)
25 The number of times the endpoints have been re-resolved (i.e. name resolution) resulting in a change
26 The maximum time taken in a conductor duty cycle in nanoseconds. Found in Aeron 1.33.0+
27 The number of times the time spent in a conductor duty cycle exceeded a configurable threshold (1s default). Found in Aeron 1.33.0+

Variable Counters

Row Description
28 in above example; varies Receive channel
29 in above example; varies Receive socket address
30 in above example; varies Epoch millisecond value of the last client heartbeat from the given client. The client in this context is the Aeron Client on the Media Driver.
35 in above example; varies Send channel

Lines 31 to 45 include position values. For more info on how to understand these, see Understanding Aeron Position. The lines with @, such as sub-pos on line 32 refer to the join position for the subscription - in this example, the subscription joined at position 0.

Note: There is a C version of the Aeron Stat tool. It's compiled and built with the C Media Driver. See C Media Driver.

AeronStat options

Arg Description
-h Shows the help text
watch=true or false If set to true, refreshes every n seconds. If set to false, runs once and exits. Defaults to true.
delay=seconds Specifies how often to refresh the output. Delay in seconds between update. Valid only if watch=true (or watch not specified)
stream={regex} Filters streams to only those that match the regex. Example: stream=101
type={regex} Filters output type (as in the counter type) to only those that match
session={regex} Filters sessions to only those that match
channel={regex} Filters channels to only those that match
identity={regex} Filters identity to only those that match

Error Stat

Error Stat prints all of the errors raised within the Aeron process. As with AeronStat, you must point ErrorStat at a Media Driver folder.

java -cp aeron-all-*.jar -Daeron.dir=/dev/shm/md io.aeron.samples.ErrorStat

When all is running as expected, error stat will produce output such as:

0 distinct errors observed.

Note: There is a C version of the Error Stat tool. It's compiled and built with the C Media Driver. See C Media Driver.

Stream Stat

Stream Stat is found within the Aeron samples folder, and can be launched from the aeron-all jar as below. As with AeronStat, you must point StreamStat at a Media Driver folder.

java -cp aeron-all-*.jar -Daeron.dir=/dev/shm/md io.aeron.samples.StreamStat

Stream stat provides a view on each stream within the media driver, including the publisher and sender view. The view is much like aeron stat, except the view is flattend. To allow it to be shown on a page, the single line 2 is broken into lines 2-10 below.

Command `n Control file /dev/shm/md/cnc.dat
sessionId=-1245628686 streamId=10 
    channel=aeron:udp?endpoint=localhost:40123 : 
    pub-pos (sampled):3:320 
    pub-lmt:3:8388992 
    snd-pos:3:384 
    snd-lmt:3:131456 
    sub-pos:1:384 
    rcv-hwm:4:384 
    rcv-pos:4:384

Backlog Stat

Backlog Stat is a tool to highlight backlogs in data flows. It functions on both IPC and UDP channels. As with AeronStat, you must point BacklogStat at a Media Driver folder.

java -cp aeron-all-*.jar -Daeron.dir=/dev/shm/md io.aeron.samples.BacklogStat

Sample output:

sessionId=1155221173 streamId=8 channel=aeron:udp?endpoint=10.1.1.1:4000 :
┌─for publisher 77 the last sampled position is 187392 (~0 bytes before back-pressure)
└─sender 77 has to send 0 bytes (2031779 butes remaining in the sender window)

sessionId=-614368527 streamId=9 channel=aeron:udp?endpoint=10.1.1.1:4001 :
┌─for publisher 6333 the last sampled position is 12739208 (~0 bytes before back-pressure)
└─sender 6333 has to send 65373 bytes (2031779 butes remaining in the sender window)

This tool can highlight problems with backlogs of data in given channels. In the example run above, the top session has no backlog, and the lower session has an outstanding backlog of 65373 bytes. Use this information to investigate network, process and/or design problems.

Loss Stat

LossStat logs all the data loss events suffered by Aeron. Note that IPC data is not subject to data loss, and will not appear in LossStat. As with AeronStat, you must point LossStat at a Media Driver folder.

java -cp aeron-all-*.jar -Daeron.dir=/dev/shm/md io.aeron.samples.LossStat

An example run:

#OBSERVATION_COUNT,TOTAL_BYTES_LOST,FIRST_OBSERVATION,LAST_OBSERVATION,SESSION_ID,STREAM_ID,CHANNEL,SOURCE
688,4167028,2020-08-16 13:53:39.053+0000,2020-08-16 13:53:41.003+0000,1155221173,8,aeron:udp?endpoint=10.1.1.1:4000;10.1.1.2:60950

This tells us the following about the session 1155221173 on channel aeron:udp?endpoint=10.1.1.1:4000 on stream 8 ⤌⤍ 10.1.1.2:60950 flow:

  • there were 688 data loss events
  • a total of 4,167,028 bytes were impacted
  • the loss first happened at 2020-08-16 16:53:39.053+0000
  • the last loss happened at 2020-08-16 16:53:41.003+0000

Armed with this information, you're able to investigate any network or host problems around those times. Note that little amounts of loss are fairly common.

Note: There is a C version of the Loss Stat tool. It's compiled and built with the C Media Driver. See C Media Driver.

Log Inspector

Log Inspector is found within the Aeron samples folder, and can be launched from the aeron-all jar as below. You must point the Log Inspector at a LogBuffer file directly.

java -cp aeron-all-*.jar io.aeron.samples.LogInspector <logbuffer file>

Log Inspector allows us to inspect Log Buffer files, including:

  • if the log buffer is connected
  • how many terms the log buffer has been through (see Log Buffers & Images)
  • the state of the 3 terms in the log buffer
  • and the data within a term, dumped as hex. This includes details on which session and stream produced the data.
======================================================================
Thu Dec 31 09:46:19 EST 2020 Inspection dump for 3.logbuffer
======================================================================
   Is Connected: true
Initial term id: -1822262504
     Term Count: 20
   Active index: 2
    Term length: 67108864
     MTU length: 1408
      Page Size: 4096
   EOS Position: 9223372036854775807

default DATA Header{frame-length=0 version=0 flags=11000000 type=1 term-offset=0 session-id=301746870 stream-id=10 term-id=-1822262504 reserved-value=0}

Index 0 Term Meta Data termOffset=67108928 termId=-1822262486 rawTail=-7826557782030548928 position=1275068416
Index 1 Term Meta Data termOffset=67108928 termId=-1822262485 rawTail=-7826557777735581632 position=1342177280
Index 2 Term Meta Data termOffset=1822720 termId=-1822262484 rawTail=-7826557773505900544 position=1344000000

======================================================================
Index 0 Term Data

0: DATA Header{frame-length=0 version=0 flags=00000000 type=0 term-offset=0 session-id=0 stream-id=0 term-id=0 reserved-value=0}
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

======================================================================
Index 1 Term Data

0: DATA Header{frame-length=0 version=0 flags=00000000 type=0 term-offset=0 session-id=0 stream-id=0 term-id=0 reserved-value=0}
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

======================================================================
Index 2 Term Data

0: DATA Header{frame-length=36 version=0 flags=11000000 type=1 term-offset=0 session-id=301746870 stream-id=10 term-id=-1822262484 reserved-value=0}
02004001
64: DATA Header{frame-length=36 version=0 flags=11000000 type=1 term-offset=64 session-id=301746870 stream-id=10 term-id=-1822262484 reserved-value=0}
03004001
...
Back to top