Most problems experienced with Aeron come down to resource constraints: CPU starvation, slow disk I/O, memory paging and so on. Linux offers many tools to help diagnose these problems - several of them are described in the Aeron Troubleshooting section.
Media Driver Timeout¶
The Media Driver Timeout sets the timeout for the Media Driver, and governs the setup timeouts along with the client conductor to Media Driver timeout periods. The value is set on the Media Driver via
driverTimeoutMsValue on the context, or via the system property
aeron.driver.timeout. The value is specified in milliseconds.
|MediaDriver keep-alive age exceeded (ms): timeout=x, actual=y||This is fired when the maximum time period since the last keep-alive message was expected has been exceeded. Most frequently, this happens when system resources are too constrained, and/or the timeout is set too low. It can also happen when the Media Driver has crashed. The |
|no driver heartbeat detected||This happens when the current time passes the Client Conductor's start time plus the Media Driver timeout. Typically this is caused by a Media Driver that has crashed, or by a timeout value that is set too low.|
|no response from MediaDriver within (ns): x||This will typically happen when there is a Media Driver timeout, or when Aeron is using an incorrect directory.|
|CnC file is created but not initialized.||This can happen when the CnC file has been created but not yet written to. Typically, give the Media Driver a little more time, or check whether it has crashed.|
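The keep-alive age error in the table above can be pictured with a small stand-alone sketch. This is a simplified model and not Aeron's implementation: the class `KeepAliveCheck`, its method names and the 10 second fallback are assumptions made for illustration, and only the `aeron.driver.timeout` property name and the message text come from this page.

```java
// Simplified model of a keep-alive age check; not Aeron's actual code.
final class KeepAliveCheck
{
    // Assumed fallback of 10 seconds, in line with the default driver timeout mentioned on this page.
    static final long DEFAULT_DRIVER_TIMEOUT_MS = 10_000L;

    // Resolve the timeout from the aeron.driver.timeout system property, or use the fallback.
    static long driverTimeoutMs()
    {
        return Long.getLong("aeron.driver.timeout", DEFAULT_DRIVER_TIMEOUT_MS);
    }

    // Throws when the last keep-alive is older than the timeout allows.
    static void checkKeepAlive(final long lastKeepAliveMs, final long nowMs, final long timeoutMs)
    {
        final long ageMs = nowMs - lastKeepAliveMs;
        if (ageMs > timeoutMs)
        {
            throw new IllegalStateException(
                "MediaDriver keep-alive age exceeded (ms): timeout=" + timeoutMs + ", actual=" + ageMs);
        }
    }
}
```

The sketch makes the two usual remedies visible: either the keep-alive arrives sooner (more resources for the process), or `timeoutMs` is raised.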
If your system is running an antivirus service, it is recommended to exclude any folders in which Aeron writes log buffers from scanning. If you don't, Aeron performance will be impacted - sometimes severely - with process freezes exceeding 10 seconds as the scan is run.
Client Liveness Timeout¶
This sets the maximum time allowed between successive service calls of the client's duty cycle. It is set via
clientLivenessTimeoutNs in the Media Driver Context, and must be greater than the keep-alive interval (
keepAliveIntervalNs). The value is specified in nanoseconds. The default is 10 seconds.
|service interval exceeded (ns): timeout=x, actual = y||This is fired when the time elapsed between one client conductor duty cycle and the next exceeds the value set by clientLivenessTimeoutNs. This exception represents a serious issue, and results in the conductor terminating. The value given in |
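As a rough illustration of the service interval check, the sketch below tracks the time of the previous duty cycle and fails when the gap to the next one is too large. The class `ServiceIntervalCheck` and its methods are hypothetical names for this example; the real client conductor is more involved.

```java
// Simplified model of a duty-cycle service interval check; not Aeron's actual code.
final class ServiceIntervalCheck
{
    private final long timeoutNs;
    private long lastServiceNs;

    ServiceIntervalCheck(final long timeoutNs, final long startNs)
    {
        this.timeoutNs = timeoutNs;
        this.lastServiceNs = startNs;
    }

    // Call once per duty cycle with the current time in nanoseconds.
    void onDutyCycle(final long nowNs)
    {
        final long elapsedNs = nowNs - lastServiceNs;
        if (elapsedNs > timeoutNs)
        {
            throw new IllegalStateException(
                "service interval exceeded (ns): timeout=" + timeoutNs + ", actual=" + elapsedNs);
        }
        lastServiceNs = nowNs;
    }
}
```

A long garbage collection pause, a debugger breakpoint or a starved CPU between two calls to `onDutyCycle` would all show up as a large `elapsedNs` in this model.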
Client Conductor Timeout¶
Client Conductor Timeouts mean that the link between the Client Conductor and Driver Conductor has timed out. This tends to happen when the resources available to the process are constrained, and the client and/or driver conductors could not process their duty cycles within the allotted time frame. When this happens, there are essentially two options: increase the timeout, or ensure the process has sufficient resources.
Other common errors¶
Active Driver Detected¶
This happens when you restart an Aeron application very soon after it last exited. The Media Driver stamps an activity timestamp in the
cnc.dat file every second, and checks on startup whether it is less than 10 seconds old. If it is, the Media Driver will refuse to start. It does this to prevent multiple Media Drivers from using the same folder, which would cause problems. To prevent the error, either wait 10 seconds and retry, or configure the Media Driver to delete its folder on start via the context:
MediaDriver.Context ctx = new MediaDriver.Context().dirDeleteOnStart(true);
or via JVM argument:
-Daeron.dir.delete.on.start=true
Note that 10 seconds is the default timeout. If you are running with an extended driver timeout configured, or with a debugger attached and
aeron.debug.timeout set, this time will be longer.
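The startup check described in this section can be sketched as a simple age comparison. This is an illustrative model only: `ActiveDriverCheck` and its methods are hypothetical, and the way the timestamp is actually read from cnc.dat is not shown.

```java
// Simplified model of the "active driver detected" decision; not Aeron's actual code.
final class ActiveDriverCheck
{
    // A driver is considered active if its last activity timestamp is younger than the driver timeout.
    static boolean isDriverActive(final long lastActivityMs, final long nowMs, final long driverTimeoutMs)
    {
        return (nowMs - lastActivityMs) < driverTimeoutMs;
    }

    // How long a restarting process would need to wait before the old timestamp is considered stale.
    static long remainingWaitMs(final long lastActivityMs, final long nowMs, final long driverTimeoutMs)
    {
        return Math.max(0L, driverTimeoutMs - (nowMs - lastActivityMs));
    }
}
```

With a 10 second timeout, a timestamp written 4 seconds ago counts as active and leaves 6 seconds to wait; raising the driver timeout lengthens that window accordingly.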
Unable to allocate counter; buffer is full¶
An IllegalStateException is raised when the counter buffer is full. Aeron's counter buffer can hold 8,192 entries, and something has used them all up - this is most likely a resource leak, such as a
Subscription that has not been closed.
You can view the counters via AeronStat, or via the
CountersReader object that is available on the
Aeron client via
aeron.countersReader(). This is especially useful in identifying resource leaks.
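To see why an unclosed resource eventually produces this error, consider the toy model below. It is not Aeron's counters implementation: `ToyCounterBuffer` is a hypothetical class that only mimics a fixed pool of 8,192 slots, where each slot is held until it is explicitly freed.

```java
import java.util.BitSet;

// Toy model of a fixed-capacity counter pool; not Aeron's actual counters buffer.
final class ToyCounterBuffer
{
    static final int MAX_COUNTERS = 8_192;

    private final BitSet inUse = new BitSet(MAX_COUNTERS);

    // Takes the lowest free slot, or fails when every slot is held.
    int allocate()
    {
        final int id = inUse.nextClearBit(0);
        if (id >= MAX_COUNTERS)
        {
            throw new IllegalStateException("Unable to allocate counter; buffer is full");
        }
        inUse.set(id);
        return id;
    }

    // Releases a slot, as closing the owning resource would.
    void free(final int id)
    {
        inUse.clear(id);
    }

    int inUseCount()
    {
        return inUse.cardinality();
    }
}
```

In this model, code that allocates without ever calling `free` sees `inUseCount()` climb steadily until allocation fails, which is the same pattern to look for when watching the real counters.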
Message exceeds maxMessageLength¶
A message has been offered to a publication that exceeds the maximum length. See How to send messages over 8kb
Cannot assign requested address¶
This tends to happen when using the wrong address in a subscription. Publications send data to subscriptions, so publications need the address of the subscription, and subscriptions need their own address, not that of the publication. For example:
- Publication is running on host 10.10.1.1 sending to stream 100
- Subscription is running on host 10.10.1.2 port 1234 listening on stream 100
In this case, both the publication and subscription should make use of 10.10.1.2 port 1234, stream 100.
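The example above can be expressed as a channel string shared by both sides. The sketch below only builds the string; `ChannelExample` and its helper are hypothetical names, and the `aeron:udp?endpoint=host:port` form is assumed to be the channel style in use.

```java
// Builds the channel that both the publication and the subscription would use in the example above.
final class ChannelExample
{
    // Address and port of the subscribing host, taken from the example.
    static final String SUBSCRIBER_HOST = "10.10.1.2";
    static final int SUBSCRIBER_PORT = 1234;
    static final int STREAM_ID = 100;

    static String udpEndpointChannel(final String host, final int port)
    {
        return "aeron:udp?endpoint=" + host + ":" + port;
    }

    public static void main(final String[] args)
    {
        final String channel = udpEndpointChannel(SUBSCRIBER_HOST, SUBSCRIBER_PORT);
        // The same channel and stream id would be given to both the publication and the subscription.
        System.out.println(channel + " stream " + STREAM_ID);
    }
}
```

Using the publication host (10.10.1.1) in the subscription's channel is the mistake that typically triggers the error, because the subscribing host cannot bind to an address it does not own.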
See also the Aeron wiki on Channel Configuration