Performance Limits of Aeron Cluster
Aeron Cluster's performance is constrained by several factors, of which the two most important are network latency and the average time taken to execute the hosted business logic. Other factors that impact performance include idle strategy configuration, Java garbage collection activity, CPU speed and availability, memory latency and availability, and disk I/O latency.
Business Logic Impacts
Aeron Cluster employs all the tricks it safely can (such as asynchronous messaging, natural batching for consensus and efficient internal messaging with Simple Binary Encoding) to be a high performance container for a replicated state machine, but it cannot safely break the rule that all items must be sequentially processed in a defined order. For this reason, it will most likely be the business logic contained within the replicated state machine that becomes the performance bottleneck of an Aeron Cluster based system.
Little's Law applies in this scenario. Little's Law states that "the long-term average number of customers in a stable system is equal to the long-term effective arrival rate multiplied by the average time a customer spends in the system".
In equation form, Little's Law is written as L = λW. When we apply this to a replicated state machine, we can say that:
L is the number of commands in progress at any time, i.e. the degree of concurrency in the system. For your business logic running within the replicated state machine held in Aeron Cluster, this is by design 1
λ is the system's stable throughput (i.e. without queue growth)
W is the time taken to process a command
Flipping the equation around to get the throughput, we have:
λ = L/W
With L as 1, the only way to achieve a high throughput in Aeron Cluster is to invest in reducing the time taken for your business logic to process items (W). The average processing time therefore sets the throughput ceiling your business logic will impose.
For example, if your business logic takes 50μs (0.05ms) to process an average request, then you are capping the throughput of your business logic at 1 / 50μs = 20,000 commands per second.
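As a rough illustration of λ = L/W, the following Java sketch computes the throughput ceiling for a given average processing time. The class and method names are hypothetical and are not part of Aeron. The concurrency parameter only mirrors L in the formula; for business logic hosted in Aeron Cluster it is 1.

```java
public final class ThroughputCeiling {
    private static final double NANOS_PER_SECOND = 1_000_000_000.0;

    private ThroughputCeiling() {
    }

    /**
     * Little's Law rearranged for throughput: lambda = L / W.
     *
     * @param concurrency         L, the number of commands in progress (1 for Aeron Cluster business logic)
     * @param processingTimeNanos W, the average time to process one command, in nanoseconds
     * @return lambda, the maximum stable throughput in commands per second
     */
    public static double maxThroughputPerSecond(final int concurrency, final long processingTimeNanos) {
        if (concurrency <= 0 || processingTimeNanos <= 0) {
            throw new IllegalArgumentException("concurrency and processing time must be positive");
        }
        return concurrency * NANOS_PER_SECOND / processingTimeNanos;
    }

    public static void main(final String[] args) {
        // Illustrative average processing times: 1us, 10us, 50us and 1ms.
        final long[] processingTimesNanos = {1_000L, 10_000L, 50_000L, 1_000_000L};
        for (final long w : processingTimesNanos) {
            System.out.println(w + " ns per command -> at most "
                + maxThroughputPerSecond(1, w) + " commands per second");
        }
    }
}
```

With L held at 1, halving the average processing time doubles the ceiling, which is why time spent optimising the business logic tends to pay off directly in throughput.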
See Efficient Business Logic for information on how to best structure your business logic for performance.
When Aeron Cluster processes communicate with a client process, the network latency can have a large impact on overall cluster latency. For example, a kernel-offloaded three-node cluster sharing a rack has a dramatically lower latency potential than a three-node cluster with one node on AWS us-west1, another on AWS us-east1 and a third in another AWS region. Aeron offers some of the lowest latency in the messaging industry; however, it cannot do any better than the base network latency.
To give an idea of the possible performance impact of network latency, the number of round trip times (RTT, i.e. the time taken for a packet to travel from host A to host B and back to A) needed for client/cluster communication is given below. Measure your network RTT using something like Mellanox's sockperf (available on GitHub) to get an idea of baseline performance.
| From | To | Round trips |
|---|---|---|
| Client | Leader (on commit) | 1.5 RTT |
| Client | Client (Leader committed, best case) | 2 RTT |
| Client | Follower (on commit) | 2 RTT |
If your network RTT is 2ms, and your average business logic execution time is 1ms, then your best case client-to-client latency is likely to be approximately 5ms, i.e. 2 RTT × 2ms + 1ms (assuming busy spin idle strategies, no queuing, no CPU starvation, no GC pauses and no I/O anomalies).
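The estimate above can be sketched in Java as the number of round trips multiplied by the measured RTT, plus the business logic execution time. The class, enum and method names are hypothetical and are not part of Aeron. Adding the business logic time to the two paths other than client-to-client is an assumption made for illustration; the text only works through the client-to-client case.

```java
public final class ClusterLatencyEstimate {
    /** Communication paths and their round trip counts, taken from the table above. */
    public enum Path {
        CLIENT_TO_LEADER_ON_COMMIT(1.5),
        CLIENT_TO_CLIENT_BEST_CASE(2.0),
        CLIENT_TO_FOLLOWER_ON_COMMIT(2.0);

        private final double roundTrips;

        Path(final double roundTrips) {
            this.roundTrips = roundTrips;
        }

        public double roundTrips() {
            return roundTrips;
        }
    }

    private ClusterLatencyEstimate() {
    }

    /**
     * Best case latency in milliseconds: round trips * RTT + business logic time.
     * Ignores queuing, CPU starvation, GC pauses and I/O anomalies.
     */
    public static double bestCaseMillis(
        final Path path, final double networkRttMillis, final double businessLogicMillis) {
        if (networkRttMillis < 0 || businessLogicMillis < 0) {
            throw new IllegalArgumentException("times must not be negative");
        }
        return path.roundTrips() * networkRttMillis + businessLogicMillis;
    }

    public static void main(final String[] args) {
        // Figures from the example in the text: 2ms network RTT, 1ms business logic.
        for (final Path path : Path.values()) {
            System.out.println(path + ": approximately " + bestCaseMillis(path, 2.0, 1.0) + " ms");
        }
    }
}
```

Treat the result as a lower bound rather than a prediction: any queuing, GC pause or less aggressive idle strategy adds to it.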