Performance
Replicated state machines serve as the foundation for fault tolerance and high availability in distributed systems. They keep state consistent across multiple replicas by processing client requests and replicating operations in a specific order. Because a replicated state machine must be deterministic, it must process commands on a single thread. This constraint guarantees that all replicas execute the same sequence of operations in the same order, and so remain consistent. However, single-threaded execution rules out parallel processing of the queue, which can limit throughput and overall performance.
Little's Law (L = λW) provides an insightful framework for understanding and optimizing the performance of single-threaded replicated state machines. By examining the relationship between the average number of items in a queue (L), the average arrival rate (λ), and the average time an item spends in the queue (W), we can pinpoint performance bottlenecks and potential scalability challenges stemming from the single-threaded constraint.
In replicated state machines like Aeron Cluster, achieving optimal performance requires keeping the average number of items in the processing queue (L) at one or less. If the replicated state machine's inbound command queue does not accumulate and grow, messages do not incur extra latency waiting in line behind those being processed. To attain high throughput (λ) with a replicated state machine like Aeron Cluster, it is crucial to minimize the average time (W) spent in the queue. Outside the replicated state machine itself, buffers, queues, and batching can still provide significant performance benefits by minimizing the time the replicated state machine spends waiting for work.
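As a rough illustration of the relationship above, the following Java sketch applies Little's Law to estimate the average number of queued commands for a given arrival rate and average time per command. The class name, method name, and the sample figures (10,000 commands per second, 50 and 250 microseconds) are hypothetical values chosen for illustration. They are not part of Aeron Cluster or measurements of it.

```java
// Minimal sketch of Little's Law (L = lambda * W).
// All names and sample numbers are illustrative assumptions, not Aeron APIs.
final class LittlesLawSketch {

    /**
     * Returns L, the average number of items in the queue, given the
     * average arrival rate (lambda, per second) and the average time
     * an item spends in the queue (W, in seconds).
     */
    static double averageItemsInQueue(double arrivalRatePerSecond, double averageTimeSeconds) {
        return arrivalRatePerSecond * averageTimeSeconds;
    }

    public static void main(String[] args) {
        double lambda = 10_000.0;   // hypothetical arrival rate: commands per second
        double fastW = 50e-6;       // hypothetical W: 50 microseconds per command
        double slowW = 250e-6;      // hypothetical W: 250 microseconds per command

        // With the faster W, L stays below one, so the queue does not build up.
        System.out.println("L at W = 50 us:  " + averageItemsInQueue(lambda, fastW));
        // With the slower W, L exceeds one, so commands wait behind others.
        System.out.println("L at W = 250 us: " + averageItemsInQueue(lambda, slowW));
    }
}
```

Under these assumed figures, the first case gives L of about 0.5 and the second about 2.5, which shows how a longer time per command at the same arrival rate pushes L above one.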
Using Little's Law, we can gain insights into the throughput capabilities of Aeron Cluster. For example, if the system targets a throughput of 1,000 transactions per second, it allows for a budget of roughly 1 millisecond per inbound command within the cluster. In contrast, when aiming for 50,000 transactions per second, the budget tightens to approximately 20 microseconds. This budget must cover decoding inbound commands, executing the actual business logic, encoding responses within the application logic, and absorbing any pauses introduced by Java garbage collection.
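The per-command budget follows from simple arithmetic: if commands are processed strictly one at a time, the time available for each is the reciprocal of the target throughput. The sketch below computes that budget in nanoseconds. The class and method names are hypothetical, and the calculation assumes a steady arrival rate with no batching or idle time.

```java
// Minimal sketch: time budget per command for a single-threaded processor.
// Names are illustrative assumptions, not part of any Aeron API.
final class CommandBudgetSketch {

    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    /**
     * Returns the time budget per command, in nanoseconds, for a target
     * throughput given in commands per second (integer division).
     */
    static long budgetNanos(long commandsPerSecond) {
        if (commandsPerSecond <= 0) {
            throw new IllegalArgumentException("commandsPerSecond must be positive");
        }
        return NANOS_PER_SECOND / commandsPerSecond;
    }

    public static void main(String[] args) {
        // Target throughputs taken from the example in the text.
        for (long target : new long[] {1_000L, 50_000L}) {
            System.out.println(target + " commands/s -> "
                + budgetNanos(target) + " ns per command");
        }
    }
}
```

The result is an upper bound on the average time per command. Decoding, business logic, encoding, and any garbage collection pauses all have to fit inside it.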
Aeron Premium
In addition to the replicated state machine logic, many other factors can impact Aeron Cluster's performance:
- the availability of CPU time for processing commands in the replicated state machine (including any operating system costs such as context switching)
- the number of network hops taken, and the latency each one introduces
- queues within the operating system and hardware, including kernel network buffers and disk I/O capabilities
Aeron Premium includes kernel bypass functionality, which can, when coupled with zero-copy capable technologies such as Simple Binary Encoding, dramatically improve throughput for networking traffic.