Databases & Aeron Cluster

Databases can be a powerful complement to Aeron Cluster, but they are optional. The questions each Aeron Cluster implementation needs to ask are:

is the cluster the system of record?
how will the cluster be bootstrapped?
what reliability options do you want to apply for critical cluster events?

System of Record

Generally speaking, if the cluster process is not the system of record, a database is more likely to be optional: the necessary events can be sent to whichever system is the system of record via a gateway process.

Notes:

always consider the legal implications of your particular usage. If you have regulatory or other reasons to have a database, use a database. Don't skip the database because it's optional in theory.
snapshotting state, and introspecting it, can be a complex task, especially as the structure of that state changes over a project's lifetime. Consider very carefully how you will version your snapshots.
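To make the versioning note concrete, here is a minimal Java sketch of a snapshot codec that writes a version header and can still read an older snapshot format. The `Instrument` record, its fields, the two versions and the default tick size are all hypothetical. In a real cluster the encoded bytes would be written to the snapshot publication, and a codec such as Simple Binary Encoding (SBE), whose message header carries a schema version, is a common alternative to hand-rolled encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical piece of cluster state: reference data for one instrument.
// Version 1 of the snapshot format stored id and symbol; version 2 added tickSize.
record Instrument(long id, String symbol, long tickSize) {}

final class InstrumentSnapshotCodec {
    static final int CURRENT_VERSION = 2;
    static final long DEFAULT_TICK_SIZE = 1; // assumed default for v1 snapshots

    // Always write the current version, prefixed by a version header.
    static ByteBuffer encode(Instrument instrument) {
        byte[] symbol = instrument.symbol().getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(4 + 8 + 4 + symbol.length + 8);
        buffer.putInt(CURRENT_VERSION);
        buffer.putLong(instrument.id());
        buffer.putInt(symbol.length);
        buffer.put(symbol);
        buffer.putLong(instrument.tickSize());
        buffer.flip();
        return buffer;
    }

    // Read any version this build understands; fail loudly on unknown versions.
    static Instrument decode(ByteBuffer buffer) {
        int version = buffer.getInt();
        long id = buffer.getLong();
        byte[] symbolBytes = new byte[buffer.getInt()];
        buffer.get(symbolBytes);
        String symbol = new String(symbolBytes, StandardCharsets.UTF_8);
        return switch (version) {
            case 1 -> new Instrument(id, symbol, DEFAULT_TICK_SIZE); // field did not exist yet
            case 2 -> new Instrument(id, symbol, buffer.getLong());
            default -> throw new IllegalStateException("unsupported snapshot version " + version);
        };
    }
}
```

The design choice here is that the reader carries the compatibility burden: new builds always write the latest version, but must keep decoding every version that may still exist in a retained snapshot.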

Bootstrapping the Cluster

Bootstrapping - the phase of cluster startup in which all the data required to run is loaded - is typically done in two ways:

via commands and snapshots. In this scenario, an admin cluster client is granted permission to write reference and other operational data to the cluster. Once the necessary data is loaded, the cluster is bootstrapped exclusively from snapshots, and any modifications to the reference and operational data are sent in via the admin client.
via a database, with or without snapshots. In this scenario, an external process reads data from a configuration/reference data database and bootstraps the cluster on start, using a protocol agreed between that process and the cluster. The same process can submit commands to the cluster as the data changes intra-day. On cluster start, you can optionally start from a snapshot or clear down any existing snapshots; which you choose will depend on your SLAs and how often your cluster is restarted.
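The database-driven approach can be sketched in Java under some assumptions of its own: reference data is a simple key/value map, a `Map` stands in for the reference data database, and a `Consumer` stands in for the cluster client (in practice something wrapping `AeronCluster.offer`). All class and command names are hypothetical. Because the clustered service applies each row as an idempotent upsert, the full reload produces the same state whether the cluster started from a snapshot or from a cleared-down state.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

// Hypothetical command carrying one reference data row into the cluster.
record UpsertReferenceData(String key, String value) {}

// Cluster-side state (inside the clustered service). Upserts are idempotent,
// so replaying the full data set over state restored from a snapshot is harmless.
final class ReferenceDataState {
    private final Map<String, String> data = new TreeMap<>();

    void onCommand(UpsertReferenceData command) {
        data.put(command.key(), command.value());
    }

    Map<String, String> view() {
        return Map.copyOf(data);
    }
}

// External bootstrap process. The Map stands in for the reference data database and
// the Consumer stands in for the cluster client.
final class ReferenceDataLoader {
    private final Map<String, String> database;
    private final Consumer<UpsertReferenceData> cluster;

    ReferenceDataLoader(Map<String, String> database, Consumer<UpsertReferenceData> cluster) {
        this.database = database;
        this.cluster = cluster;
    }

    // On cluster start: submit every row as a command.
    void bootstrap() {
        database.forEach((key, value) -> cluster.accept(new UpsertReferenceData(key, value)));
    }

    // Intra-day: persist the change, then forward it to the cluster.
    void onChange(String key, String value) {
        database.put(key, value);
        cluster.accept(new UpsertReferenceData(key, value));
    }
}
```

One limitation of pure upserts: a row deleted from the database while the cluster was down would survive in a restored snapshot. Handling that needs either an explicit delete command or clearing down snapshots before bootstrapping.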

Cluster Events & Reliability

If you're writing critical cluster events to a database, you need to consider the level of reliability you require, and how you want to handle recovery scenarios.

Consider this case:

Cluster, database gateway and database are all operating normally
Trades 1, 2 and 3 are booked and are written to the database by the database gateway
Database server fails
Trades 4, 5 and 6 are booked, but the database gateway cannot write them to the database.

What now? You have at least two options:

Introduce retry mechanisms within the cluster. If you instead put the retry logic within the database gateway, what happens if the gateway itself fails?
Append critical events to a dedicated Aeron Archive running within the cluster, and store the Archive offset of the last event written as a high watermark in the database. If the database fails, restart the database gateway once the database is available again. On start, the database gateway reads the last written high watermark from the database and processes the cluster's Archive from that point.

The advantage of the second approach is that Aeron Archive manages the recovery protocol on your behalf: no commands need to flow via the cluster log to replay or recover, and your business logic is not polluted with retry mechanisms.
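The second option can be sketched in Java as follows, replaying the trade scenario above. Everything here is an in-memory stand-in and every name is hypothetical: `CriticalEventArchive` plays the dedicated Archive (with positions simplified to event indexes rather than Aeron's byte positions), and `TradeDatabase` stores each trade together with the new high watermark, which a real implementation would do in a single database transaction so that the two cannot diverge.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical critical event emitted by the clustered service.
record TradeBooked(long tradeId, String details) {}

// Stand-in for the dedicated Archive of critical events.
// Positions are simplified to event indexes.
final class CriticalEventArchive {
    private final List<TradeBooked> events = new ArrayList<>();

    void append(TradeBooked event) { events.add(event); }

    long endPosition() { return events.size(); }

    TradeBooked read(long position) { return events.get((int) position); }
}

// Stand-in for the database: the trade and the high watermark are written together.
final class TradeDatabase {
    final Map<Long, TradeBooked> trades = new LinkedHashMap<>();
    int writes = 0;          // counts successful writes, to detect duplicates
    boolean available = true;
    private long highWatermark = 0; // next Archive position to process

    long readHighWatermark() {
        checkAvailable();
        return highWatermark;
    }

    void writeTrade(TradeBooked trade, long nextPosition) {
        checkAvailable();
        trades.put(trade.tradeId(), trade);
        highWatermark = nextPosition;
        writes++;
    }

    private void checkAvailable() {
        if (!available) throw new IllegalStateException("database unavailable");
    }
}

// Database gateway: resumes from the stored high watermark on start.
final class DatabaseGateway {
    private final CriticalEventArchive archive;
    private final TradeDatabase database;
    private long position;

    DatabaseGateway(CriticalEventArchive archive, TradeDatabase database) {
        this.archive = archive;
        this.database = database;
        this.position = database.readHighWatermark();
    }

    // Writes everything recorded since the last position; throws if the database fails.
    void poll() {
        while (position < archive.endPosition()) {
            database.writeTrade(archive.read(position), position + 1);
            position++;
        }
    }
}
```

Because the watermark only advances in the same write as the trade, a restarted gateway begins at the first unwritten event: trades 1 to 3 are not rewritten, and trades 4 to 6 are picked up from the Archive.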