About the durability service
The durability service is one of the services that can be configured when deploying a federation. At most one durability service can be configured per federation. However, since a system typically consists of multiple federations, multiple durability services may be deployed in a single system.
The durability service has two main responsibilities:
- Implement the durable property, that is, ensure that late joiners receive historical data and that persistent data survives crashes. If, for example, a late-joining transient reader enters the system, the durability service makes sure that it receives the data that was published prior to the appearance of the reader.
- Maintain state consistency in case of disconnects/reconnects. For example, if two nodes are temporarily disconnected and one node has published data during the disconnection (which has been missed by the disconnected node), the durability service makes sure that the disconnected node receives that data when the nodes reconnect (see merge policies).
Fault tolerance can be achieved by deploying multiple federations that maintain the same data set: if one fails, then the other can take over.
A durability service typically specifies which sets of data it maintains by means of so-called namespaces (see OpenSplice/DurabilityServices/Namespaces/NameSpace). For a durability service, a namespace is the unit of data that needs to be maintained: a durability service will either maintain all data for the specified namespace, or not maintain the set at all if no matching namespace has been configured. The actual data set is configured in terms of partition/topic expressions.
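As an illustration only, a namespace and its partition expressions might be configured along the following lines. The element names follow the OpenSplice durability configuration, but the specific names and values used here (sensorData, the partitions, the policy attributes) are assumptions that should be checked against the deployment guide for the version in use:

    <DurabilityService name="durability">
      <NameSpaces>
        <!-- The NameSpace defines WHICH data is maintained: here all topics
             published in the partitions "sensors" and "control". -->
        <NameSpace name="sensorData">
          <Partition>sensors</Partition>
          <Partition>control</Partition>
        </NameSpace>
        <!-- The Policy defines HOW that namespace is maintained: the durability
             level and whether this service acts as an aligner for it. -->
        <Policy nameSpace="sensorData"
                durability="Durable"
                alignee="Initial"
                aligner="true"/>
      </NameSpaces>
    </DurabilityService>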
Namespaces are required to be consistent within the system, i.e., namespaces may not overlap, and when two or more nodes have the same namespace, that namespace must cover the same data set on each of them. Durability services may need to exchange data sets in order to keep a consistent state. Exchanging a data set from one durability service to another is called alignment.
Durability performance tweaks
When a durability service is configured on a node, it typically performs most of its work when the node starts up or when topology changes occur. Below we discuss the possibilities to tweak the performance for both these cases.
Performance impact at startup
The performance at boot time is for a large part determined by the InitialDiscoveryPeriod, the ExpiryTime, the amount of data that needs to be retrieved, and the speed of the network channel that is used to retrieve the data. When persistence is configured, the performance of the disks may also determine the overall performance at startup.
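For reference, a minimal sketch of where the initial discovery period lives in the durability configuration; the value is illustrative only, and the exact element path should be verified against the configuration guide:

    <DurabilityService name="durability">
      <Network>
        <!-- How long (in seconds) the service looks for fellows at startup
             before masters are selected; illustrative value. -->
        <InitialDiscoveryPeriod>3.0</InitialDiscoveryPeriod>
      </Network>
    </DurabilityService>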
When a durability service starts, the first thing it does is look for other durability services (called fellows) that are present in the domain. The InitialDiscoveryPeriod specifies how long to look for fellows. When this time has expired, a master durability service is chosen for each namespace. The master durability service is the durability service that gathers all available data for a namespace and distributes it to all the other durability services that have configured that namespace. Essentially, there are two algorithms to determine a master:
- The legacy algorithm, which is used when the policy for the namespace has NO masterPriority attribute or when the attribute is 255. When this algorithm is used there is a negotiation phase between the fellows to determine which one of them is selected as master (using a leader election protocol). In this protocol a candidate master is first proposed, and only if all durability services agree is the master confirmed. If no agreement is reached, another round of proposals is started. This algorithm takes at best the Network/ExpiryTime (when agreement is reached in the first round), but may take longer depending on the number of rounds that are required.
- The new master selection algorithm (which is used when masterPriority != 255 is set for the policy that applies to the namespace) selects a master immediately. This is good in situations where the master will not change once it is selected, but may lead to additional alignments in situations where the master can change (e.g., because a “better” master with a higher priority arrives later and a change of master is needed). A configuration sketch for both cases follows this list.
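The following sketch contrasts the two cases; the namespace names, partitions and priority value are assumptions for illustration only:

    <NameSpaces>
      <NameSpace name="sensorData">
        <Partition>sensors</Partition>
      </NameSpace>
      <!-- masterPriority != 255: the new algorithm selects a master
           immediately, preferring the highest priority. -->
      <Policy nameSpace="sensorData"
              durability="Durable"
              alignee="Initial"
              aligner="true"
              masterPriority="10"/>

      <NameSpace name="configData">
        <Partition>config</Partition>
      </NameSpace>
      <!-- No masterPriority attribute (or the value 255): the legacy
           negotiation-based master selection is used. -->
      <Policy nameSpace="configData"
              durability="Durable"
              alignee="Initial"
              aligner="true"/>
    </NameSpaces>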
When a master has been selected, the master has to acquire all the available data for the namespace from the fellows. The length of this process depends on the number of fellows, the size of the data set and the speed of the network channel that is used. Once the master has acquired all the data, it will indicate that it has a new state, which triggers the fellows to acquire the data set from the master. Again, the time this takes depends on the size of the data set, the speed of the network channel and the number of fellows to provide the data to (later on we will see that this can be tweaked a bit by using the RequestCombine setting).
After startup, when there are no topology changes, the durability service should remain fairly silent (other than some health monitoring activities), and there should be little performance impact.
Performance impact due to topology changes
Topology changes may lead to losing a master, in which case a new master needs to be elected (see the previous section regarding this process). Once a master has been elected, it may have to acquire all the available data for the namespace from the fellows, and the fellows may have to acquire the data from the master when its state has changed.
In general, the more frequently topology changes occur, the more master changes may occur and the more alignments may be required by the durability services.
Detailed tweaks
- RequestCombine period: Whenever a durability service receives a request, the node that is supposed to answer waits for the RequestCombine period before answering the request. This allows the node to combine requests for the same data from different fellows and align them in one go. The larger the period, the longer it takes before the answer is sent, but the higher the chance that requests can be combined (the configuration sketch after this list shows both tweaks).
- EqualityCheck: When a durability service requests data from another durability service that happens to have the same set, it makes no sense to send the set. When the equalityCheck feature is used, the requesting durability service will first calculate a hash over its set and send the hash along with the request. The durability service that is supposed to answer the request will then also calculate a hash over its set, and only if the hashes are equal will it send back an indicator with the semantics “I have the same set as you”. In this way, the durability service does not have to align the complete data set.
Note: because this feature requires the calculation of a hash and only saves alignment over the wire when the data sets are equal, it makes sense in situations where the data set is potentially large and the risk of it being different (e.g., during a temporary disconnect) is low.
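A combined sketch of both tweaks, assuming a RequestCombinePeriod element under the Alignment section and an equalityCheck attribute on the namespace Policy; the names and values shown are illustrative and should be checked against the deployment guide for the version in use:

    <DurabilityService name="durability">
      <Network>
        <Alignment>
          <!-- Wait this long (in seconds) before answering an alignment request,
               so that requests for the same data can be combined. -->
          <RequestCombinePeriod>
            <Initial>2.5</Initial>
            <Operational>0.1</Operational>
          </RequestCombinePeriod>
        </Alignment>
      </Network>
      <NameSpaces>
        <NameSpace name="sensorData">
          <Partition>sensors</Partition>
        </NameSpace>
        <!-- equalityCheck: exchange hashes first and skip sending the data
             set when it turns out to be identical on both sides. -->
        <Policy nameSpace="sensorData"
                durability="Durable"
                alignee="Initial"
                aligner="true"
                equalityCheck="true"/>
      </NameSpaces>
    </DurabilityService>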