When a topic is written by a writer on a particular node, what is the data path to shared memory?
Writer -> Writer Cache -> Publisher -> Publisher Cache -> Shared Memory
- A sample is written in some programming language (let’s say C++) and passed to its application Writer by invoking its write() call (see the sketch after this list).
- The Writer allocates shared memory and translates this C++ sample into its shared memory internal database representation.
- The Writer delivers a (refCounted) pointer to this shared memory representation to all matching Readers that have subscribed to its topic.
- Please note that some of the services also act as Readers to these samples, for example the ddsi2e service will subscribe itself to all topics published in the local shared memory.
- Likewise the durability service will act as a subscriber to all TRANSIENT/PERSISTENT topics.
- If one of the Readers runs into a ResourceLimit, it rejects the sample. This will cause the Writer to cache the sample in its own history cache, and attempt retransmission at a later time.
- Be aware that if the Writer uses KEEP_LAST semantics, an older sample may be replaced with a newer sample for the same instance without ever having been transmitted in the first place.
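To make the start of this path concrete, here is a minimal sketch of the initiating write() call using the standard DCPS C++ API. Foo, its fields, and fooWriter are hypothetical placeholders for your own IDL-generated type and DataWriter, not names taken from the product.

```cpp
// Minimal sketch: an application thread hands a C++ sample to its DataWriter.
// From here the middleware translates the sample into its shared memory
// representation and hands refCounted pointers to all matching Readers.
Foo sample;                    // Foo: placeholder for an IDL-generated type
sample.id = 42;                // hypothetical key field
sample.message = "hello";      // hypothetical payload field

DDS::ReturnCode_t status = fooWriter->write(sample, DDS::HANDLE_NIL);
if (status == DDS::RETCODE_TIMEOUT) {
    // A RELIABLE Writer whose own history filled up blocked for max_blocking_time
    // without resources becoming available (see the resource-limit discussion below).
}
```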
How are other services/applications alerted on shared memory changes?
It is the Spliced Daemon that monitors shared memory and alerts other processes such as the networking service and other local processes.
The Spliced Daemon only processes changes that occur on the builtin topics, since they may represent changes in the connectivity and impact the liveliness for applications.
For example, if a node disconnects, the Spliced Daemon will figure out which publishers were running on that node, generate the appropriate LivelinessChanged status event, and modify the affected InstanceStates on the Reader side from ALIVE to either NOT_ALIVE_NO_WRITERS (in case autodispose_unregistered_instances is set to FALSE) or NOT_ALIVE_DISPOSED (in case autodispose_unregistered_instances is set to TRUE).
Every application handles the notifications for its own Readers: every application DomainParticipant spawns its own ListenerThread (whose priority you can control with the listener_scheduling attribute of your DomainParticipantQoS) that performs the listener notifications on behalf of all entities created by that participant. This listener thread is normally blocked on a condition variable in the shared memory; the thread that produces the event (ddsi2e in case of data arrival over the network, the spliced service in case of detected changes in connectivity, or a local publishing application in case of locally published data) notifies the condition variable so that the Participant’s listener thread becomes active and invokes the registered Listener operations.
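As an illustration, the sketch below raises the priority of that listener thread through the listener_scheduling attribute. The SchedulingQosPolicy field and enum names are assumptions based on the classic OpenSplice C++ API, so verify them against your API reference; domainId is a placeholder.

```cpp
// Sketch: give the participant's listener thread real-time, relative priority.
// listener_scheduling field/enum names assumed from the classic OpenSplice C++ API.
DDS::DomainParticipantFactory_var factory =
    DDS::DomainParticipantFactory::get_instance();

DDS::DomainParticipantQos pqos;
factory->get_default_participant_qos(pqos);
pqos.listener_scheduling.scheduling_class.kind = DDS::SCHEDULE_REALTIME;
pqos.listener_scheduling.scheduling_priority_kind.kind = DDS::PRIORITY_RELATIVE;
pqos.listener_scheduling.scheduling_priority = 10;

const DDS::DomainId_t domainId = 0;  // example domain
DDS::DomainParticipant_var participant = factory->create_participant(
    domainId, pqos, NULL, DDS::STATUS_MASK_NONE);
```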
Are there data structures which if too small can cause data to not reach all of its destinations?
For example, between a writer calling write() and a reader calling take(), what data structures may need to be sized properly?
By default all local application Readers have unlimited resources and a KEEP_LAST policy with a depth of 1. You can modify your application ReaderQoS to KEEP_ALL and set restrictions in its available ResourceLimits. This implies that the affected Readers need to take() their data (as opposed to read() it) to make room for newly incoming samples. If the Readers are not able to keep up with the Writer and reach their configured ResourceLimits, the impacted Readers will reject the incoming samples. Depending on the selected QoS of the affected Readers and Writers, the sample will either be dropped (for example in the case of BEST_EFFORT traffic), or the Writer will cache the sample in its own history and attempt to retransmit at a later moment in time (in the case of RELIABLE traffic). Depending on how much history the Writer has available itself (by default the Writer uses a KEEP_LAST policy with a depth of 1, but you may change this to a KEEP_ALL policy with restricted resource limits), it may end up running out of its own resources, in which case it will block the writing thread until either resources become available or the max_blocking_time set in your ReliabilityQosPolicy expires, whichever happens first. In the former case the write() call will return OK; in the latter case it will return a TIMEOUT.
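For illustration, a minimal sketch of those QoS knobs using the standard DCPS C++ API; the subscriber and publisher variables and the numeric limits are placeholders chosen for the example.

```cpp
// Reader side: KEEP_ALL history with explicit resource limits, so the Reader
// starts rejecting incoming samples once the limits are reached and data is
// not take()n in time.
DDS::DataReaderQos rqos;
subscriber->get_default_datareader_qos(rqos);
rqos.history.kind = DDS::KEEP_ALL_HISTORY_QOS;
rqos.resource_limits.max_samples = 1000;
rqos.resource_limits.max_instances = 100;
rqos.resource_limits.max_samples_per_instance = 10;

// Writer side: RELIABLE with a bounded KEEP_ALL history; write() blocks for at
// most max_blocking_time when the Writer history is full, then returns TIMEOUT.
DDS::DataWriterQos wqos;
publisher->get_default_datawriter_qos(wqos);
wqos.reliability.kind = DDS::RELIABLE_RELIABILITY_QOS;
wqos.reliability.max_blocking_time.sec = 0;
wqos.reliability.max_blocking_time.nanosec = 100000000;  // 100 ms
wqos.history.kind = DDS::KEEP_ALL_HISTORY_QOS;
wqos.resource_limits.max_samples = 5000;
```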
Please note, if using the DDSI2E service, that every channel you configure in your ddsi2e service behaves as a KEEP_ALL Reader with a FIFO queue, whose default depth you can set with the //OpenSplice/DDSI2EService/Sizing/NetworkQueueSize parameter in your config file, or override for individual channels with the //OpenSplice/DDSI2EService/Channels/Channel/QueueSize parameter.
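In configuration-file terms, those two parameters would sit roughly as sketched below; the channel name and the numeric values are made-up examples, and the exact element nesting should be checked against the Deployment Manual.

```xml
<OpenSplice>
  <DDSI2EService name="ddsi2e">
    <Sizing>
      <!-- Default depth of the FIFO queue between shared memory and ddsi2e -->
      <NetworkQueueSize>1000</NetworkQueueSize>
    </Sizing>
    <Channels>
      <Channel name="HighVolume">
        <!-- Per-channel override of the queue depth -->
        <QueueSize>4000</QueueSize>
      </Channel>
    </Channels>
  </DDSI2EService>
</OpenSplice>
```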
When an application Reader rejects incoming samples, it can be notified of this event through the on_sample_rejected() callback of its DataReaderListener. The Writer will not be notified of this, but will try to do the retransmissions for RELIABLE data in a separate background thread that is executed 50 times per second. Only when the Writer cache (whose size is controlled by your WriterQoS) runs out of resources will it block the calling thread, and only when that takes longer than the configured max_blocking_time will your writing application become aware of this, because the write() call then returns a TIMEOUT.
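A minimal sketch of such a listener in the standard DCPS C++ API is shown below; only the rejection callback does anything, the other callbacks are given empty bodies, and rejectionListener, subscriber, topic and rqos are placeholders.

```cpp
// Sketch of a DataReaderListener that reports sample rejections; all other
// callbacks are given empty bodies so the class can be instantiated.
class RejectionListener : public virtual DDS::DataReaderListener
{
public:
    virtual void on_sample_rejected(DDS::DataReader_ptr,
                                    const DDS::SampleRejectedStatus& status)
    {
        std::cerr << "Reader rejected samples, total so far: "
                  << status.total_count << std::endl;
    }
    virtual void on_data_available(DDS::DataReader_ptr) {}
    virtual void on_requested_deadline_missed(DDS::DataReader_ptr,
        const DDS::RequestedDeadlineMissedStatus&) {}
    virtual void on_requested_incompatible_qos(DDS::DataReader_ptr,
        const DDS::RequestedIncompatibleQosStatus&) {}
    virtual void on_liveliness_changed(DDS::DataReader_ptr,
        const DDS::LivelinessChangedStatus&) {}
    virtual void on_subscription_matched(DDS::DataReader_ptr,
        const DDS::SubscriptionMatchedStatus&) {}
    virtual void on_sample_lost(DDS::DataReader_ptr,
        const DDS::SampleLostStatus&) {}
};

// Attach the listener when creating the reader, enabling only the
// SAMPLE_REJECTED status so the other callbacks stay dormant.
RejectionListener rejectionListener;
DDS::DataReader_var reader = subscriber->create_datareader(
    topic, rqos, &rejectionListener, DDS::SAMPLE_REJECTED_STATUS);
```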
Where is this logged if this happens?
When a ddsi2e Reader rejects a sample because it runs out of the QueueSize you configured for it, the same mechanism is used as for rejecting application Readers. A network queue running out of resources is not an uncommon phenomenon and will be resolved by the Writer transparently. This event is therefore not logged as such. However, if you want to know how often the Writer runs into this situation, you can enable the statistics for application Writers in your config file (see //OpenSplice/Domain/Statistics) and watch the relevant statistics for that Writer in the Tuner (see the Statistics tab, which displays things like numberOfRetries, numberOfWritesBlockedBySamplesLimit, etc.).
Does the DDSI2 service get internal data from shared memory?
Yes. In a shared memory configuration, the networking services extract all data destined for the network from the shared memory segment, and all data received from the network is placed into the shared memory segment as well.
Data flow out is thus: Writer -> Publisher -> SharedMem -> NetworkingService -> NetworkStack
Data flow in is thus: NetworkStack -> NetworkingService -> SharedMem -> Subscriber -> Reader
As stated above, the ddsi2 service behaves like an application Reader and gets refCounted pointers to the samples in the shared memory in its FIFO queue. If ddsi2 determines that there is remote interest for these samples, it will serialize the sample and transmit it over the network. If not, it will just drop its refCount and move on to the next sample.
How does the network queue play into this mechanism?
As stated above, the ddsi2 service acts as an application Reader that takes a single subscription to all topics and queues the samples in FIFO order. The size of this queue is configurable in your config file.
What messages might show up in the trace logs for each of the DDSI2E service threads (recv, tev, xmit, etc.) based on where data shows up in the data path?
The best way to understand the inner workings of the ddsi2 service is to read section 8 of the Deployment Manual (you can also find this in the docs/pdf directory of your OpenSplice installation directory). The exact semantics of the log files are not documented and may change across releases. Analyzing the very low-level details of the log files is not something we would recommend to our end users, because it is easy to become overwhelmed by all the detail. However, in the bin directory of the source code distribution you can find a Perl script called decode-ddsi-log that extracts all the relevant high-level details that you might need from your log files.
The recv thread seems to log “DATA” no matter if the data came from another node or the same node; does this mean the xmit thread passes data to the recv thread?
By design, sent data is delivered to all listening parties, including the sending node itself. You could prevent this in multiple ways; one is to use separate partitions for each data flow, another is to use DDSI2 Channels.
If local loopback is enabled (which it is by default), you will end up receiving the data you just sent out in your own socket receive buffer as well. The networking receive thread will then decode the message only to determine that it is not needed here and will discard it; it probably logged the message just prior to discarding it.
Which listener methods on the Readers might be helpful to determine why data can be read by the DDSI2 service but not by application Readers?
All data that is published locally can be made available both to local Readers and to the ddsi2 service. Of course, the ddsi2 service will consume all locally generated data (for there might be a need for it on a remote node), whereas local Readers will only get the locally published data that they actually subscribe to.
How can certain data be received by the ddsi2 service yet not end up in the application Reader cache?
There can be many reasons for this, but the first thing I would check is whether the application Reader actually connects to the application Writer. Reasons why they might refuse to connect are QoS conflicts, for example where the Reader is connected to the wrong partition, or where some RxO policies don’t match (in that case the quality that the Writer offers is not considered “good enough” for the Reader’s own standards). You can check if this is indeed the case by looking at the RequestedIncompatibleQosStatus of the Reader. Also, try to see if the originating Writer shows up in the SubscriptionMatchedStatus event.
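A minimal sketch of both checks using the standard DCPS C++ API (reader is a placeholder for your DataReader, and console output assumes <iostream>):

```cpp
// Check whether any Writer was refused because of incompatible QoS.
DDS::RequestedIncompatibleQosStatus incompatible;
reader->get_requested_incompatible_qos_status(incompatible);
if (incompatible.total_count > 0) {
    // last_policy_id identifies the QoS policy that caused the last mismatch
    std::cout << "Incompatible QoS detected, last policy id: "
              << incompatible.last_policy_id << std::endl;
}

// Check how many Writers are currently matched with this Reader.
DDS::SubscriptionMatchedStatus matched;
reader->get_subscription_matched_status(matched);
std::cout << "Matched writers: " << matched.current_count << std::endl;
```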
The application is dropping data. Why might that be?
We currently use take(LENGTH_UNLIMITED, NOT_READ_SAMPLE_STATE, ANY_VIEW_STATE, ALIVE_INSTANCE_STATE) as our take parameters.
The way you are reading data here might indeed cause some samples to be dropped. Since you only take ALIVE data, consider what happens when a sample from a live Writer arrives but, before the Reader gets the chance to take() it, the Writer disposes its instance. Now the instance_state for that sample will be set to NOT_ALIVE_DISPOSED (the dispose event gets piggybacked onto the sample that is already there), which causes your take() call to skip it altogether. Try taking the samples with the ANY_INSTANCE_STATE mask and check if you still have missing samples.
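A minimal sketch of the suggested take() call in the standard DCPS C++ API; FooSeq and fooReader are placeholders for your IDL-generated sequence type and typed DataReader:

```cpp
// Take everything that has not been read yet, regardless of instance state,
// so samples of disposed instances are not silently skipped.
FooSeq samples;
DDS::SampleInfoSeq infos;
DDS::ReturnCode_t result = fooReader->take(
    samples, infos,
    DDS::LENGTH_UNLIMITED,
    DDS::NOT_READ_SAMPLE_STATE,
    DDS::ANY_VIEW_STATE,
    DDS::ANY_INSTANCE_STATE);

if (result == DDS::RETCODE_OK) {
    for (DDS::ULong i = 0; i < samples.length(); ++i) {
        if (infos[i].valid_data) {
            // process samples[i]
        } else if (infos[i].instance_state == DDS::NOT_ALIVE_DISPOSED_INSTANCE_STATE) {
            // the instance was disposed by its Writer; handle the removal
        }
    }
    fooReader->return_loan(samples, infos);
}
```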
How does the Durability policy work?
In summary Durability acts as if it is a subscriber to all TRANSIENT/PERSISTENT data. It normally is configured using a KEEP_LAST policy, so it will never block incoming data because it can just replace old samples with newer samples for the same instance. However, if you configure Durability to use KEEP_ALL semantics, you have to make sure you allocate enough resources to hold the entire history of your topic, since the durability service does not consume samples like an application reader that uses take() semantics.
Read the Durability section of the Deployment Manual for details on this.
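For illustration, a sketch of declaring a TRANSIENT topic whose durability-service store keeps the full history under explicit resource limits, using the standard DCPS C++ API; the topic name, typeName variable and the numeric limits are placeholders:

```cpp
// Topic-level QoS: samples must survive their Writer (TRANSIENT), and the
// durability service should keep all of them, bounded by explicit limits.
DDS::TopicQos tqos;
participant->get_default_topic_qos(tqos);
tqos.durability.kind = DDS::TRANSIENT_DURABILITY_QOS;
tqos.durability_service.history_kind = DDS::KEEP_ALL_HISTORY_QOS;
tqos.durability_service.max_samples = 10000;
tqos.durability_service.max_instances = 1000;
tqos.durability_service.max_samples_per_instance = 10;

DDS::Topic_var topic = participant->create_topic(
    "StateTopic", typeName, tqos, NULL, DDS::STATUS_MASK_NONE);
```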
Can you filter a read call to ignore a message from the same machine?
There are several ways to filter out locally generated data from your local subscribers. Probably the easiest one is to use the ignore_participant()/ignore_publication() operations on your DomainParticipant. With them you can instruct all your subscribers in your participant to ignore data generated from a particular publisher or from all publishers in the indicated participant. The instance_handle to identify the participant/publisher in question can be obtained from the get_instance_handle() operation on the participant/publisher in question.
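A minimal sketch, assuming you hold references to the local Writer (a “publication” in DDS terms) and to the local participant whose data your own Subscribers should ignore; the handles come from get_instance_handle() as described above:

```cpp
// Ignore everything written by one specific local DataWriter...
participant->ignore_publication(localWriter->get_instance_handle());

// ...or ignore everything published by an entire (local) participant.
participant->ignore_participant(otherLocalParticipant->get_instance_handle());
```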