This page lists all the fixed bugs and changes in the OpenSplice v6.7 releases.
Fixed bugs, changes to supported platforms, and new features are made available through regular OpenSplice releases.
There are two types of release: major releases and minor releases. Upgrading OpenSplice contains more information about the differences between these releases and the impact of upgrading. We advise customers to move to the most recent release in order to take advantage of these changes. This page details all the fixed bugs and changes between different OpenSplice releases. There is also a page which details the new features introduced in the different OpenSplice releases.
There are two different types of changes: bug fixes and changes that do not affect the API, and bug fixes and changes that may affect the API. These are documented in separate tables.
Fixed Bugs and Changes in OpenSplice v6.7.x
OpenSplice v6.7.2
Fixed bugs not affecting the API in OpenSplice 6.7.2
Report ID | Description |
---|---|
OSPL-9085 / 16827 | Race condition between multiple simultaneously started OpenSplice daemons with the same domain configuration. Starting multiple OpenSplice daemons in shared memory mode for the same domain configuration at the same time could lead to a crash. Solution: The race condition between the multiple OpenSplice daemons is fixed; a subsequently started daemon is now detected correctly and exits properly. |
OSPL-9672 | Java5 HelloWorld example creates topic with wrong QoS. The Java5 HelloWorld example creates a topic with the default QoS. This differs from all other language binding HelloWorld examples. As a result, an error message will be reported when this example tries to communicate with the HelloWorld examples of other language bindings. Solution: The defect is fixed and the Java5 example now creates a topic with the correct QoS, so it can also communicate with the HelloWorld examples from other languages. |
OSPL-9684 | A failing networking service does not always restart with FailureAction configuration restart. When a networking service detects a failure, it terminates as gracefully as possible. The splice daemon was not able to detect whether the networking service terminated due to a valid stop or due to a detected failure, which means it did not restart a networking service that had terminated gracefully after a failure. Solution: The networking service state is now set to died when the service terminates because of a failure. |
OSPL-9738 | When the durability service terminates a mutex is not cleaned up properly, potentially causing a memory leak. The durability service uses various mutexes to protect access to shared resources by different threads. One such mutex, used to protect updates to a sequence number, was not cleaned up properly. Solution: The mutex is now cleaned up properly. |
OSPL-9797 / 17608 | DataWriter deletion is not always reflected in the instance_states of the DataReader. When two nodes have unaligned clocks and are using BY_RECEPTION_TIMESTAMP destination ordering, and the clock of the sending node runs ahead of the clock on the receiving node, the deletion of the DataWriter on the sending node may not always be correctly reflected in the instance_states of the DataReader. These instance_states should go to either NOT_ALIVE_DISPOSED or NOT_ALIVE_NO_WRITERS, depending on the auto_dispose_unregistered_instances setting of the DataWriter. However, when the clock skew is bigger than the lifespan of an instance, its instance_state might remain ALIVE after deletion of the DataWriter. Solution: The algorithm now always applies the correct instance_state in the case mentioned above. |
OSPL-6636 / 17203 | In the Isocpp2 API a deadlock may occur when a listener is removed while a listener callback is in progress. When using the Isocpp2 API a deadlock may occur when removing a listener. Before a listener callback is called, a mutex lock is taken to prevent the listener from being destroyed while the callback is active. When an Isocpp2 operation on the corresponding entity is called from within the listener callback, the associated entity mutex lock may be taken. When at the same time the entity is being closed from another thread, a deadlock may occur because the two mutex locks involved are taken in a different order. Solution: For all listener operations a separate mutex is used to prevent the listener from being removed while a listener callback is active. This lock is used outside of the normal entity lock, which ensures that no deadlock can occur. |
OSPL-9675 / 17534 | Unnecessary error trace in the classic C++ API. An error will be traced when getting the listener of a Subscriber, Publisher, DataWriter or DataReader when no listener was set. This isn't really an error and the traces are unnecessary and even unwanted. This error is also traced when calling Subscriber::notify_datareaders(). Solution: The error traces are removed. |
OSPL-9730 / 17556 | Out of sync dispose_all_data. The dispose_all_data is an asynchronous C&M operation that is not part of a coherent update. It is advised to use BY_SOURCE_TIMESTAMP as destination_order when using dispose_all_data to negate its asynchronous behaviour (see the QoS sketch after this table). However, the dispose_all_data still used the BY_RECEPTION_TIMESTAMP of the built-in C&M operation. This introduced a small timing issue when another node executes the dispose_all_data and immediately writes a new sample. If the write overtakes the dispose_all_data, it is possible that the dispose_all_data will dispose of that sample, while in fact the sample is newer than the dispose and should be retained. Solution: The dispose_all_data handling now uses the source timestamp instead of the reception timestamp, regardless of the built-in topic destination_order. |
OSPL-9823 | C# API reports "Detected invalid handle" in error log during onDataAvailable callback. When the C# API performs an onDataAvailable callback on a DataReaderListener, it also logs a message in the error log stating that an invalid handle was detected. The reported error is harmless and can safely be ignored; the application will continue to function properly. Solution: An onDataAvailable callback will no longer cause this error to show up in the log. |
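A minimal isocpp2 sketch of the destination-order setting recommended in OSPL-9730 above. The topic name and the HelloWorldData::Msg type are illustrative assumptions, not part of the fix:

```cpp
#include <dds/dds.hpp>
#include "HelloWorldData_DCPS.hpp"  // assumed idlpp-generated header

int main()
{
    dds::domain::DomainParticipant dp(org::opensplice::domain::default_id());

    // BY_SOURCE_TIMESTAMP ordering ensures that a dispose_all_data issued on
    // another node cannot override a sample that was written after it.
    dds::topic::qos::TopicQos tqos = dp.default_topic_qos()
        << dds::core::policy::DestinationOrder::SourceTimestamp();

    dds::topic::Topic<HelloWorldData::Msg> topic(dp, "HelloWorldData_Msg", tqos);
    return 0;
}
```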
Fixed bugs and changes affecting the API in OpenSplice 6.7.2
Report ID | Description |
---|---|
OSPL-9654 / 17532 | Missing function prototypes in C APIs. When using the missing-prototypes compiler warning flag, the code generated for SAC, C99 and FACE triggers that compiler warning. Solution: Prototypes are added just before the related generated function definitions, as illustrated below. |
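An illustrative sketch of the pattern now emitted by the code generator; the function name and signature are hypothetical:

```cpp
// Without this prototype, compiling the non-static definition below with
// -Wmissing-prototypes would trigger a warning.
void Foo_copyIn(const void *src, void *dst);

void Foo_copyIn(const void *src, void *dst)
{
    // ... generated copy logic ...
}
```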
OpenSplice v6.7.1p3
Fixed bugs not affecting the API in OpenSplice 6.7.1p3
Report ID | Description |
---|---|
OSPL-9430 / 17158 | The isocpp2 API does not report an error when the generated discriminant setter of a union is used incorrectly. For a union type the IDL preprocessor generates a setter for the discriminant of the union: the _d(val) function. This setter may only be used to set the discriminant to a label that corresponds to the current state of the union. A union case may have more than one label associated with it, and with the _d(val) function it is only allowed to set the discriminant to one of the alternative labels associated with the currently selected case (see the sketch after this table). However, when this setter was used incorrectly the isocpp2 API did not raise an exception, and it could cause a crash of the application. Solution: The generated discriminant setter function _d(val) now checks whether the specified value corresponds with the current state of the union and raises an exception otherwise. |
OSPL-9444 | Move iShapes example from isocpp to isocpp2. The iShapes example should work on top of the isocpp2 API instead of the deprecated isocpp API. Solution: The example has been ported to the isocpp2 API. |
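A hedged sketch of the _d(val) rule from OSPL-9430 above; the union definition and the label values are illustrative assumptions (the code requires the corresponding idlpp-generated type):

```cpp
// Illustrative IDL (assumed):
//   union Payload switch (long) {
//     case 1:
//     case 2:  long   num;   // one case selected by two labels
//     case 3:  string text;
//   };

Payload p;
p.num(42);  // selects the 'num' case; the discriminant becomes one of its labels
p._d(2);    // allowed: 2 is an alternative label of the currently selected case
p._d(3);    // invalid: 3 selects a different case; this now raises an exception
```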
OpenSplice v6.7.1p2
Fixed bugs and changes not affecting the API in OpenSplice 6.7.1p2
Report ID | Description |
---|---|
OSPL-8405 / 16519 | The discovery channel of the networking service may miss disconnects when it is configured for a very small detection period. When the discovery channel of the networking service is configured to detect the death (disconnect) of a node within a few hundred milliseconds, it may occur that the disconnect is not detected, because the evaluation of the discovery heartbeats was scheduled at an incorrect time. This may cause a reliable channel to stall when an undetected reconnect of a node occurs. Solution: The scheduling of the heartbeat evaluation is moved to the receive thread of the discovery channel to make it independent of the scheduling of the main networking thread. Furthermore, the heartbeat evaluation time is now tied more strictly to the maximum interval within which a heartbeat should have been received from a node. |
OSPL-9027 | idlpp is not robust to paths with whitespace. The idlpp tool is not able to handle paths that contain whitespace when compiling for cpp and thus using cppgen. Solution: Quotes are added to the arguments for cppgen that are generated by idlpp. |
OSPL-9030 | OSPL_URI not set correctly in console run. The OSPL_URI can be set using the Launcher Settings dialog. The user can select a file using the browse dialog by clicking the "..." button. The file path must be prefixed with "file://", but when using the browse dialog the "file://" was not being included. Solution: When using the OSPL_URI browse dialog, "file://" is now prepended to the selected file path. |
OSPL-9363 / 17024 OSPL-9075 / 16821 | Possible spliced deadlock when other services crash. When another service or application crashes and leaves the kernel in an undefined state, it can happen that the splice daemon deadlocks and never shuts down. Solution: Added a thread watchdog to spliced that aborts when the shutdown thread deadlocks. |
OSPL-9389 | Potential crash when removing certain entities concurrently in classic Java PSMs. When a datawriter or datareader is removed by one thread while a different thread is removing the corresponding subscriber or publisher, in the Classic Java PSMs (SAJ and CJ), the application can crash. Solution: The issue was resolved by changing the locking strategy so that a publisher/subscriber cannot be removed while one of its datawriters or datareaders is in use by a different thread. |
OSPL-9491 | Terminating Streams example causes invalid memory free. When the Streams subscriber or publisher example is terminated before it completes, e.g. by pressing Ctrl-C, it can trigger an invalid free of the partition name. Solution: The issue is resolved by using a copy of the partition string that can be freed under all circumstances. |
OSPL-9506 | RnR service crash when replaying DCPSTopic. The RnR service can crash when replaying DCPSTopics, due to an improper free. Solution: The improper free in the RnR service is fixed. |
OSPL-9540 / 17190 | The networking service exits with a fatal error when a best-effort channel runs out of buffer space. When all the defragmentation buffers for a best-effort channel are in use and a new buffer is needed, the networking service first reclaims fragment buffers from messages which are not yet complete (fragments missing) or from the list of buffers waiting to be defragmented. When it failed to free a buffer, the networking service would terminate with a fatal error indicating that it had run out of defragmentation buffers. Solution: When a best-effort channel runs out of defragmentation buffers and is not able to reclaim a buffer, it now stops reading from the socket until buffers become available again. Note that fragments may be lost when the maximum receive buffer size of the socket is exceeded; for a best-effort channel this is allowed. |
OSPL-9551 | Some buttons on Tuner's Writer Pane are connected to wrong Writer functions. Some buttons on the Tuner's Writer Pane are connected to the wrong Writer functions: the Register button is connected to the Dispose function (it should be Register), and the Unregister button is also connected to the Dispose function (it should be Unregister). Solution: The handlers for the various buttons have been corrected. |
OSPL-9552 / 17191 | The networking service should log the selected network interface. The networking service did not report the network interface that it has selected. To support the analysis of problems on hosts that have more than one network interface configured, the networking service should report the selected network interface. Solution: On startup of the networking service the selected network interface is reported in the info log. The name of the interface is also included in the report that indicates that networking has detected a state change of the network interface (down/up). |
OSPL-9608 | Samples can be purged prematurely. Purging samples depends (among other things) on the service_cleanup_delay (see the QoS sketch after this table). If OpenSplice is started when the uptime of the node is smaller than the service_cleanup_delay, disposed samples will be purged prematurely. Solution: Improved the check between the node uptime and the service_cleanup_delay. |
TSTTOOL-437 / 17181 | OpenSplice Tester crash. When running tester scripts in headless mode, ArrayIndexOutOfBoundsExceptions were sometimes thrown and logged to the console. The exceptions were the result of a race condition on startup. Solution: A fix was made in the MainWindow to prevent the race condition when populating filtered topics. |
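A minimal isocpp2 sketch of where the service_cleanup_delay from OSPL-9608 above is configured; the topic type, name and durations are illustrative assumptions:

```cpp
#include <dds/dds.hpp>
#include "HelloWorldData_DCPS.hpp"  // assumed idlpp-generated header

int main()
{
    dds::domain::DomainParticipant dp(org::opensplice::domain::default_id());

    // service_cleanup_delay (first argument) controls how long the middleware
    // keeps disposed transient instances around before purging them.
    dds::topic::qos::TopicQos tqos = dp.default_topic_qos()
        << dds::core::policy::Durability::Transient()
        << dds::core::policy::DurabilityService(
               dds::core::Duration(30, 0),                    // service_cleanup_delay: 30s
               dds::core::policy::HistoryKind::KEEP_LAST, 1,  // history kind and depth
               dds::core::LENGTH_UNLIMITED,                   // max_samples
               dds::core::LENGTH_UNLIMITED,                   // max_instances
               dds::core::LENGTH_UNLIMITED);                  // max_samples_per_instance

    dds::topic::Topic<HelloWorldData::Msg> topic(dp, "HelloWorldData_Msg", tqos);
    return 0;
}
```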
OpenSplice v6.7.1p1
Fixed bugs and changes not affecting the API in OpenSplice 6.7.1p1
Report ID | Description |
---|---|
OSPL-9047 / 16793 | Deadline-missed events are not necessarily triggered per instance. When multiple deadlines are missed around the same time and a listener is used to monitor these events, only a single listener callback may be performed. Solution: A separate listener callback is performed for each missed deadline (see the sketch after this table). |
RNR-704 / 17155 | RnR CDR recording cannot be imported. RnR did not properly process the situation where there is no active union case. Solution: RnR now accepts the option of not having an active union case. |
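A hedged isocpp2 sketch of the per-instance deadline-missed callbacks referenced in OSPL-9047 above; the topic type, name and deadline period are illustrative assumptions:

```cpp
#include <iostream>
#include <dds/dds.hpp>
#include "HelloWorldData_DCPS.hpp"  // assumed idlpp-generated header

// After the fix, one callback is performed per missed deadline, so the status
// reflects each affected instance in turn.
class DeadlineListener :
    public virtual dds::sub::NoOpDataReaderListener<HelloWorldData::Msg>
{
    virtual void on_requested_deadline_missed(
        dds::sub::DataReader<HelloWorldData::Msg>& reader,
        const dds::core::status::RequestedDeadlineMissedStatus& status)
    {
        std::cout << "deadline missed, total_count=" << status.total_count() << std::endl;
    }
};

int main()
{
    dds::domain::DomainParticipant dp(org::opensplice::domain::default_id());
    dds::topic::Topic<HelloWorldData::Msg> topic(dp, "HelloWorldData_Msg");
    dds::sub::Subscriber sub(dp);

    dds::sub::qos::DataReaderQos rqos = sub.default_datareader_qos()
        << dds::core::policy::Deadline(dds::core::Duration(1, 0));  // 1s deadline

    DeadlineListener listener;
    dds::sub::DataReader<HelloWorldData::Msg> reader(
        sub, topic, rqos, &listener,
        dds::core::status::StatusMask::requested_deadline_missed());

    // ... application runs; a callback now arrives for every missed deadline ...
    reader.listener(NULL, dds::core::status::StatusMask::none());  // detach before 'listener' leaves scope
    return 0;
}
```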
OpenSplice v6.7.1
Fixed bugs and changes not affecting the API in OpenSplice 6.7.1
Report ID | Description |
---|---|
OSPL-7942 OSPL-9207 / 16854 | Thread-specific memory leaking away after thread end. Threads that are created from a user application don't use the ospl internal thread wrapper, and thread-specific memory allocated by the ospl software stack wasn't freed when these threads exited. Solution: OS-supplied callback functionality is used to call a destructor function that frees the allocated memory when a user-created thread exits. |
OSPL-8425 / 16551 | When a lot of fragmented, best-effort data is received, the receiver will run out of buffer space. The defragmentation buffers are shared with the receive buffer. In case a lot of data is received, the network stack isn't able to free claimed buffer space as it doesn't get time to defragment data in the buffers. As a result, incoming data cannot be stored and networking can't recover from this situation because these buffers remain locked. Solution: In case no buffers can be freed, data in buffers is dropped to create free space, so that new data can be received and the service continues to function. Data in the dropped buffers is lost, but as this is best-effort data, this is allowed. |
OSPL-8813 | Calling RMI C++ CRuntime::stop() can cause deadlocks. When RMI C++ CRuntime::stop() is called, it can deadlock due to timing: two threads each wait until the other is stopped. Solution: By caching certain information, it is no longer necessary for one thread to wait for the other while stopping. |
OSPL-8958 | DDSI can regurgitate old T-L samples for instances that have already been unregistered. DDSI maintains a writer history cache for providing historical data for transient-local writers and for providing reliability. An instance is removed from this cache when it is unregistered by the writer, but its samples are retained until they have been acknowledged by all (reliable) readers. Already acknowledged samples that were retained because they were historical data could survive even when the instance was removed. When this happened, a late-joining reader would see some old samples reappear. Solution: Deleting an instance now also removes the already acknowledged samples from the history. |
OSPL-9058 / 16796 OSPL-9206 / 16852 | Incompatibility with versions before V6.5.0p5. An internal change to the builtin heartbeat topic caused an incompatibility with older versions. When adding a node running a recent version of OpenSplice to a domain with nodes running a version before V6.5.0p5, the existing nodes would incorrectly dispose participants (and corresponding entities) belonging to the new nodes after a single heartbeat period, something normally done only when a heartbeat expires. Solution: To resolve this, the change to the heartbeat topic was reverted. |
OSPL-9059 / 16803 | Custom lib missing for ISOCPP2 on DDSCE-P708-V6x-MV-A. The custom lib was missing for ISOCPP2 on the DDSCE-P708-V6x-MV-A target platform. Solution: The custom lib has been added for this platform too. |
OSPL-9113 | When the persistent store contains an unfinished transaction, a non-coherent reader may not receive the corresponding historical data. When the durability service injects persistent data, the persistent data set contains unfinished transactions, and a non-coherent reader is present, this reader will not receive the data of these unfinished transactions. For persistent data the durability service unregisters each instance after injecting; however, the injection of the historical samples of a transaction expects that the instance is still registered. Solution: When retrieving historical data that contains unfinished transactions, the corresponding samples are now injected into the non-coherent reader independently of the existence of a registration for the corresponding instance. |
OSPL-9208 | DDSI not sending an SPDP ping at least every SPDPInterval. DDSI has a Discovery/SPDPInterval setting that is meant to set an upper bound to the SPDP ping interval, which is otherwise derived from the lease duration set in the //OpenSplice/Domain/Lease/ExpiryTime setting. The limiting only occurred when the lease duration was > 10s. Solution: The limiting has been changed to ensure the interval never becomes larger than what is configured. |
OSPL-9216 | Calling RMI C++ CRuntime::run() can cause deadlocks. When RMI C++ CRuntime::run() is called while the runtime is already running, the running state is detected and that call leaves the runtime immediately. However, it does so without unlocking a mutex. Further interaction with that runtime is likely to hit that locked mutex and thus deadlock. Solution: The runtime is unlocked when a problem is detected during the run() call. |
OSPL-9240 / 16923 | Memory leak in entities with listener. The listener administration shared by all language bindings leaks a small amount of heap memory each time an entity is removed that has a listener attached to it. Solution: The issue was resolved by properly freeing the admin data when a listener is detached from an entity manually or when the entity is removed. |
OSPL-9248 | Error while building FACE C++ example on Windows 64-bit. When building the FACE C++ example on Windows 64-bit, an LNK2001 (unresolved external symbol) error coming from the DataState class could occur. Solution: The defect in the DataState class has been fixed and the error will not occur anymore. |
OSPL-9250 / 16928 | RMI Java runtime stop can cause NullPointerException. By external interference of the RMI internals or specific timing, it is possible that RMI Java runtime stop() can throw a NullPointerException. Solution: Added various null pointer checks. |
OSPL-9291 | Possibly wrong durability master confirmation due to not waiting for the heartbeat expiry period. When a reconnect occurred it was possible that durability services confirmed a federation as master while that federation had itself confirmed a different federation as master. This would lead to durable data not being aligned until a new durability conflict was triggered. The cause of this problem was that the durability service should wait for the heartbeat expiry period before confirming a master, but this wait did not occur. Solution: During master selection the heartbeat expiry period wait is applied again. |
OSPL-9361 / 17014 | Incorrect code generation for IDL that contains an array of a typedef which refers to a string type. For an IDL definition that contains an array of a typedef where the typedef refers to a string, the copy-in routines generated by the IDL preprocessor (idlpp) were incorrect: the string members were copied to a wrong memory location. Solution: The code that is generated by the IDL preprocessor for this array type is corrected. |
OSPL-9388 / 17030 | Durability service might deadlock when the networking queue is flooded. When the network queue is overrun by the durability service, the normal mode of operation is to sleep a bit and retry later. However, there is a slight chance that the sending thread of the network service, which needs to make room again by consuming elements in the queue, indirectly blocks on the sleeping thread in the durability service itself. Solution: The network service can no longer indirectly run into a lock that is held by the durability service while the network queue is flooded. |
OSPL-9413 | Memory leak in RnR service when using XML storage. The RnR service leaks memory for every data update when using XML storage. Solution: Free internal RnR service variable. |
OSPL-9441 | IsoCPP dds::sub::CoherentAccess destruction will throw an exception when end() was called on that object. The dds::sub::CoherentAccess destructor will end a coherent access. When the coherent access was already ended by a call to dds::sub::CoherentAccess::end() prior to the destruction of that object, the destructor would throw a precondition_not_met exception when it tried to end the already ended coherent access (see the sketch after this table). An exception in a destructor can cause undefined behaviour (like a crash on VS14). Solution: The destructor now checks whether the coherent access has already ended before ending it again. |
TSTTOOL-434 | Participant reader and writer tables - QoS column display and tooltip incorrect. In the Browser tab, if the user navigated to a participant, the corresponding reader and writer tables did not display the QoS values correctly. Solution: This regression was introduced in 6.7.0 and has been fixed in 6.7.1. |
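A hedged isocpp2 sketch of the dds::sub::CoherentAccess pattern from OSPL-9441 above; the subscriber setup (a Presentation policy with coherent_access enabled) is assumed:

```cpp
#include <dds/dds.hpp>

// 'sub' is an existing dds::sub::Subscriber whose SubscriberQos enables
// coherent access (assumed setup).
void read_coherently(dds::sub::Subscriber& sub)
{
    dds::sub::CoherentAccess coherent(sub);  // begin coherent access
    // ... read/take from the subscriber's readers ...
    coherent.end();                          // explicit end
}   // the destructor runs here; after the fix it detects the earlier end()
    // and no longer throws
```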
OpenSplice v6.7.0
Fixed bugs and changes not affecting the API in OpenSplice 6.7.0
Report ID | Description |
---|---|
OSPL-8709 / 16660 | Problem with triggered read condition when ordered access is used. When using ordered access there is a possibility that a reader continues to receive data-available events, while reading or taking results in no data being available. This is caused by a dispose not being purged, causing invalid samples to be retained in the data reader. Solution: When taking data, invalid samples are purged instead of being retained in the reader. |
OSPL-8950 / 16756 | Memory increase of inactive RMI proxies. An RMI proxy will receive replies (of the related service) meant for other proxies on other nodes. The middleware takes care of filtering the proper replies for the proxies on the local node. However, when the RMI proxy is inactive (it is created but not used to send requests), the replies are not cleared from the proxy reply reader memory. When the RMI proxy becomes active (sending a new request) all replies are taken from the reader and the memory is back to normal, but the memory usage can increase on a node while the proxy is inactive. Solution: The proxy reply reader is always monitored and unrelated replies are removed from its memory. |
OSPL-8438 / 16569 | Builtin types not properly supported in ISOCPP2. When using any of the builtin types (types defined in dds_dcps.idl or in its included files) in your own IDL file (properly including the relevant IDL file in which it is defined), the copy functions generated by idlpp would have compilation issues where referenced functions could not be found. Solution: The previously missing referenced functions have now been added to the ISOCPP2 library. |
OSPL-9122 | Error report during termination when a LivelinessLost listener callback was set. When using a LivelinessLost listener callback in your application, it could happen that during termination of the application an error report "Internal error DataWriter has no Publisher reference" occurred in the ospl-error.log file. Solution: The defect is fixed and the error will not occur anymore. |
OSPL-9015 | Deletion of a reader after all group coherent transactions were received and read could lead to new transactions never becoming complete. When deleting a group coherent reader that was part of a previously completed and read group transaction, it was possible that other readers in that group coherent subscriber would not be able to read a newly sent transaction. The new transaction was never marked complete since it still expected data for the deleted reader. Solution: The coherent administration is updated so that the reader is no longer part of future group transactions. |
OSPL-8900 | Tuner - When writing data with the standalone writer frame, it is not possible to edit collections. Normally, in the reader-writer frame, right-clicking on a sequence user data field brings up a context menu to Add Details to the collection. The same action is missing from the standalone writer frame. Solution: The writer frame's table now has the ability to edit collections in the same manner as the reader-writer frame does. |
OSPL-8577 | Tuner - When writing data with the standalone writer frame, the data model is not properly initialized. When writing data in the standalone writer frame, accessed via the context menu option "Write data" on any Writer tree entity, clicking the Write button without first editing any of the default table values results in a write error. Solution: The table's backing DDS data object is properly initialized with the table's values when a writer action is called. |
OSPL-8507 | Transactions with explicit registration messages never become complete. When doing begin_coherent_changes, then registering an instance, and finally calling end_coherent_changes, the total transaction could fail because the register_instance is not always sent. Solution: The defect in the transaction mechanism is solved and registration messages are now properly sent. |
OSPL-8930 | The lease manager has to support leases which use the monotonic clock and leases which use the elapsed time clock. The lease manager is used to handle events that have to be processed at a certain time. For example, to monitor the liveliness of the services, the lease manager has to evaluate the corresponding leases at regular intervals. These leases use the monotonic clock to support hibernation. However, the lease manager is also used to monitor the liveliness of instances, or, when the deadline QoS is set, to monitor whether the deadline settings are satisfied. For these leases the elapsed time clock is used. Internally the lease manager used the monotonic clock, which may cause problems for leases that use the elapsed time clock when hibernation occurs. Solution: The lease manager now supports both leases that use the monotonic clock and leases that use the elapsed time clock. |
OSPL-9171 | A Condition detach from a Waitset can block. If a thread is waiting on a waitset and another thread detaches a condition from that waitset, the waitset is triggered after which the detach can take place. However, if the first thread is very quick (or the detaching thread is slow), it can happen that the first thread already enters the wait again while the detach hasn't detected the trigger yet. When that happens, the detach will block (at least) until the waitset is triggered again. Solution: Entering a waitset wait now waits for a pending detach to finish. |
OSPL-7980 | DDSI2 retransmits the full sample on a retransmit request for the sample, even if the sample is huge. The DDSI reliable protocol offers two different ways of requesting a retransmit of some data: a sample retransmit request and, for fragmented data (i.e., large samples), a fragment retransmit request. DDSI2 would always retransmit the full sample upon receiving a sample retransmit request, even if that sample is huge, instead of retransmitting a "reasonable" amount and relying on further fragment retransmit requests. Solution: The DDSI2 service now retransmits a limited amount of data when receiving a retransmit request for a full sample. |
OSPL-8017 | DDSI2 did not renew a participant lease for every received message. The DDSI2 service discovers remote participants and automatically deletes them if they do not renew their leases in time. The lease renewal was tied to reception of data and of explicit lease renewal messages; hence reception of, e.g., an acknowledgement would not lead to a lease renewal, even though it obviously requires the remote participant to be alive. Solution: DDSI2 now renews leases regardless of the type of message. |
OSPL-8636 | Streams throughput example hangs on Windows. When using the Streams throughput example on Windows, the subscriber application could hang on termination. Solution: The defect is fixed and the subscriber will not hang anymore. |
OSPL-8660 | RMI cpp does not reduce thread pool size immediately. The threadPoolSize of the RMI cpp ServerThreadingPolicy can be reduced. Threads within the thread pool quit after they have handled a request task when the number of threads is larger than the required threadPoolSize. Because the request tasks are handled before the number of threads is reduced, it is possible for the Server to get more parallel calls than expected. Solution: Threads within the thread pool now quit until the number of threads equals the threadPoolSize, after which handling of the request tasks continues. |
OSPL-8739 / 16698 | Too generic c_insert symbol. The c_insert symbol is exported in one of the OpenSplice libraries. This is too generic and causes clashes with other software packages. Solution: The c_insert function has been renamed by giving it the ospl_ prefix. |
OSPL-8828 | Possible buffer overflow when using Google Protocol Buffers on Windows. When using Google Protocol Buffers on Windows with Isocpp it can happen that the application crashes with a buffer overflow, due to a defect in the translation from GPB to DDS data types. Solution: The defect is fixed and the overflow will not occur anymore. |
OSPL-9097 | DDSI transmit path can lock up on packet loss to one node while another node has crashed. Consider a writer that has reached the maximum amount of unacknowledged data, where a retransmit to one remote reader succeeds, all other remote readers have acknowledged all samples, and a remote reader that has not yet acknowledged all samples disappears (whether because of a loss of connectivity or a crash). In that case the transmit path in DDSI would lock up, because the writer could then only be unblocked by the receipt of an acknowledgement message covering a previously unacknowledged sample, which under these circumstances will not come because of the limit on the amount of unacknowledged data. Solution: Deleting a reader now not only drops all unacknowledged data but also clears the retransmit indicator of the writer. |
OSPL-9096 | Durability service DIED message even though the durability service is still running. The d_status topic is published periodically by the durability service to inform its fellows of its status. Because of a KEEP_ALL policy, the thread writing the status message and renewing the service lease could be blocked by a flow-control issue on the network, which could cause the durability service to be considered dead by the splice daemon when in fact there was no problem with the durability service. Solution: A KEEP_LAST 1 history QoS policy is now used for the writer. |
OSPL-9067 | Large topics are published but not received. Loss of the initial transmission of the final fragments of a large sample failed to cause retransmit requests for those fragments until new data was published by the same writer. Solution: The receiving side now also requests retransmission of those fragments based on heartbeats advertising the existence of the sample without giving specifics on the number of fragments. |
OSPL-9077 / 00016820 | Potential crash in durability service during CATCHUP policy. The durability service could crash while processing a CATCHUP event. This crash was caused by the garbage collector purging old instances while the CATCHUP policy was walking through the list of instances to do some bookkeeping. Solution: The CATCHUP policy now creates a private copy of the instance list while the garbage collector is unable to make a sweep. This private list is then used to do the bookkeeping. |
OSPL-9068 / 00016813 | Catchup policy may leak away some instances. When a node that performs a catchup to the master contains an instance that the master has already purged, the node catching up needs to purge this instance as well. It does this by re-registering the instance, inserting a dispose message and then unregistering the instance again. However, the unregister step was missing, causing the instance to effectively leak away, since an instance is only purged by the durability service when it is both disposed AND unregistered. Solution: The durability service will now both dispose AND unregister the instance at the same time. |
OSPL-9081 / 00016824 | Potential deadlock in the OpenSplice kernel. The OpenSplice kernel had a potential deadlock where two different code paths may claim locks in the opposite order. The deadlock occurs when one thread is reading/taking the data out of a DataReader while the participant's listener thread is processing the creation of a new group (i.e. a unique partition/topic combination) to which this Reader's Subscriber is also attached. Solution: The locking algorithm has been modified in such a way that the participant's listener thread no longer needs to hold both locks at the same time. |
OSPL-9064 / 00016808 | Changes caused by OSPL-8914 in 6.6.3p4f4 have been reverted. OSPL-8914 in the 6.6.3p4f4 release made several changes to the durability service in order to solve problems where a rapid disconnect/reconnect cycle would leave the durability service in an undefined state. In these situations, a disconnect had not yet been fully processed when the reconnect occurred. However, the solutions provided in 6.6.4 caused other, previously non-existing errors during normal operation. Solution: All changes made as part of OSPL-8914 in the 6.6.4 release have been reverted. As an alternative solution to rapid disconnect/reconnect cycle issues, DDSI offers temporary blacklisting of recently disconnected participants (see OSPL-8956). |
OSPL-8956 | Temporary blacklisting of remote participants in DDSI2. The DDSI2 service now provides an option to temporarily block rediscovery of proxy participants. Blocking rediscovery gives the remaining processes on the node extra time to clean up. It is strongly advised that applications are written in such a way that they can handle reconnects at any time, but when issues are found, this feature can reduce the symptoms. Solution: A new setting has been added to the DDSI section of the configuration: Internal/RediscoveryBlacklistDuration, along with an attribute Internal/RediscoveryBlacklistDuration[@enforce] (see the configuration sketch after this table). The former sets the duration (by default 10s), the latter whether to really wait out the full period (true), or to allow reconnections once DDSI2 has internally completed cleaning up (false, the default). It is strongly discouraged to set the duration to less than 1s. |
OSPL-9071 | v_groupFlushAction passes a parameter that is not fully initialized. Valgrind reported that the v_groupFlushAction function passes a parameter that is not fully initialized. Although one of these parameters was evaluated in a subsequent function invocation, it never caused issues because the value was only used as an operand for a logical AND where the other operand was always FALSE. Solution: All attributes of the parameter in question are now explicitly initialized. |
OSPL-9055 | Potential sample drop during delivery to a local Reader. In some cases, a dispose followed by an unregister does not result in the NOT_ALIVE_DISPOSED state on a Reader residing on the same node as the Publisher. In those cases, the Reader has an end state set to NOT_ALIVE_NO_WRITERS, and reports that a sample has been lost. Solution: We have no clue what could cause this behaviour, but added some logging to capture the context of the erroneous sample drop. This is just a temporary measure, and will be reverted when the root cause has been found and fixed. |
OSPL-9056 | Potential deadlock during early abort of an application. When an application aborts so quickly that the participant's leaseManager thread and its resendManager thread have not yet had the opportunity to get started, the exit handler will block indefinitely waiting for these threads to exit the kernel, while both threads are already blocked waiting to access a kernel that is already in lockdown. Solution: The constructor of the participant will not return before both the leaseManager and resendManager threads have entered the kernel successfully. |
OSPL-8953 | Potential deadlock between reader creation and durability notification. A thread that creates a new DataReader and a thread from the durability service that notifies a DataReader when it has completed its historical data alignment grab two of their locks in reverse order, causing a potential deadlock. Solution: The locking algorithm has been modified so that these two threads no longer grab both locks in reverse order. |
OSPL-8886 | Durability failure to merge data after a short disconnect. When the disconnection period is shorter than twice the heartbeat period, a durability service may not have been able to determine a new master before the node is reconnected again. In that case no master conflict is generated. In case the durability service is "late" in confirming a master, it might even occur that the master has updated its namespace, but the namespace update is discarded because no confirmed master has been selected yet. As a consequence no request for data will be sent to the master, and the durability service will not be aligned. Solution: In case a durability service receives a namespace update for a namespace for which no confirmed master is selected yet, the update is rescheduled for evaluation at a later time instead of being discarded. |
OSPL-8914 | Durability failure to merge data after a short disconnect. When a node becomes disconnected it may lose its master. As a result the node will look for a new master. In doing so, the node would first unconfirm its current master and then wait for other fellows to propose a new master. The time to look for a new master is specified in the configuration file (DurabilityService.Network.Heartbeat.ExpiryTime). When the disconnection was shorter than the DurabilityService.Network.Heartbeat.ExpiryTime, no merge was triggered. Solution: Whenever a node is discovered that is not simply starting and it has no confirmed master, a merge is triggered, just like when there are conflicting masters. NOTE: This change has been reverted in 6.6.3p4f7. |
OSPL-8948 / 16755 OSPL-8987 | Race condition between durability data injection and garbage collecting of empty instances. The durability service cached instance handles when injecting a historical data set in a way that could result in the historical samples being thrown away if the instance was empty and no known writers had registered it. Solution: The instance handle is no longer cached. |
OSPL-8971 | Catchup policy may incorrectly mark unregistered instances as disposed. When an instance is unregistered on the master node during a disconnect from another node that has specified a CATCHUP policy with that master, then upon a reconnect that unregister message will still be delivered to that formerly disconnected node. However, the reconnected node will dispose all instances for which it did not receive any valid data, so if the unregister message is the only message received for a particular instance, that instance will be disposed. Solution: The Catchup policy is now instructed to dispose instances for which it did not receive any valid data OR for which it did not receive any unregister message. |
OSPL-8984 | DDSI handling of non-responsive readers needs improvement. When a writer is blocked for ResponsiveTimeout seconds, DDSI will declare the matching proxy readers that have not yet acknowledged all data "non-responsive" and continue with those readers downgraded to best-effort. This prevents blocking outgoing traffic indefinitely, but at the cost of breaking reliability. For historical reasons the timeout was set to 1s to limit the damage a non-responsive reader could cause, but past improvements to the handling of built-in data in combination with DDSI (such as fully relying on DDSI discovery for deriving built-in topics) mean there is no longer a need for such an aggressive default. Solution: The default behaviour has been changed to never declare a reader non-responsive and to maintain reliability even when a remote reader is not able to make progress. The changes also eliminate some spurious warning and error messages in the log files that could occur with a longer timeout. |
OSPL-8920 | DDSI2 crash. Version 6.6.3p4 introduced a fix for OSPL-8872, taking the sequence number most recently transmitted by a writer when it matched a reader into account to force heartbeats out until all historical data has been acknowledged by the reader. The change also allowed a flag forcing the transmission of heartbeats informing readers of the availability of data to be set earlier than before in the case where the writer had not published anything yet at the time the reader was discovered. While logically correct, this broke the determination of the unique reader that had not yet acknowledged all data, in cases where there is such a unique reader. This in turn could lead to a crash. Solution: The aforementioned flag is once again never set before a sample has been acknowledged. |
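A hedged configuration sketch for the Internal/RediscoveryBlacklistDuration setting introduced by OSPL-8956 above, in OpenSplice's XML configuration format; the surrounding element structure and the service name are assumptions based on typical configuration files:

```xml
<OpenSplice>
  <DDSI2Service name="ddsi2">
    <Internal>
      <!-- Block rediscovery of a recently disconnected participant for 10s.
           enforce="false" (the default) allows reconnection as soon as DDSI2
           has internally completed cleaning up. -->
      <RediscoveryBlacklistDuration enforce="false">10.0</RediscoveryBlacklistDuration>
    </Internal>
  </DDSI2Service>
</OpenSplice>
```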
Fixed bugs and changes affecting the API in OpenSplice 6.7.0
Report ID | Description |
---|---|
OSPL-6636 | Invalid return code of offered- and requested-deadline-missed get status operations in classic C++ language binding. The DataWriter::get_offered_deadline_missed_status and DataReader::get_requested_deadline_missed_status operations return an error code if the total count is 0, even though the status output parameter is still filled correctly. Because there is no last_instance_handle when no deadline has been missed yet, the language binding would incorrectly determine that the instance handle is invalid, causing the error return code. Solution: The check was fixed to allow for this special case. |
OSPL-6647 | Wrong status reset for on_data_available and on_data_on_readers. When an on_data_available event is handled in the DCPS API, it should reset the data_available state before calling the listener. This is done, but on the entity associated with the listener instead of the entity where the event originated. This means, for instance, that when a DomainParticipant receives an on_data_available event, it tries to reset the data_available state of itself instead of that of the related reader. The Subscriber has the same problem. In the DataReader, the entity is the source itself, so it isn't a problem there. The same happens when an on_data_on_readers event is received, with the additional problem that it resets the data_available status, while it should reset the data_on_readers status of the related subscriber. This problem existed in all DCPS language bindings. Solution: Fixed the behaviour of on_data_available and on_data_on_readers to reset the status of the entity where the event originated. Also fixed on_data_on_readers so that it resets the correct data_on_readers status and not the data_available status. This was done for all APIs. |
OSPL-7343 | Topics can be created with invalid names. Topics could be created with invalid characters in their names; however, the subsequent creation of DataWriters or DataReaders will fail when using such a Topic. A topic name may consist of the following characters: 'a'-'z', 'A'-'Z', '0'-'9' and '_', but it may not start with a digit. Solution: Topic creation now fails when a name with invalid characters is used. |
OSPL-7560 / 15776 | When the TimeBasedFilterQos policy is used and the writer stops publishing after having published a series of samples, readers do not receive the latest state that was published after the minimum_separation delay has passed. The TimeBasedFilterQosPolicy can be used to control the rate at which readers receive samples (see the sketch below). In particular, if there is a high-frequency writer and receiving applications cannot keep up, the TimeBasedFilterQosPolicy can be used to reduce the rate at which application readers receive the samples. In the previous implementation, readers would not receive the latest state that was published when the writer stopped publishing after the minimum_separation delay had passed. Solution: Readers now receive the latest state that was published after the minimum_separation delay has passed, if the instance has changed in the meanwhile and the reader's ReliabilityQosPolicy is set to RELIABLE. |
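A hedged isocpp2 sketch of the TimeBasedFilter setting discussed in OSPL-7560 above; the topic type, name and rates are illustrative assumptions:

```cpp
#include <dds/dds.hpp>
#include "HelloWorldData_DCPS.hpp"  // assumed idlpp-generated header

int main()
{
    dds::domain::DomainParticipant dp(org::opensplice::domain::default_id());
    dds::topic::Topic<HelloWorldData::Msg> topic(dp, "HelloWorldData_Msg");
    dds::sub::Subscriber sub(dp);

    // Receive at most one sample per instance every 500 ms, regardless of how
    // fast the writer publishes. With a RELIABLE reader, the fix ensures the
    // latest state still arrives after minimum_separation once the writer
    // stops publishing.
    dds::sub::qos::DataReaderQos rqos = sub.default_datareader_qos()
        << dds::core::policy::Reliability::Reliable()
        << dds::core::policy::TimeBasedFilter(
               dds::core::Duration::from_millisecs(500));  // minimum_separation

    dds::sub::DataReader<HelloWorldData::Msg> reader(sub, topic, rqos);
    return 0;
}
```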