Regular releases of OpenSplice are made available containing bug fixes, changes to supported platforms, and new features.
There are two types of release: major releases and minor releases. Upgrading OpenSplice contains more information about the differences between these releases and the impact of upgrading. We advise customers to move to the most recent release in order to take advantage of these changes. This page details all the fixed bugs and changes between different OpenSplice releases. There is also a page that details the new features in the different OpenSplice releases.
Changes fall into two categories: bug fixes and changes that do not affect the API, and bug fixes and changes that may affect the API. These are documented in separate tables.
This page lists all the fixed bugs and changes in the OpenSplice 6.8.x series of releases.
Fixed bugs and changes in OpenSplice 6.8.x
Fixed bugs and changes not affecting the API in OpenSplice 6.8.3
|OSPL-9967|| Possible crash when loading XML storage files in Durability service
There is a possibility that the Durability service crashes at startup. The root cause of this issue is that the buffer in which the file name of the XML storage file is stored has a limited size.
Solution: The buffer is now dynamically allocated, so the file name is no longer limited in length.
|OSPL-11113||Possible SignalHandler crash during termination of an application
The SignalHandler within an application listens to signals posted to the application (like Ctrl-C). When an application shuts down, the SignalHandler also shuts down, and the application should therefore not accept any new signals. If a new signal was posted to the application after the SignalHandler had terminated, the application could crash.
Solution: When the SignalHandler of an application terminates, no new signals are accepted.
|OSPL-11094||Possibly missing persistent samples in persistent store.
If a persistent writer was created during startup of the durability service, it was possible that data written by that writer was never stored in the persistent store. This happened when the writer created a group and the group existed before the persistent data reader (groupQueue) was created. The groupQueue only attached to groups via the NEW_GROUP event, which could have been missed for already existing groups.
Solution: Made the persistent data reader, groupQueue, attach to already existing groups.
|OSPL-11101||The legacy master selection by durability may not converge when the master selection is based on majority.
When the durability service is configured to use the legacy master selection algorithm, then depending on the situation the algorithm may resort to selecting the master that is proposed as master by the majority of known fellows. Under certain circumstances (for example, when fellows become disconnected temporarily) this may cause some fellows to select different masters.
Solution: When a durability service detects that another durability service has selected a different master, a master conflict is raised, which triggers a re-evaluation of the selected master.
|OSPL-11114|| Possible memory leak in domain participant destructor in IsoCpp2.
Sometimes, while closing a domain participant, an "ALREADY DELETED" exception was generated. As this exception wasn't handled, further clean-up of the domain participant wasn't performed, which caused a memory leak.
Solution: The exception is now handled and a proper clean-up is performed.
|OSPL-10769|| Threads that cause an error when accessing shared memory detach the shared memory before generating a core.
When a thread that is accessing shared memory causes an exception to be raised, then the signal handler will detach from the shared memory prior to dumping the core. That makes the core pretty useless in many cases.
Solution: The signal handler no longer detaches from shared memory when the thread that is raising the exception is accessing the shared memory at that moment in time. In all other cases the signal handler will still detach the shared memory in an attempt to leave the shared memory in a consistent state for other processes using the same federation.
|OSPL-10935||The X-Types specification provides a few simple built-in Topic types: BytesTopicType, StringTopicType, KeyedBytesTopicType and KeyedStringTopicType in the dds::core namespace. These types should be available within the IsoCpp2 API.
Solution: X-Types built-in Topic types are added to IsoCpp2.
|OSPL-11067|| No documentation exists for create_persistent_snapshot() in IsoCpp2.
A proprietary API function exists to create a snapshot of the persistent store. The IsoCpp2 API implements this function, but no documentation for this feature was provided.
Solution: Documentation for the create_persistent_snapshot() API function call is provided for IsoCpp2.
|OSPL-10715||delete_contained_entities fails when the reader listener has an outstanding loan.
When using the Classic C++ API with a DataReader that has a listener for DATA_AVAILABLE, it is possible that calling delete_contained_entities during the listener callback fails with PRECONDITION_NOT_MET if the callback does a read with a loan. The deletion of the reader removed the listener interest but did not wait until the callbacks were finished.
|OSPL-10717||Unable to create readers for a topic starting with '_'.
Unable to create readers for a topic starting with '_', which is allowed in the DDS spec.
Solution: Added '_' as allowed first character in topic names.
|OSPL-10810|| Using nested IDL files on Windows for Classic C++ could cause build errors.
The idlpp/cppgen generated header files for Classic C++ contained an #undef of the import/export macro for Classic C++. When using nested IDL files on Windows this could result in the imports/exports not being set correctly, which could cause a build error.
Solution: Removed the #undef from the generated header file.
|OSPL-10301|| Error running vcredist when installing on Windows.
Any error running vcredist on Windows at installation time results in a warning prompt, and the final steps of the installation process are aborted. This occurs even if the error is simply that the vcredist package is already installed. The cause of the error cannot be determined at installation time.
Solution: Do not display a warning or abort the installation process if there is an error running vcredist.
|OSPL-10746||In a single process deployment mode a deadlock may occur when an application installs a signal handler to trigger a guard condition.
The deadlock occurs when a signal interrupts a thread that is currently accessing the guard condition and invokes the signal handler, in that case the lock of the guard condition is already taken and the trigger operation will deadlock when it tries to take the lock again. This problem only exists in the single process deployment mode because in shared memory deployments the signal handling is always outsourced to a dedicated signal handling thread.
Solution: Enabled usage of the dedicated signal handler thread in single process deployment mode.
|OSPL-10771 / 18286||Group transactions may end up consuming a lot of memory.
When using group coherency features with a lot of writers in a group and a lot of non-overlapping transactions, the amount of memory needed to store redundant End Of Transaction messages (EOTs) may become quite big.
Solution: Remove the redundancy between duplicate EOT messages on the receiving side.
|OSPL-10860||When using Java based tools license checking didn't check all locations.
When using a Java based tool and prismtech_LICENSE was set to an invalid/expired license, the secondary license location (OSPL_HOME/etc) was never checked for a valid license; only the tertiary location (VORTEX_DIR/license) was checked. This could lead to the tool failing to start with a license error even though a valid license was present in the secondary location.
Solution: Made sure all license locations are checked.
|OSPL-10929 / 18400|| Customlib IsoCpp2 Debug platform configuration crashes when using with an OpenSplice release build.
When the IsoCpp2 customlib is compiled for a Debug platform configuration and used with a release OpenSplice build, it can crash due to a missing preprocessor define.
Solution: The issue is now fixed and the customlib for the Debug platform configuration is now generated properly.
|OSPL-10702 / 18253||Sequence of typedef'ed char results in incorrect data received when using ISOCPP2
The copy-out routine for sequences incorrectly added data to the end of the newly created sequence instead of the beginning. As the sequence was first initialized with zeroes, the end marker was placed after the initialized data, creating a sequence twice as long as needed with its first elements zero.
Solution: Instead of appending data to the sequence, assign is now used, which places new data at the beginning of the sequence.
|OSPL-8465||Google Protocol Buffers embedding inner messages with keys multiple times fails
If you have keys in an embedded message and then try to use that message more than once in your top-level message, only the first occurrence will become part of the key.
Solution: It is now possible to embed inner messages that contain keys and they all will become part of the key.
|OSPL-10712|| Durability native state conflict after initial alignment could trigger second merge
When the master federation had a durability namespace with a merge state that was not equal to the default, a fellow durability service with a queued native state conflict resolved the conflict instead of discarding it after the initial merge had completed. This resulted in the same data being aligned twice.
Solution: After the initial merge is completed the namespace merge state on the fellow durability service is updated to match its master.
|OSPL-10924 / 18398|| CPP generated code results in an error message when using Klocwork.
When compiling idlpp-generated C++ code with Klocwork, the error "there is a copy constructor but no assignment operator" can occur.
Solution: The problem is fixed and the correct assignment operator is now used.
|OSPL-9572|| Launcher failed to run in Windows XP 32 bit operating system environment.
Launcher would not run on Windows XP 32 bit systems. The Java packager generated an exe file that only supported OS versions from Windows 7 and up.
Solution: The .exe file was modified after generation to support Windows XP. Launcher is now supported for Windows XP 32 bit operating system environments.
|OSPL-9521|| Launcher: Java Tools buttons disabled the first time launcher is opened after installation.
The code doing the check of the JAVA_HOME was using a 1 second timeout. On the first call, it would timeout, and the check would fail.
Solution: The JAVA_HOME check is now done on a separate thread, which notifies the listeners (UI panes) when the check is complete.
|OSPL-10502|| Launcher: In the Tools tab, provide an RnR Manager button to open the RnR Manager product.
Launcher provides a Tools tab that allows users to open OpenSplice tools. If RnR Manager is installed, display a tools button to open RnR Manager.
Solution: Search for an RnR Manager installation in the expected default directory derived using OSPL_HOME. Check the version numbers, and select the highest version number. When an RnR Manager installation is found, in Launcher's Tools tab, display an RnR Manager button to open the RnR Manager product.
|OSPL-8069||Launcher: Make the file and directory choosers smarter in the Settings dialog / environment tab.
The file and directory choosers had no logic for the initial directory when choosing a file or directory.
Solution: The file and directory choosers use the current value to set the choosers' initial directory. Smarter initial directory defaults are also provided if applicable.
|OSPL-10523|| Tuner: Actions should automatically refresh tree views.
After user initiated actions, the user has to then also initiate a "Refresh entity tree" action. Make the refresh happen automatically where possible.
Solution: For the actions: "Import data" and "Import metadata", the entity trees are automatically refreshed.
|OSPL-10513||Tuner: Should provide an option to filter out internal topics.
Tuner always displays all topics, including specification topics (DCPS), and OpenSplice product internal topics (CM, d_, q_, rr_).
Solution: Added a Topic Filters tab in the Preferences dialog. This tab allows users to filter which topics are displayed in the tree views. By default all DCPS topics are shown, and all OpenSplice internal topics are hidden (CM, d_, q_, rr_).
|OSPL-10667|| Tuner: Default partition empty string displayed as "null" in QoS tab.
The cmapi internally stores the default partition policy value as a literal Java null. So when Tuner sees the partition value null, it uses the string "null" as the value in the QoS view table.
|OSPL-10345 / 17884 / 17883 / 17882 / 17880|| Added filtering for server ID to Dynamic network configuration for RT Network service.
To effectively utilize dynamic network configuration updating, filtering on server ID needs to be added. Always updating all OpenSplice instances on a network with a new network configuration does not provide practical functionality.
Solution: The new filtering has been added to the RT Network service.
|OSPL-10371 / 17881||New network partition does not result in updated partition mapping for RT Network service.
When dynamically updating an RT Networking network configuration, adding a new network partition does not result in the proper updating of the partition mappings. The partition mappings need to be updated prior to notifying the channels of the new network partition and mappings. The symptom of this problem is that the new network partition is added, but a topic message is sent over the global network partition rather than the new network partition.
Solution: The defect is now fixed and a new network partition now results in an updated partition mapping.
|OSPL-10396|| Idlpp crashes when compiling a sequence of a typedef to a sequence to C or C99.
The IDL compiler idlpp crashes when it needs to generate the C or C99 representation for a sequence of a typedef to a sequence, or it generates code that doesn't work correctly.
Solution: The idlpp code has been modified to generate the correct C/C99 representation and to no longer crash during this generation process.
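A minimal IDL fragment of the kind described (illustrative names, not a specific customer type) would be:

```idl
// A sequence of a typedef to a sequence previously crashed idlpp
// for the C and C99 targets, or produced incorrect code:
typedef sequence<long> LongSeq;
struct Sample {
    sequence<LongSeq> rows;
};
```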
|OSPL-10576|| Leakage measurements on single process applications show a lot of heap leakage.
Most of the heap leakage found is allocated by the internal database. In single process deployment the database allocates data on heap instead of in a separate shared memory storage as in the shared memory deployment, meaning that destruction of the database itself doesn't automatically free all remaining allocated data. This leakage is not in fact a resource problem, but it makes it hard to identify real issues.
Solution: The database now frees all remaining data objects on destruction.
|OSPL-10659||When a node is rebooted persistent data may be inserted as unregistered even if there is no evidence that the original publisher has left.
When a node that has configured persistency is rebooted and the data in its persistent set is the same as the set that would otherwise be provided by the aligner, then it is more efficient to load the data from disk than to request the data from the aligner. Currently, the persistent store always assumes that data published from the persistent store is unregistered. Since the unregistered state is taken into account when calculating a hash over the set, hashes are likely to differ, causing alignment. To prevent this situation the injected data should be marked as implicit, which also better matches the semantics that the injecting node deduces that there is no writer.
Solution: When messages are injected from the persistent store they are injected as being implicitly unregistered instead of explicitly unregistered.
|OSPL-10688|| Simulink: Reader block may miss samples when configured to wait for available data.
A Reader block configured to wait for available data always creates a wait set in every step, and only reads samples if the wait set is triggered. This implementation misunderstands the nature of the triggering of the 'data available' status condition. Data available is only triggered when DDS places new data into the reader's local buffers. It is possible for a Reader block to wait unsuccessfully for available data, but still have unread data locally available.
Solution: The Reader block has changed to do the following: 1) attempt to read or take (as configured); 2) if no samples are returned, wait for available data (if so configured); 3) if available data is triggered in step 2, then read or take samples (as configured).
|OSPL-10689|| Simulink: Writer block does not write samples in step in which it waits for a publication match.
A Writer block configured to wait for a publication match will not write samples in the same step in which it waits for a publication match. This may result in an application failing to write some samples.
Solution: The Writer block was changed to write after waiting for a publication match, so long as a match was found or the block was configured to write even if a match was not found.
|OSPL-10699 / 18251|| Network service failed to start due to a missing system clock.
There is a mismatch between the platform configuration specified by the tool chain and what the platform actually supports. This causes clock_gettime to return an error code. As there was no error checking on this call, the failure was not detected and the clock did not work.
Solution: If the call fails when using the BOOT_CLOCK, retry the call using the MONOTONIC clock.
|OSPL-10756||Simulink: Compilation errors in Simulink Coder concurrent execution models when using Reader block.
Simulink allows you to configure a model for 'Concurrent Execution'. The basic requirement for this is that each thread of execution is represented in a separate Simulink model, and then these models are referenced from a top-level Simulink model using the 'Model' block. The top-level model is then configured for concurrent execution. The details of doing this are described in the following MathWorks article: Configure Your Model for Concurrent Execution. When such concurrent models are built into executables with Simulink Coder, compilation errors result if the DDS Reader block was used in any of the referenced models. The error indicates that the include file "dds.h" cannot be found, and occurs as Coder attempts to compile the C code for the top-level model.
Solution: The generated code for the Reader block has been changed so that "dds.h" is no longer required when compiling the top-level model.
|OSPL-10865||Durability services fail to align in Device-to-Device deployment with Cloud.
Durability doesn't consider fellows behind a Cloud service responsive because it fails to match a fellow's readers with that fellow's id, even though the readers and writers are properly discovered and data flows.
Solution: The issue is now fixed and the durability service aligns when using the Cloud service.
|OSPL-10954 / 18403|| On the classic C++ API the registration of an already registered typesupport leaks memory.
When a typesupport is registered, the existing typesupport collection is searched to find out whether the typesupport is already registered. When an already registered typesupport is found, its reference count is increased but not released after being used. This causes a memory leak.
Solution: The found typesupport is released after being used.
|OSPL-10967|| Issues with sequence/array of enums and multi-dimensional arrays in C#.
The IDL compiler idlpp generates incorrect C# marshaling functions for the following IDL constructions:
arrays or sequences of enums
multi-dimensional arrays in general where #dimensions > 2
The issues cause either idlpp to crash when generating the code, the marshaler to crash when executing the code, or the wrong data to be written into or read out of the system.
Solution: The idlpp code used to generate the C# marshalers has been corrected and no longer suffers from these issues.
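IDL constructions of the kind described (illustrative names) are:

```idl
// These previously broke the C# marshalers generated by idlpp:
enum Color { RED, GREEN, BLUE };
struct Sample {
    Color palette[8];        // array of enums
    sequence<Color> colors;  // sequence of enums
    long cube[3][4][5];      // multi-dimensional array, #dimensions > 2
};
```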
|OSPL-11083|| C# compilation failure in PingPong example prior to .NET 4.0.
The PingPong C# example calls Stopwatch.Restart in pinger.cs. The Stopwatch.Restart method was not added until .NET 4.0.
Solution: The problem is fixed and the Restart call is replaced with a call that works in .NET 3.5.
|OSPL-11020|| Identical sample requests from the same alignee are combined by an aligner durability service.
A durability service can combine sample requests to optimize alignment efficiency. When two sample requests for the same data originate from different alignees, the sample requests should be combined. However, when two identical sample requests originate from the same alignee, it makes no sense to combine them. Combining them does not lead to an inconsistent state, but it may cause other nodes to receive such data unnecessarily.
Solution: Before actually combining the request, the aligning durability service checks if it already has a pending request that addresses the requesting alignee. If so, the request is not combined.
|OSPL-10871||The unregister resulting from the deletion of a writer may be ignored on systems with low time resolution.
When a writer is deleted directly after writing an instance for the first time and the clock resolution of the system is very poor, the unregister message resulting from the deletion of the writer may get the same timestamp as the sample, while the sequence number of the unregister message is set to 0. This causes the unregister message to be considered older, so it is ignored.
Solution: The sequence number of the unregister message is set to the maximum value, which causes the unregister message to be processed correctly.
|OSPL-11087||Issues with unions having a sequence of char or sequence of boolean in C#.
When using idlpp to compile a union that has a branch with a sequence of char or a sequence of boolean, the C# backend either crashes, or generates code that does not compile properly.
Solution: The C# backend for idlpp now generates the correct code for handling these cases and no longer crashes on them.
|OSPL-11131||Idlpp backend for C# may generate incorrect marshaling code for sequences
The C# Marshalers generated by idlpp do not always obtain the type description for attributes of type sequence in the correct way: the function call that looks up the type description uses the C# attribute name rather than the IDL attribute name that is used to index the types. In most cases both names are equal, so there is no impact, but when the names differ, the Marshaler may crash with a System.NullReferenceException when writing samples. The IDL attribute name and its C# counterpart may differ when the IDL name is based on a C# keyword (in which case its C# representation is prefixed with an underscore) or when the idlpp option "-o custom-psm" is used, in which case C# attribute names may be modified using the PascalCase notation.
Solution: Idlpp now always uses the IDL attribute name when looking up a type descriptor.
|OSPL-10728||The use of inline arrays of pointers to structured types that have an alignment not equal to 4 bytes on a 32 bits platform may cause data corruption or crashes.
The problem is caused by an incorrect alignment calculation for arrays of pointers. The array alignment should be equal to the alignment of the pointer (4 bytes on a 32 bits platform) but is actually equal to the alignment of the pointed type. As a result the memory offset of the array can be incorrect causing data corruption or crashes.
Solution: Fixed the alignment calculation.
|OSPL-10795||Build error when building from source on Windows with Visual Studio 2015 update 3.
The IsoCpp2 API could not be built on Windows with Visual Studio 2015 update 3 because of a dependency on the _HAS_CPP0X macro, which is not set in Visual Studio 2015 update 3. This caused C++11 support to be disabled for some parts.
Solution: Removed the _HAS_CPP0X dependency for Visual Studio 2015 and later for enabling C++11 support.
|OSPL-10818 / 18358||Typedef of a Sequence of a struct generation for C99 is faulty.
When using C99 and an IDL file that contains a typedef of a sequence of a struct, the code generated by idlpp omits the definition of the struct.
Solution: The problem is fixed and a correct definition is now generated by idlpp.
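A minimal IDL fragment of the kind described (illustrative names) would be:

```idl
// For the C99 target, idlpp previously omitted the definition of the
// struct referenced by the typedef'ed sequence:
struct Point {
    long x;
    long y;
};
typedef sequence<Point> PointSeq;
```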
|OSPL-10958|| Writing samples that contain a sequence of a primitive type in C# could lead to an application crash.
When using C# with an IDL structure that contains a member that is a sequence of a primitive type (octet, long, double, float), writing this data could lead to an application crash due to incorrect code generated by idlpp.
Solution: The problem is fixed and correct C# code is now generated by idlpp.
|TSTTOOL-485|| Tester created readers unable to receive historical data samples from transient durability topics.
An error in the way Tester created data readers resulted in historical transient data never being read and displayed in the sample list.
Solution: The error is fixed and historical data is visible in the sample list on reader creation once again.
Fixed bugs and changes affecting the API in OpenSplice 6.8.3
|OSPL-10922|| Implicit IsoCpp2 Participant, Publisher and Subscriber.
A Participant has to be available before, for example, a Topic can be created. For many use cases this is just a default Participant. The API becomes simpler when a default Participant is assumed whenever none is provided while creating the Topic. The same applies to Publishers/DataWriters and Subscribers/DataReaders.
Solution: A default (singleton) Participant is created implicitly when dds::core::null is used when creating a Topic. An implicit Publisher is created when dds::core::null was provided when constructing a DataWriter (using the Participant of the given Topic). An implicit Subscriber is created when dds::core::null was provided when constructing a DataReader (using the Participant of the given Topic).
|OSPL-10936|| Simplify creation of transient reliable DataReaders and DataWriters in IsoCpp2.
To create transient reliable DataReaders and DataWriters you have to create the right QoS settings with the proper policies and a transient reliable Topic. Many use cases need simple transient reliable communication. By simplifying the creation of transient reliable Entities, these use cases can be simplified as well.
Solution: The org::opensplice::topic::qos::TransientReliable() convenience function is added to IsoCpp2. Topics created with the resulting QoS will be transient reliable. DataReaders and DataWriters created using such a Topic are then automatically transient reliable as well.
Fixed bugs and changes not affecting the API in OpenSplice 6.8.2
|OSPL-10489|| Reader port removed from Vortex DDS Reader block for Simulink
The optional 'reader' port on the Vortex DDS Reader block for Simulink served no purpose.
Solution: The port has been removed. Existing models that include a reader port will have the port removed. Any connectors connected to the reader port must be removed.
|TSTTOOL-474||Python Scripting: Exception on exit
In the Python scripting environment (osplscript), creating a 'Recorder' (osplscript.recorder.Recorder) that is never used will result in an error on exit indicating a java.lang.NullPointerException.
Solution: Guard code has been introduced to ensure that clean-up code is called only when needed.
|TSTTOOL-470|| Tester cannot write enum values whose labels start with its type name.
In some cases it was possible for the Tester to fail to write certain samples of certain types. The triggering type of data is an enumeration type whose label names start with the name of the enum type. When encountered, that type name string is cut out of the outgoing data value for the enum, which results in a failure because the middleware no longer recognizes the enum value it is trying to write.
Solution: Tester no longer strips out the type name from the enum label.
|OSPL-10499|| Launcher Configurations tab User Interface Improvements
In the Launcher "Configurations" tab, the purpose of a configuration may not be clear to a user. Also, some of the functionality in the tab may not be intuitive to a user.
Solution: Changes were made to the "Configurations" tab to address usability. These include: adding descriptive text, adding a link to the deployment guide, highlighting the ACTIVE configuration, disabling buttons when they are not applicable, a new right-click menu for the configuration selection, and changing double-click functionality to open a configuration instead of setting it ACTIVE.
|OSPL-10497|| Launcher Tools and Controls tabs should explain why certain options are greyed out and disabled.
In the Launcher "Tools" and "Controls" tabs, access to tools and controls is done using buttons. In certain conditions, these buttons are disabled and shown as greyed out. It may not be clear why they are disabled.
Solution: Tooltips were added on mouse hover to give users an explanation as to why the buttons are disabled.
|OSPL-10496|| Launcher Tools and Controls tabs - provide a description of the tools and controls.
In the Launcher "Tools" and "Controls" tabs, access to tools and controls is done using buttons. It may not be clear to the users what these tools and controls are for.
Solution: Tooltips were added on mouse hover to give users a description of button functionality.
|OSPL-10487|| Error Dialog from DDS Topic block for Simulink
In MATLAB/Simulink R2017a, Simulink models using the Topic block from the Vortex DDS Block Set could see an error dialog with the following text "Error evaluating 'MaskDialog' callback of SubSystem block (mask)". The error arises from API changes between MATLAB R2016b and R2017a.
Solution: Solved the problem by correctly calling the revised APIs.
|OSPL-10486|| Vortex DDS block labels displaying incorrectly in MATLAB R2017a
In MATLAB/Simulink R2017a, in Simulink models that use the Vortex DDS Block Set blocks and hide one or more of the optional block ports, question marks (???) were overlaid on the block icon. Note that this behaviour did not impact the execution of blocks.
|OSPL-8573|| Tuner - Various errors and exceptions thrown through regular UI actions.
Certain UI actions in the Tuner tool consistently cause exceptions to be printed to the console, and the expected UI action to fail. Particularly the Export Data menu items in the main window and some child windows.
Solution: The causes of the exceptions being thrown have been fixed, and the corresponding UI menu actions work properly.
|OSPL-4705|| Launcher preferences need to be stored in a separate location for each OpenSplice installation.
Launcher stored all preferences in a .olauncherprefs file in the user home directory. If a user has multiple installations of OpenSplice, Launcher will reuse that file for all installations. This caused some confusion. (ex. licensing, resetting OSPL_HOME)
Solution: The OSPL version was appended to the .olauncherprefs file name, so that each installation will have its own launcher preference file. For example: .olauncherprefs6.8.2. If the OSPL_HOME variable is not set, the preferences file name will default to .olauncherprefs.
|OSPL-10595|| Error when compiling ISOCPP2 with Visual Studio 2015 Update 3 or higher
When trying to compile the ISOCPP2 library, or any ISOCPP2 based application using Visual Studio 2015 Update 3 or any higher version, several compilation errors are encountered.
Solution: The compilation flags required to successfully build on Visual Studio 2015 Update 3 and higher are now added to the build system.
|OSPL-10366|| Unclear when a persistent snapshot is successfully created.
The function create_persistent_snapshot on the Domain object operates asynchronously: a thread in the durability process is responsible for making the snapshot, and the function create_persistent_snapshot returns immediately after the durability service has been instructed to make the snapshot. Therefore, it is unclear to the caller when the snapshot has safely been flushed to disk.
Solution: The durability service will now log a message to the ospl-info.log when it successfully flushed the snapshot to disk. This log references a unique sequence number for each snapshot and can easily be made available for inspection by an application by configuring a ReportPlugin in your OpenSplice config file.
|OSPL-10645||The AutoBuiltinTopic namespace created by the durability service uses the legacy master selection algorithm, which may cause problems when other durability services use the priority master selection algorithm.
When the durability service configuration contains a namespace which includes the builtin topics and it is not configured as an aligner for that namespace, the durability service will create an AutoBuiltinTopic namespace for the builtin topics. However, this namespace always used the legacy master selection algorithm. When another durability service is configured as aligner for a namespace containing the builtin topics and uses the priority master selection algorithm, a mismatch between those namespaces is detected and reported.
Solution: When the durability service creates the AutoBuiltinTopic namespace it now uses the configured master selection algorithm.
|OSPL-10644||CM java API Entity toString could return null
The CM java API could return null when the Entity.toString function was called for an entity without a name. When trying to print an Entity this function is called, but not all JVM implementations are able to handle a null string, which could lead to a NullPointerException.
Solution: When toString() is called on an Entity with no name, the simple name is returned instead of null.
|OSPL-10641||Incorrect log messages when bringing down network interface
Incorrect log messages occurred when using the RT networking service with multiple virtual adapters configured and one of the virtual devices was brought down. The ospl-info log reported that the main adapter went down, which was not the case.
Solution: The incorrect adapter name has been fixed and the log now only reports information when changes occur on the adapter actually used by the RT networking service.
|OSPL-10596||In case a durability service receives many unrequested chains then its log file could get polluted
When a durability service combines sample requests it sends the combined answer to all durability services. The answer is only intended for the durability services that requested the data; all others will drop it. When a durability service received such an unrequested answer, a log message was written for every message received. When there were many unrequested answers this led to many messages and hence polluted the durability log file.
Solution: The line that was responsible for the log pollution has been removed.
|OSPL-10590|| ospl tool might miss spliced termination
The ospl tool initially waits 2 seconds for the splice daemon. When the splice daemon terminated within those 2 seconds because of a normal termination request, the ospl tool missed the termination and kept waiting for the maximum time of 60 seconds before exiting. This could only happen when ospl start and ospl stop were called from different threads.
Solution: ospl tool is now aware of normal termination within 2 seconds.
|OSPL-10585||Writing samples that contain a sequence of a primitive type in C# does not work
When using C# and having a structure in IDL that contains a member that is a sequence of a primitive type (octet, long, double, float, etc.), the C# writer cannot write the content of the sequence.
Solution: The problem is fixed and a writer in C# can now write the sequences.
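As an illustration, a type like the following hypothetical IDL (names invented for this example) was affected; the primitive sequence members could not be written from C#:

```idl
struct SensorData {
    long              id;
    sequence<octet>   payload;   // raw bytes
    sequence<double>  readings;  // measurements
};
#pragma keylist SensorData id
```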
|OSPL-10573||Unable to create a shared memory segment larger than 1.9 GB
SetFilePointer is used to position the file pointer of the file backing the shared memory segment. Only the low-order 32-bit value was used, limiting the range to 2 GB.
Solution: SetFilePointer also accepts an optional second 32-bit value which can be used to access file locations beyond the 2 GB barrier. The code has been changed to use this second parameter.
|OSPL-10571|| OpenSplice refuses to start when using a decimal human readable size value in the configuration
When a configuration contained a decimal human-readable size value, e.g. a database size set to 0.2G, OpenSplice refused to start.
Solution: The fault in the configuration checker was fixed and now decimal human readable sizes are accepted.
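For example, a configuration fragment like the following (only the relevant elements are shown; the name is illustrative) was rejected before the fix:

```xml
<OpenSplice>
  <Domain>
    <Name>ospl_example</Name>
    <Database>
      <Size>0.2G</Size> <!-- decimal human-readable size, previously refused -->
    </Database>
  </Domain>
</OpenSplice>
```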
|OSPL-10563||When the master selection algorithm is unable to select a master in time it chooses a master based on majority voting without generating a warning. Also, the threshold to resort to majority voting was unreasonably high.
When there are multiple durability services in a system these durability services have to choose a master for each namespace. The master is the one responsible for aligning the other durability services. In case of the legacy master algorithm several iterations are carried out if there is disagreement about who should be the master. If after a fixed number of rounds there is still no agreement, majority voting is used to force a master. Because this situation may lead to improper alignment, the use of majority voting justifies a warning in the log files. However, this situation was not logged.
Solution: A warning will be printed in the ospl-info.log when majority voting is used to choose a master, and the threshold to decide when majority voting will be used has been reduced.
|OSPL-10534||ddsi2(e) may crash due to potential corruption of internal hash table.
In certain scenarios the ddsi2(e) service may crash due to corruption of an internal hash table. This is caused by the hash table not resetting slot pointers when an entry needs to be moved to another slot. In most cases the bogus pointer will quickly be overwritten by another entry and no harm is done. However, if this is not the case and the original entry is removed, a dangling pointer remains in the hash table which may crash ddsi2(e).
Solution: The slot pointer is now reset to NULL after an entry is moved to another slot.
|OSPL-10522||Tuner import of transient data isn't received by late joiners
When the tuner imports transient data it is not received by late joining readers because the imported data is published by a writer with auto_dispose_unregister=true and is thus purged from the system.
Solution: Tuner publishes imported data with a writer with auto_dispose_unregister=false
|OSPL-10465|| ospl start terminates splice daemon before it has the chance to become operational
The ospl tool had a hardcoded 10-second timeout within which the splice daemon had to become operational; if not, it would terminate the splice daemon and report an error. On some occasions this timeout proved to be too short.
Solution: Increased the timeout to 60 seconds. This is a worst-case value; in most situations it won't affect behaviour.
|OSPL-10462||NullPointerException during listener termination.
When using the Java API with listeners a NullPointerException can occur when deleting an entity which has a listener attached.
Solution: The cause of the fault is still unclear but a workaround has been implemented to catch the exception so the listener will terminate correctly. When this happens a trace will also be added to the ospl-error.log
|OSPL-10423|| Sending out an array of a typedef of a string in C may cause an application crash
When you model an array of a typedef of a string in your IDL and generate C code from this model, then the resulting code may corrupt/crash your application when you try to write samples containing such an array due to incorrect dereferencing of its pointer.
Solution: Correct operator precedence has now been enforced by explicitly using brackets in the generated code.
|OSPL-10425||Possible error return by ospl tool while splice daemon is started
When the ospl tool started the splice daemon and the start took longer than the maximum allowed start time, the ospl tool returned an error while the splice daemon was still running.
Solution: When the maximum allowed start time has passed the ospl tool now kills the splice daemon and reports an error.
|OSPL-10423|| For sample attributes of type array only the first 4 or 8 bytes are actually marshaled out in ISOCPP.
In ISOCPP (not ISOCPP2), sample attributes of type array are only marshaled out using the first 4 bytes (for 32-bit platforms) or the first 8 bytes (for 64-bit platforms), the rest of the array may be filled with garbage. This might cause data corruption or even a crash of the marshaling algorithms on either the sending or the receiving side. This was caused by an incorrect way to establish the size of the array, which returned the size of the pointer to the array instead of the size of its content.
Solution: The algorithm now correctly establishes the size of an array.
|OSPL-10420||Wait for historical data timeout when using multiple readers and transient local with DDSI2(E)
When using the DDSI2(e) network service with transient-local durability set on a topic, it could happen, in a scenario with two nodes where one node hosts the writer and the other node creates two or more readers, that a wait-for-historical-data call on the second created reader returned a timeout when there was no data in the system for that topic.
Solution: The problem is fixed; in this scenario the created readers no longer return a timeout when there is no data for them.
|OSPL-10413||When using delayed alignment in the durability service, it may occur that a master conflict is not resolved.
When the durability service configuration contains a namespace which is configured to use delayed alignment, it may occur that the master selection for that namespace does not converge. The cause of this problem is that when delayed alignment is used the master selection does not take the system id and the quality of the namespace into account. This causes a durability instance, each time it detects a master conflict, to first select itself as the new master, which may in turn cause a master conflict at another durability instance. This process repeats itself, with the result that no final master is selected.
Solution: In case of delayed alignment the master selection algorithm now takes both the quality of the namespace and the system id into account when selecting a master.
|OSPL-10382||Missing metaconfig error message logged when running iShapes demo
The metaconfig.xml file is missing from the iShapes demo installer.
Solution: Added the metaconfig.xml file to the iShapes demo installer and added a priority search location for it in the current directory
|OSPL-10376||Wrong display of octet in Tuner
When writing an octet field with the Tuner it is not possible to write values > 127.
Solution: The problem is fixed and the Tuner can now write values for octets from 0 to 255.
|OSPL-10362|| NullPointerException in the Tuner.
When using the Tuner to read a sample which contains a union inside a nested structure a NullPointerException could occur.
Solution: The Tuner cannot handle some cases where a union is inside a nested structure. As a workaround the NullPointerException is caught so the Tuner won't crash and the sample can still be read; only the value of the union won't be displayed correctly in such a case.
|OSPL-10361||Google Protobuffer meta data injecting not efficient on isocpp2
Google Protobuffer meta data was generated into a hex string and translated at runtime into a byte array using the sscanf function; for big types this resulted in many sscanf calls.
Solution: The Google Protobuffer isocpp2 meta data is now generated in a byte array so no runtime translation is required.
|OSPL-10360||On vxworks a condition variable may remain blocked after being signaled.
On VxWorks, for each thread that uses a shared condition variable a named semaphore is allocated that is used to wake up this thread when it waits on a shared condition variable. When a thread starts waiting on a shared condition variable it registers this thread-specific semaphore with the condition variable and waits on this semaphore. When another thread signals the condition variable it posts the semaphore that is registered with the condition variable to wake up the corresponding thread. The name used for this thread-specific semaphore should be unique. However, threads created within the same RTP could generate the same name for this semaphore, which could cause the wrong semaphore to be released when triggering a condition variable.
Solution: The name generated for the named semaphore which is used for each thread is now determined atomically within an RTP.
|OSPL-10359|| Fault in QoSProvider schema.
The provided DDS_QoSProfile.xsd contains an element dds_ccm; this element is not suitable for OpenSplice and needs to become dds. The dds_ccm element was introduced by an inconsistency in the OMG specification itself.
Solution: The element changed from dds_ccm to dds fixing the inconsistency.
|OSPL-10358|| Possible stack overrun when returning the loan after reading a large amount of instances
When reading a large number of instances in a single read, it is possible to get a stack overrun when returning the loan after the read.
Solution: The internal freeing of the loans was changed to behave iteratively instead of recursively.
|OSPL-10188||Memory leak in flex/bison generated parser
Memory leak in the flex/bison generated parser, introduced by the patch that made it thread safe.
Solution: Removed the memory leak by making the parser reentrant (requires bison version >= 2.7).
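For reference, a reentrant parser is typically requested with directives like the following (a generic flex/bison sketch, not the actual OpenSplice grammar files):

```yacc
/* Bison grammar prologue: generate a pure (reentrant) parser.
 * "%define api.pure full" requires bison >= 2.7, matching the note above. */
%define api.pure full
%param { yyscan_t scanner }

/* Corresponding flex options in the lexer file:
 *   %option reentrant bison-bridge
 */
```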
|OSPL-10174||Master selection algorithm takes too long to decide on master
The legacy master selection algorithm (which is selected by default and which should be used when communicating with nodes that run < V6.8.1) may sometimes take too long to decide who becomes the master. When a node needs to confirm its vote, it resets its master selection in the mean time, resulting in a non-resolved conflict and another master selection cycle, until majority voting finally kicks in. All these extra voting cycles may cause increased alignment times.
Solution: The master selection is no longer reset between the establishment of the master and its confirmation.
|OSPL-10169||The RT networking service does not use active garbage collection of buffers used by a best-effort channel.
The networking packets (fragments) received by an RT networking channel are stored in defragmentation buffers. When messages are fragmented over several packets and a packet is lost, the best-effort channel keeps the packets it has already received because packets may arrive out of order. However, the networking service does not apply active garbage collection of these buffers; it only applies garbage collection when the number of free buffers becomes low. Because no active garbage collection is applied to the buffers of a best-effort channel, the number of used buffers may increase considerably when a large amount of packet loss occurs on the network.
Solution: For a best-effort channel, garbage collection of defragmentation buffers containing incomplete messages is now applied at regular intervals. The garbage collector frees defragmentation buffers that contain incomplete messages when a threshold is exceeded. This still allows packets that arrive out of order to be delivered to the application.
|OSPL-10168||Ownership transfer may fail when using a deadline.
Ownership transfer occurs when an instance is unregistered, or when it misses its deadline. When a disconnect occurs to an instance that also has a deadline applied to it, and the deadline expires before the loss of liveliness is detected, then the deadline mechanism is the first to transfer the ownership away from the disconnected owner. However, when later the loss of liveliness is also being processed while no new ownership has been claimed in the mean time, then the unregister message that communicates the loss of liveliness will reclaim ownership for the now disconnected owner. Lower strength writers that try to transmit messages afterward will never be able to claim ownership over the instance anymore, and thus will not be able to deliver their data.
Solution: Unregister messages will no longer be able to claim ownership for instances that currently have no owner.
|OSPL-10157|| Coherent historical EOT purging inefficient (high load)
In a system with a large number of transient/persistent topic/group coherent writers, purging of EndOfTransactions (EOTs) could take a long time, as it was implemented as a loop within a loop with reference checking that was often called both synchronously and asynchronously.
Solution: Simplified the purging by removing the loops and making it synchronous only
|OSPL-10143|| Creating a group coherent writer in the default partition always created an error log message
When creating a transient/persistent group coherent writer in the default partition, an error message was logged stating: " is in zero configured nameSpaces", even when it was in a nameSpace.
Solution: Updated the group coherent partition nameSpace matching so that it correctly handles the default partition.
|OSPL-10139||The isocpp2 read/take operation using a selector in combination with an iterator returns the incorrect number of samples
The isocpp2 read/take operation using a selector in combination with an iterator ignores the provided max_samples parameter, which causes the operation to return an incorrect number of samples. Instead of using the provided max_samples parameter it uses the max_samples attribute set on the selector.
Solution: The provided max_samples parameter is used to limit the number of samples returned.
|OSPL-10113|| Merging with an empty set could lead to a state update even when a data set was not changed. This triggers unnecessary alignment and may cause resurrection of data.
In case a durability service has a MERGE policy configured and it merges with an empty set, the durability service would still publish a state update of its set even though its state has not changed. The state update in turn may cause other nodes to initiate alignment and hence cause resurrection of data that was already aligned and taken before. Evidently, merging with an empty set using the MERGE policy does not change the set, and no state update should be generated.
Solution: When a MERGE policy is applied with an empty set no state update is generated any more.
|OSPL-10101||Possible hang in delete_domainparticipant when splice daemon would not join in singleprocess mode
If the splice daemon thread (single-process mode only) did not exit, the join on it would block indefinitely.
Solution: The join on the splice daemon is conditional, when unjoinable an error is returned.
|OSPL-9706||Unattended installer ignores providedLicenseFile argument
When doing an unattended install of OpenSplice and providing a valid providedLicenseFile argument, the installer ignored the argument.
Solution: The problem is fixed and the installer now uses the given providedLicenseFile argument to set the license file.
|OSPL-7563||The Custom_Lib solution files for MS Windows do not contain a Debug configuration.
To build the various custom_libs on MS Windows, solution files are included. These solution files contained only a Release configuration. To support debugging of customer applications, a Debug configuration should be added to the provided solution files, allowing the customer to build a debug version of the custom libraries.
Solution: A Debug configuration is added to the custom_lib solution files. The library name of the debug version of the custom lib is extended with the letter 'd'. For example the debug version of the standalone C++ API will be called dcpssacppd.dll instead of dcpssacpp.dll. The same applies to the other custom libraries.
|OSPL-10343 / 17878|| The durability service of a node that is configured as a non-aligner for a namespace may not become complete even if there is an aligner available.
In case a durability service of a node that is configured as a non-aligner for a namespace is initializing and it encounters an aligner that is already complete, the non-aligner may have missed the message in which the aligner indicates that its groups are complete. Even though the non-aligner notices that it needs to request the state of the groups from the aligner, this is blocked by the non-aligner waiting for all its groups to become complete. This can effectively lead to a situation where the durability service does not make any progress.
Solution: In case the non-aligner is not complete, new events are no longer blocked. This causes the non-aligner to request the groups when the aligner appears, notice that they are all complete, and proceed again.
Fixed bugs and changes not affecting the API in OpenSplice 6.8.1
|OSPL-8444||Unable to set the "Don't fragment" bit on outgoing UDP packets
There was no configuration option to set this bit.
Solution: Added a configuration option "DontFragment", located at (S)NetworkingService/Channels/Channel/Sending and NetworkingService/Discovery/Sending
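A configuration sketch (the surrounding structure is abbreviated and the channel attributes are illustrative; element placement follows the paths given above):

```xml
<NetworkService name="networking">
  <Discovery>
    <Sending>
      <DontFragment>true</DontFragment>
    </Sending>
  </Discovery>
  <Channels>
    <Channel name="BestEffort" reliable="false">
      <Sending>
        <DontFragment>true</DontFragment>
      </Sending>
    </Channel>
  </Channels>
</NetworkService>
```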
|OSPL-9882|| Linux: MATLAB/Simulink hangs when connecting to shared memory domain
On Linux, a MATLAB script or Simulink model connecting to a Vortex OpenSplice domain via shared memory will hang.
Solution: MATLAB, like Java applications, requires that the environment variable LD_PRELOAD be set to reference the active Java installation's libjsig.so library. The MATLAB user interface uses Java, and thus requires the same signal handling strategy as Java applications connecting to Vortex OpenSplice. The precise syntax for setting the LD_PRELOAD environment variable will depend on the shell being used. For Oracle JVMs, LD_PRELOAD should contain this value:
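The exact value is not reproduced above. As an illustration only, a typical setting for a 64-bit Oracle JVM on Linux looks like the following; the path is an assumption and depends on your JDK version and layout, so locate libjsig.so under your own JAVA_HOME first:

```sh
# Illustrative path; verify with: find "$JAVA_HOME" -name libjsig.so
export LD_PRELOAD="$JAVA_HOME/jre/lib/amd64/libjsig.so"
matlab
```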
|OSPL-9893|| Simulink unbounded string default dimension
A Simulink user needs to be able to specify the maximum dimension of the character arrays created by IDLPP for the DDS 'string' type. Currently IDLPP generates a maximum dimension of 256, which is not user-overridable.
Solution: The IDLPP tool has been updated with additional functionality for generating Simulink bus definitions. For unbounded strings (and sequences in general), the IDLPP tool generates a .properties file that can be edited with Simulink specific bounds for these string and sequence types. See the DDSSimulinkUserGuide for more information on the IDL properties file.
|OSPL-9895||IDL types must have unique names before importing into Simulink to create the Simulink Bus.
Currently all IDL types must have unique names so importing IDL into Simulink does not overwrite bus types, since Simulink bus types must have unique names.
Solution: The IDLPP tool has been updated with additional functionality for generating Simulink bus definitions. When the tool detects a potential clash in struct names, it generates a .properties file that can be edited with Simulink-specific struct name overrides for the bus definition. See the DDSSimulinkUserGuide for more information on the IDL properties file.
|OSPL-9932||Warnings reported in MATLAB when invoking idlImportSl after a fresh MATLAB restart
Running idlpp the first time after starting MATLAB causes the following warning; running the same command again, the warning is gone. >> Vortex.idlImportSl('zz.idl', 'data.sldd') Warning: Cannot use definition of enumerated type 'Color' in dictionary
'/home/prismtech/git/osplo/src/tools/matlab/examples/idl/data.sldd' because it is defined externally.
You must remove the externally defined MCOS class before you can use the dictionary definition.
> In Vortex.idlImportSl (line 14)
Warning: Cannot use definition of enumerated type 'Enum_Day' in dictionary
'/home/prismtech/git/osplo/src/tools/matlab/examples/idl/data.sldd' because it is defined externally.
You must remove the externally defined MCOS class before you can use the dictionary definition.
Solution: The Vortex.idlImportSl script has been fixed so as to avoid producing this warning.
|OSPL-9994||Setting a QoS file in Simulink block parameter uses an absolute path to specify the file URI
An absolute path is used to specify the file URI. This means the QoS profile has to be set for each block without the default QoS.
Solution: The Simulink DDS blocks have been updated such that the QoS file URI is now set during each block's initFcn instead of at selection time of the QoS file name in the mask dialog. The URI is also always relative to MATLAB's current working directory, which is re-resolved during diagram update/simulation start.
|OSPL-10100||Spliced can crash during termination when thread not alive
When the splice daemon was terminating, it was possible that it crashed when one of its threads was deadlocked. A deadlocked thread was not joined, yet its resources were freed; when the deadlocked thread became alive again, it could crash because it used the freed memory.
Solution: A timed thread join is now always performed; when the join fails, cleanup is stopped and spliced exits.
|OSPL-10137 / 17800|| Error when using multiple participants in single process deployment.
When using single-process deployment and creating domain participants for the same domain in multiple threads spawned from the main thread, the exception "Error: Failed to create DomainParticipant" appears.
Solution: It is now possible to create multiple participants at the same time for the same domain in single process.
|OSPL-10185||The classic C++ API leaks memory that is allocated for the default QoS values
The default QoS values in the classic C++ API are stored in static variables. When the application exits, the memory allocated for these default QoS values is not freed. The same applies to the DomainParticipantFactory.
Solution: The identified leakage has been resolved.
|OSPL-10315||When a service exits some memory is leaked
When a service is started it registers a cleanup handler. When the service exits the associated memory is not freed causing a small memory leak.
Solution: The identified leakage has been resolved.
|OSPL-10324|| Simulink - remove "status" ports from Domain, Topic, Publisher and Subscriber blocks
The status port for the participant, topic, subscriber, and publisher blocks does not offer any value; the status is only used to detect errors, and when they occur the simulation stops.
Solution: The status port value for these blocks has been made inaccessible to the simulation.
|OSPL-10367 / 17969||JVM crash when using multiple definitions of the same Enum value.
When using the Java API with different IDL files that have multiple definitions of the same Enum value in the same namespace, the JVM crashes where it should give an error report.
Solution: The problem is fixed; when this scenario happens a valid error report is logged and the JVM no longer crashes.
|TSTTOOL-469 / 17530|| Max Samples Kept Per Reader preference not always respected
In certain use cases, the max samples kept per reader preference does not work. The sample list tab table can have more than the expected number of samples per reader. This can be seen when a reader has a high number of samples coming in, at a fast rate.
Solution: The back end data model was not removing samples from the display list correctly; this has been fixed.
Fixed Bugs Not Affecting the API in OpenSplice V6.8.0p3
|OSPL-9497|| When the durability namespace equality check is enabled a merge action may fail when combining alignment requests
When the equality check is enabled for a namespace, the durability service checks whether the aligner data set is the same as the data set of the alignee when applying a merge policy. However, when the aligner tries to combine requests, which is controlled by the RequestCombinePeriod, it incorrectly compared only the data set from the first request. This may cause it to conclude that the data sets are the same and that no merge of the data sets has to be performed, even though another request may have a data set which is not equal.
Solution: When the durability service tries to combine the data requests associated with a namespace, it checks whether all the associated data sets are the same or not. It may combine requests whose data sets are the same as its own, i.e. those that do not require an alignment of the data, and it may combine requests whose data sets are not equal to its own set and which will require an alignment.
|OSPL-9553 / 17187|| When a wildcard partition is used for a datawriter data samples may be sent twice.
When a datawriter is created using a publisher which has a partition QosPolicy set to a wildcard partition, then depending on the situation (timing) it may occur that the datawriter is attached twice to the same partition (group) when matching the partition expression with the existing partitions. This causes each sample to be written twice. When the datawriter is created it will try to match the partition expression with the existing partitions. However, the spliced daemon will also try to match existing datawriters with newly created partitions, which may cause the datawriter to be attached to the same partition (group) twice.
Solution: When matching the datawriter with a partition it is checked if the datawriter is already connected to that partition.
|OSPL-10183||Some memory leaks in the durability and networking service.
When terminating the durability service, it may leak the memory associated with the configured policies and merge state. When terminating the networking (ddsi) service, it may leak the memory allocated to associate the networking service with a topic and partition combination.
Solution: The identified leakage has been resolved.
|OSPL-10200 / 17818||Unable to start a service using a script.
With the introduction of the configuration validator, it is impossible to use a script to start a service.
Solution: When the service executable name isn't one of the defaults (durability, networking etc) but a shell script or other executable, check if the specified file exists and is executable (proper flags set). In that case, also accept that as a valid input.
|OSPL-10201 / 17819||Out of resources when creating and destroying a domain participant.
In a very rare situation, when closing the domain just after opening it, a thread that had exited was not joined due to the fast shutdown after startup. This leads to memory not being freed. After about 32768 of these cases, the system runs out of resources and no new threads can be created anymore.
Solution: Always wait for the thread in case we expect it to have run. If it's running, we will wait for it to exit. If it already has exited, the wait will return immediately, but does free the resources held by the thread.
|OSPL-10272|| Not all ddsi2 threads are able to track CPU progress.
Ddsi2 has the ability to track progress of its threads in terms of the number of CPU cycles spent by these threads. This helps in identifying the root cause of a "Failure to make Progress" notification: if no cycles were spent for a long time a thread might be in a deadlock; otherwise it might be losing out when competing with higher priority processes/threads for the CPU. However, this ability could previously only be applied to a limited subset of the ddsi2 threads.
Solution: All relevant ddsi2 threads now have the ability to track CPU progress.
|OSPL-10281|| ISOCPP2 API is missing functionality to locate existing participant by its domainId.
The ISOCPP2 API offers a function dds::domain::find() that allows you to pass a domainId and that returns an existing participant attached to the domain identified by that id (or null when no match can be found). However, the OpenSplice implementation of this function was still missing.
Solution: OpenSplice now implements the dds::domain::find() function.
|OSPL-10282||DurabilityClient reports error on NO_DATA
DurabilityClient reports an error message when taking data from a built-in topic returns NO_DATA. NO_DATA is not an error and expected behavior in this situation.
Solution: When a take on a built-in topic by the DurabilityClient returns NO_DATA, it is no longer treated as an error.
|OSPL-10289|| Crash when reading sequence of sequence of primitives in ISOCPP2
When using ISOCPP2 with an IDL type containing a sequence of a sequence of a primitive, or of a struct with only primitive fields, a segmentation fault could occur when such a sample was read.
Solution: The problem is fixed; the application no longer crashes and the sample can now be accessed properly.
|OSPL-10318|| Durability may select wrong namespace master when using the same master priority and no persistent store configured.
When the durability service is configured with a namespace that has a master priority set, and the same (or the highest) master priority is used by each durability instance, and these durability services have no persistent store configured, then master selection may fail. Besides the master priority, the quality of the data set is taken into account when determining which durability instance becomes the master for a particular namespace. When no persistent store is configured, the initial quality of the namespace was not initialized correctly, which caused each durability instance to select itself as master.
Solution: When determining the master for a namespace with a master priority set, the initial quality of the namespace is now initialized to 0 when no persistent store is configured.
OSPL-9837 / 17627
|idlpp hangs if the IDL file being processed ends with a #pragma line
If the last line in an IDL file is a #pragma, idlpp will consume 100% of CPU cycles and eventually fail.
Solution: Add a newline after the #pragma and save the file.
|OSPL-9814||The ddsi service incorrectly uses the reception time to unregister the instances of a terminating remote writer.
When a remote writer terminates, the instances written by that writer are unregistered. The ddsi service unregisters these instances when it is notified that the remote writer has terminated. However, it used the reception time of the notification as the source time for the unregistration of the corresponding instances. This could cause the unregistration to be considered outdated when the nodes are not time-aligned.
Solution: Use the source timestamp present in the protocol message that indicates the termination of a writer as the source timestamp for the unregistration of the corresponding instances.
|OSPL-9957||Inconsistent resource-limits QoS policy settings were accepted.
The max_samples and max_samples_per_instance settings of the resource-limits QoS policy should be consistent, meaning that max_samples should be greater than or equal to max_samples_per_instance. However, creating an entity, or setting the QoS of an entity, with a resource-limits policy in which max_samples and max_samples_per_instance are inconsistent succeeded.
Solution: The consistency of the resource-limits QoS policy is now checked when creating an entity or setting the QoS of an entity; an inconsistent policy causes the entity creation to fail or the QoS setting to return BAD_PARAMETER, respectively.
|OSPL-10011||Durability service with XML persistent store crashes during termination.
When a large amount of data is written to the XML persistent store and the disk drive containing the files is slow, operations like fsync, required to guarantee consistency of the files after e.g. a power failure, can take a considerable amount of time. The durability service would incorrectly determine that a thread had not made any progress when it was blocked on disk I/O. When at the same time the service was terminated, it could potentially crash while releasing thread resources.
Solution: The issue is resolved by temporarily disabling thread liveliness monitoring during disk I/O. The same mechanism was already in place for the KV persistent store.
|OSPL-10078||The ddsi garbage collector may block on a full writer history cache (WHC).
The ddsi garbage collector is used to clean up the administration related to local and remote entities that are being deleted. Under certain circumstances the garbage collector could block on a writer which still has data available in its history cache (WHC), e.g. unacknowledged data.
Solution: Allow the garbage collector to continue when it encounters a writer with a full WHC.
|OSPL-10115|| DDSI time+duration implementation assumes signed overflow wraps around.
DDSI's code for adding a duration to a time relies on signed overflow wrapping around, but what happens on signed overflow is actually undefined behaviour in C. Many combinations of platform and compiler do the "expected" thing by wrapping, some provide switches to guarantee this, but in the end the code really should not rely on it.
Solution: The algorithm has been modified and no longer relies on signed overflow wrapping around.
|OSPL-10116|| A deadlock may occur in ddsi when the LogStackTraces option is enabled
The ddsi service monitors the progress of its threads. When it notices that a thread fails to make progress, the ddsi service can log the stacktrace of that thread when the LogStackTraces option is enabled (on selected platforms only). To retrieve this stacktrace a special signal and corresponding signal handler is used. However, this signal handler performed a memory allocation, which could cause a deadlock.
Solution: Pre-allocated memory is used by the signal handler to store the stacktrace.
|OSPL-10133|| DurabilityClient accesses freed memory
|OSPL-10135 / 17788||Merge policy "Catchup" may crash the durability service in case of Exclusive ownership.
When a durability service has to apply a "Catchup" policy, and some of the instances it already contains are not covered by the set that is currently being aligned, it must conclude that these instances must have been disposed on the durability master, since otherwise they would have been covered by the set that is currently being aligned. In that case it will create its own implied dispose messages for those missing instances, which may be left partially uninitialized because it is no longer able to establish the exact writerGID, timestamp and inline-qos of the original dispose message that was used to dispose the instance. However, in case of exclusive ownership the missing inline-qos may cause a segmentation violation when the durability service tries to extract the writer strength from it.
Solution: The durability service will no longer try to extract the writer strength from the inline qos of an implied dispose message.
|OSPL-10155||A coherent writer must be created with a KEEP_ALL history QoS policy.
When a coherent writer with a history QoS policy set to KEEP_LAST was used, a sample in the writer history could be overwritten by a new sample due to resends. When this occurred, the corresponding transaction could never become complete because of the gap in the sequence numbers assigned to the subsequent samples. To prevent this, the writer should be created with a KEEP_ALL history QoS policy setting.
Solution: The creation of a group or topic coherent writer is only allowed when the corresponding history QoS policy is set to KEEP_ALL. Otherwise the creation will fail, or enabling the writer will return PRECONDITION_NOT_MET.
OSPL-10167 / 17806
|Segmentation Fault when using libconfig in combination with OpenSplice and DDSI.
In some cases it is possible that a Segmentation Fault can occur when using DDSI and an OpenSplice application in Single Process mode that is linked with libconfig (-lconfig).
Solution: The problem was that libconfig and DDSI both have similar functions that have a different signature. The DDSI functions have been renamed to avoid this issue.
|OSPL-10184|| Possible Durability service crash during termination when Networking service failed to start.
A few Durability service threads enter an infinite write-attempt loop (that only ends when Durability terminates) if the Networking service did not start. Within this loop, the threads are not signalled to be alive. This forces an alternative path when Durability terminates, which can cause a crash.
Solution: Signal the threads to be alive during the write attempt loop.
|OSPL-10236||The durability service over-aggressively requests groups from remote nodes
The durability service needs to exchange group information in order to request data for these groups from remote nodes. To exchange group information, group request messages are sent, and the remote node responds with group messages. The durability service sent such group requests far too often, even when it already knew about the groups of the remote node. Especially in situations with many disconnects and many groups, the superfluous group exchanges could increase network load and decrease performance.
Solution: The number of superfluous group requests is significantly reduced in case the groups of the remote node are already known.
|OSPL-10238 / 17837||Potential crash in durability service during peer discovery
There was a race condition in the peer discovery of the durability service, where a partially discovered (and thus still partially uninitialized) peer could already be pulled through a matching algorithm and trigger a segmentation violation.
Solution: The matching algorithm has been modified to avoid processing of uninitialized peers.
|OSPL-10250||When there are no writers for a sample, the writer registration for data that is being aligned must be removed.
When there are no writers, the writer registration must be removed so that the correct liveliness state is set for the data being injected by the durability service. To remove the writer registration an unregister message is generated. However, the unregister message did not fill in the key fields correctly, making the unregister message flawed.
Solution: The key fields of the generated unregister message are set correctly.
|OSPL-10252|| Race condition during initialization between the conflict resolver thread and listener threads in the Durability service.
A race condition during initialization between the conflict resolver thread and the listener threads in the Durability service could cause a crash of the Durability service, or missed events leading to a permanently incomplete state of kernel groups.
Solution: Starting the conflict resolver thread is delayed until the listeners are initialized.
Fixed bugs and changes not affecting the API in OpenSplice V6.8.0p1
|OSPL-7942 / 38552|
OSPL-9825 / 17621
|Thread specific memory leaking away after thread end
Threads that are created by a user application do not use our thread wrapper. Thread-specific memory allocated by our software stack was not freed when such threads exited.
Solution: Use OS-supplied callback functionality to call a destructor function when a user-created thread exits, to free the allocated memory.
|OSPL-9631 / 17524|| On VxWorks a deadlock may occur when creating a datareader.
On VxWorks the implementation of condition variables located in shared memory makes use of named binary semaphores. For each thread one named semaphore is allocated, to be used to notify the thread when it is waiting on a condition variable. However, when a thread started waiting on a shared condition variable it could register the wrong name (id) with the condition variable. This would cause the thread not to be woken when the condition variable was signalled.
Solution: For each thread a named binary semaphore with a unique name is allocated. The correct name (id) is registered with a condition variable when the thread starts waiting on it. This allows another thread or RTP to find the correct binary semaphore when signalling the condition variable and wake the thread waiting on it.
|OSPL-9743 / 17579|
OSPL-9775 / 17600
OSPL-9776 / 17599
OSPL-9793 / 17607
OSPL-9847 / 17628
OSPL-9854 / 17633
|Durability service crash when deleting a reader.
The Durability service obtains historical data from a remote durability service and delivers it to all local datareaders. When it has completed delivery of all historical data, it signals this to all local datareaders. When a datareader related to the topic for which alignment has completed is deleted at that specific time, the Durability service could crash due to a dangling pointer.
Solution: When notifying completion of the historical data alignment, individual readers are claimed first to ensure they cannot be deleted during the notification process.
|OSPL-9771|| Group coherent transaction purging no longer depends on the GroupCoherentCleanupDelay configuration option.
The GroupCoherentCleanupDelay configuration option should be removed, as all information needed to deduce the delay is available internally without the need to configure it externally.
Solution: Group coherent transaction purging is the removal of transactions which cannot become complete anymore. A transaction cannot become complete anymore when a following transaction has been completely received. Only while durability is in the process of alignment is it possible that an older transaction becomes complete, so no purging is done while aligning. This replaces the GroupCoherentCleanupDelay configuration option, which always postponed purging by the configured time.
|OSPL-9798|| Improper use of a property in durability namespace administration
One of the properties of a durability namespace is not properly taken into account, when the namespace is received from a fellow durability service. This causes potentially undefined behavior, including the possibility that a merge conflict is not properly resolved for that namespace.
Solution: The namespace admin is now fully copied when a namespace is received by the namespaces listener thread of the durability service.
|OSPL-9807 / 17616|
OSPL-9809 / 17614 OSPL-9821
|For namespaces with a non-legacy masterPriority a delay is used in the algorithm to select a master. This delay may prevent alignment from taking place.
When durability services discover each other they negotiate which one (the master) will take up the responsibility to align the other durability services. Before a master is elected there is a period in which no master exists. When previously generated conflicts are resolved in this period, they might get dropped because no master is present yet. This may result in alignment not taking place once the master has been elected. Note that the period of not having a master is unnecessary when it is clear who will become the master.
Solution: The delay to choose a master has been reduced to 0 for namespaces with a masterPriority, so that there is no risk that conflicts are getting dropped because the master is not yet selected.
|Durability service and application deadlock or durability may crash when concurrently deleting a datareader and marking alignment complete
Once the durability service finishes alignment of historical data, it notifies the datareader the process has completed. If that datareader is deleted by the application at the same time, both the durability service and the application deleting the datareader may run into a deadlock due to the fact that deletion of the datareader and marking completeness of historical data alignment take two locks in opposite order.
Solution: The algorithm to mark historical data alignment complete has been refactored to avoid pushing completeness to the readers altogether, by keeping track of incomplete groups in a separate structure and letting wait_for_historical_data rely on that instead.
|OSPL-9826|| Out of order delivery of invalid sample might throw away later history
When an invalid sample is delivered out of order to a Reader that has history depth > 1, and newer samples for the same instance are already present, then the invalid sample may legally be dropped. However, it would also incorrectly drop all samples for the same instance that are newer than the invalid sample.
Solution: Only the invalid sample itself is now dropped in this scenario.
|OSPL-9840|| Possible uninitialized memory reads when using Replace merge policy.
When using a replace merge policy, in some cases a sample could be evaluated right after it had been freed, resulting in uninitialized memory reads.
Solution: The sample evaluation is now always performed before the sample is freed.
|OSPL-9841 / 17625|| The creation of a Query or QueryCondition with a ' (single quote) in the expression fails.
When the expression used to create a DDS query or querycondition contains an escaped ' (single quote), the creation of the query fails. Note that a single quote in a query expression should be escaped by another single quote. The cause of this error is the evaluation of the regular expression used by the query expression parser: it interpreted the first quote it encountered in the string as the end delimiter of the string, instead of matching the longest possible match.
Solution: The regular expression used by the query parser has been changed to accept occurrences of escaped single quotes: when a query string contains two consecutive single quotes, they are converted to one single quote. Note that the documentation still has to be updated to state that a single quote may be included in a query expression by escaping it.
|OSPL-9865 / 17634|| Linking problem when application uses function named os_sleep which is also defined by OpenSplice.
When an application defines a function os_sleep this function may provide a conflict with the internal OpenSplice function os_sleep. This may cause a linking problem when linking the application code with the OpenSplice libraries.
Solution: The name of the os_sleep function has been renamed to os_ospl_sleep.
|OSPL-9875|| Sample requests are not combined with an already combined sample request
To reduce network load during alignment, the aligner durability service attempts to combine sample requests from the alignees before it sends its response. The period to combine can be configured using the configuration setting //OpenSplice/DurabilityService/Network/Alignment/RequestCombinePeriod. So if two alignee durability services request the same data during this period, the aligner combines these requests. If a third request appears during the period, it should also be combined, and when the period expires all three alignees should be aligned in a single pass. Due to a bug the aligner did not recognise the third request as being similar to the first two, so the third request was not combined. This led to multiple responses for the same set and contributed to inefficient alignment.
Solution: The algorithm to combine requests has been fixed so that a combined request can itself be combined further.
|OSPL-9921|| Durability may crash when datareader is deleted shortly after creation
Once the durability service finishes alignment of historical data, it notifies the datareader that the process has completed. If that datareader is deleted by the application at the same time, durability could crash. Durability claims the datareader before accessing it, which prevents the datareader from being freed even if the application requests its deletion. However, even though the datareader is not freed while it is claimed, the datareader is 'disconnected' from the subscriber during deletion. The algorithm to disconnect the datareader from the subscriber freed part of the datareader data structure in memory and left the datareader in a state where that part of the data structure was still accessible although already freed. Once the durability service accessed this freed data structure it crashed.
Solution: The algorithm to disconnect the datareader has been modified to ensure the part of the datastructure that is actually freed is no longer accessible after completing the disconnect.
|OSPL-9929||After removing the GroupCoherentCleanupDelay option it was possible that an obsolete group transaction was flushed to the kernel without the accessLock.
The removal of the GroupCoherentCleanupDelay, and its replacement by not allowing removal of obsolete group transactions during alignment, shifted responsibility for the removal of obsolete group transactions to durability. When an obsolete group transaction contained samples that could not be discarded, those samples were flushed to the kernel without the accessLock; this could lead to late-joining readers receiving only part of the obsolete group transaction.
Solution: Before flushing undiscardable samples from an obsolete transaction, the kernel accessLock is taken.
|OSPL-9930|| Potential crash in resendManager
When the configured networking service(s) have not yet attached to a new combination of a topic and partition, any sample written to that combination is rejected (and hence retransmitted by the resendManager at a later time) until all configured networking services have successfully attached to it. When one of the rejected messages is an Unregister message, the resend manager could crash while trying to retransmit it.
Solution: The algorithms used to handle message rejection and retransmissions have been modified to correctly handle Unregister messages now, and thus the potential crash will no longer occur.
|OSPL-9958/17656|| Durability not completing capability handshake after asymmetrical disconnection
Before durability services can start alignment they must exchange capabilities; capability exchange is thus a handshake that must precede alignment. To detect an asymmetrical disconnect, multiple capabilities (>2) from the same fellow durability service must have been received; the only way this can occur is when the fellow was disconnected. In that case a durability service should resend its capabilities to the fellow, so that the fellow can participate again in alignment activities. However, the capability was not resent in case of an asymmetrical disconnect, so the fellow would no longer participate in alignment after an asymmetrical disconnect with the fellow was detected.
Solution: The capability is resent to the fellow when an asymmetrical disconnect with the fellow has occurred.
|OSPL-9963 / 17660|| WaitSet not triggered when a dispose is followed by an unregister.
When a ReadCondition that should trigger when a sample is disposed was added to a WaitSet, it was possible that the waitset would not wake up when a dispose was followed by an unregister. This would only happen when the related reader contained only one instance.
Solution: Re-evaluate the attached conditions when a matched invalid sample (a dispose or unregister) is replaced by another invalid sample.
|OSPL-9990|| Multiple asymmetric disconnects in a row could prevent exchange of capabilities and prevent alignment
A precondition for alignment to start is that capabilities have been exchanged and readers have been discovered (the handshake). If an asymmetric disconnect occurs multiple times after capabilities have been exchanged, then the capabilities of the node that got disconnected must be exchanged again once it reconnects. This, however, would only happen after the first asymmetric disconnect: when multiple disconnects occurred, a fellow would not resend its capabilities because it thought it had already done so. Furthermore, rediscovery of readers would not occur because the associated DCPSSubscription messages were no longer available, having been taken during the previous iteration. Both the failure to resend capabilities and the failure to rediscover the fellow's readers prevented the handshake from completing after multiple asymmetric disconnects, and hence prevented realignment.
Solution: The capability message is extended with an incarnation number to differentiate between the different incarnations of fellows. That way fellows can determine to which incarnation a capability message belongs. Furthermore, the DCPSSubscriptions used to detect the readers of the fellow are read rather than taken, so that they remain available for future incarnations. Together these mechanisms are sufficient to determine whether the handshake has completed and realignment can start again.
|OSPL-1004 / 17669|| Crash on cortexa9t.yocto after building custom_lib
When building the product, NDEBUG was not defined, while it was when building the custom_lib. This causes a binary mismatch when linking the libraries, which in turn can cause all kinds of problems such as crashes. The workaround is removing "-DNDEBUG" from the custom_lib makefiles.
Solution: Define NDEBUG when building the product.
|OSPL-10012|| DDSI sometimes sends a GAP when sample still available
In DDSI, the response to an ACKNACK requesting a retransmission consists of samples in DATA (and DATAFRAG) submessages, and of GAP submessages to indicate samples that are no longer available in the writer history. The GAP is encoded as a range of sequence numbers [A,B) combined with a bitmap starting at B, where bit k indicates that the sample with sequence number B+k is no longer available (k starting at 0).
In the specific case where the request is for [A,B], all samples in [A, B-n] are present and those in [B-n+1, B] are not, the GAP message simply contains the interval [B-n+1, B+1) and no bitmap. The DDSI2 service then attempts to grow the interval by locating the next available sequence number, to reduce the number of round-trips required between the reader and the writer.
However, because of an off-by-one error, when sample B+1 exists and the writer has published more data since, B+1 is erroneously included in the gap. This will (with a rare exception) cause the reader to move forward to at least B+2 and never request a retransmit of sample B+1. This is caused by the writer starting its search for the next available sequence number beyond the gap one position too far. (The exception is that when packet loss occurs in the retransmit and the gap was not stored in the reorder buffer for lack of space, the reader will not move forward as much and will re-request samples, possibly but not necessarily recovering B+1.)
Solution: The off-by-one error is fixed.
|OSPL-10088|| Wrong .NET version for some windows builds.
In some Windows versions of OpenSplice the C# language binding was linked against the wrong .NET version. For example, the Windows 7 VS2010 build was linked with .NET 4.5 instead of .NET 4.0.
Solution: The OpenSplice installers now comply with the following versions:
Windows 10 / VS2015 / .NET framework V4.6
Windows 8.1 / VS2013 / .NET framework V4.5.1
Windows 8 / VS2012 / .NET framework V4.5
Windows 7 / VS2010 / .NET framework V4.0
Windows Vista / VS2008 / .NET framework V2.0
Windows XP SP3 / VS2005 / .NET framework V2.0
|OSPL-10125 / 17790|| Networking service does not halt when binding a socket failed.
It is essential for the Networking service to be able to bind to its desired sockets; if it cannot, communication will fail. The Networking service continued to run when such a socket bind failed, indicating problems only by means of an error trace, leaving the application none the wiser. The Networking service should halt when it fails to bind its sockets.
Solution: Networking service will halt in an error state (using the FailureAction/systemhalt configuration will halt the complete domain in that case) when a socket bind fails.
|TSTTOOL-448 / 17651|
TSTTOOL-449 / 17654
TSTTOOL-452 / 17648
|New shell command simplifies Vortex Tester Python Scripting installation and usage
In order to use Vortex Tester Python Scripting, a user had to download, install and configure a Jython engine, and then at each use, start that engine with specific command line options. Doing this was error prone.
Solution: A new osplscript command is now included with Vortex OpenSplice. To install and configure Python Scripting, the user must only download the Jython installer to the current directory, and run osplscript from a console window. Once configured, osplscript will start the Python Scripting environment; no special command line options are required. Finally, osplscript accepts all standard Python/Jython command line arguments.
|TSTTOOL-456|| Creating a group coherent subscriber in Tester fails to add readers, and throws exceptions in log.
When specifying "Create as group" from the add readers dialog, the data readers fail to initialize and the DDS poller thread throws exceptions that are logged in OSPLTEST.log. This is because from OpenSplice 6.8.0 onwards, subscribers that have the presentation policy set to access scope = GROUP and coherent access = true are created in the disabled state.
Solution: Tester now prevents operations from being invoked on the group coherent subscriber until it is enabled, which is done automatically after the data readers are created. An error message was also added to the add reader dialog to prevent creating new readers under an existing group coherent subscriber.
Fixed Bugs Not affecting the API in OpenSplice 6.8.0
|OSPL-4891||Java and C++ RMI incompatibility.
The RMI bindings for Java and C++ use different topic names. Therefore a Java server and C++ client (or vice-versa) are not compatible with each other.
Solution: A parameter (--RMILegacyTopicNames) has been added to the C++ RMI binding. By default it is enabled to ensure backwards compatibility with previous releases. When disabled, topic names will match the Java binding therefore a C++ RMI client/server will be able to communicate with a Java RMI client/server. For more information please check the RMI Getting Started Guide (section "Runtime Configuration Options").
|OSPL-9183||Java 5 API read/take gives back already processed samples.
In the Java 5 API, when doing a read or a take with a user-allocated result list, it was possible that already processed data was returned in the result list.
Solution: Only unprocessed data is now returned.
|OSPL-9274||Remove already deprecated MMF durability persistency option
The MMF store for durability has long been deprecated and has now been removed.
Solution: Support for the MMF store has been removed from the durability service after being deprecated for a long time. Switching to the new KV persistency is transparent.
|OSPL-9383||Missing flags in SAC DDS_STATUS_MASK_ANY
DDS_SUBSCRIPTION_MATCHED_STATUS and DDS_PUBLICATION_MATCHED_STATUS were not included in DDS_STATUS_MASK_ANY.
Solution: The missing flags have been added.
|OSPL-9625||DDSI asynchronous delivery mode behaviour change
The DDSI2 service delivers data to the kernel either synchronously or asynchronously, depending on the latency budget and the transport priority QoS of the writer. If the latency budget is large enough or the transport priority is low enough (both configurable), delivery is asynchronous.
Solution: The behaviour of asynchronous delivery has been changed to not drop data when the delivery queue is full, but rather to wait until there is once again room available in the queue. This behaviour more closely resembles the behaviour with synchronous delivery when a reader has reached its resource limits, but more importantly it significantly increases the stability of the throughput at very high sample rates. The default settings are such that all data is by default delivered synchronously. This change only has an impact if asynchronous delivery has been explicitly enabled and previously samples were dropped because of a full queue.
|OSPL-9626||DDSI now allows configuring Heartbeat timing
The DDSI2 service did not allow configuring the timing of the Heartbeats sent by writers to inform the matching readers of the presence of (unacknowledged) data. While the timing parameters were fine for most networks, on long-latency networks they could result in multiple Heartbeats being sent before an acknowledgement could have been received. This is obviously bad for network utilisation.
Solution: The parameters are now configurable via the Internal/HeartbeatInterval setting. The default values are identical to the previously used values, and there is no change in behaviour unless different values are configured.
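As an illustrative sketch only (the `min`/`max` attribute names and the values shown here are assumptions; the deployment guide has the authoritative schema), such a configuration could look like:

```xml
<DDSI2Service name="ddsi2">
  <Internal>
    <!-- base heartbeat interval, with assumed lower/upper bounds -->
    <HeartbeatInterval min="5 ms" max="8 s">100 ms</HeartbeatInterval>
  </Internal>
</DDSI2Service>
```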
|OSPL-9627 / 17521||Application crash during write
When inserting a writer sample in the writer history, first the position where the sample should be inserted is determined. When the history is full, an older sample is removed from the history. A crash occurred when the sample that was removed was also the position in the history where the new sample had to be inserted.
Solution: When the insertion point equals the sample that is removed, the insertion point is now updated to point to the next sample in the history.
|OSPL-9736||QoS Provider now accepts durability-service policy in DataWriter QoS.
The OMG DDS4CCM Specification defines the QoS Provider API as implemented by OpenSpliceDDS (see the QoS Provider documentation). An oversight in the specification resulted in the inclusion of a durability-service policy in the DataWriterQoS. OpenSpliceDDS used to reject QoS profiles that include this policy, since it is not a valid DataWriterQoS policy. However, since it is included in the specification (and also in many example XML profiles), this behaviour was deemed inconvenient.
Solution: The OpenSpliceDDS QoS Provider implementation was changed to no longer reject this policy. It is now ignored and reported in the info log, and does not prevent creating and using a QoS Provider.
|OSPL-9737 / 17559|| DDSI transmission blocks on high watermark without warning.
When //OpenSplice/DDSI2Service/Internal/ResponsivenessTimeout is infinite (which is the default), then the DDSI transmission will block on the high watermark without a warning trace when there is an unresponsive reader in the system.
Solution: A warning is added that will be traced after a short while when DDSI is waiting on the high watermark.
|OSPL-9747||C99 api function dds_readcondition_create returns null.
Implementation does not correctly copy the mask, causing creation of read condition to fail.
Solution: Fixed the error in the mask copy routine.
|OSPL-9756||C# RoundTrip example reports wrong roundtrip times
The C# example uses the C# DateTime.Ticks property, which increases every 100 ns. The example divided this value by 1000 before using it in the calculation, losing resolution and incorrectly assuming the result was in microseconds.
Solution: The full 100 ns resolution clock tick is now used in the calculation; the conversion to microseconds is done when printing the results, by dividing the result by 10.
|OSPL-9910||Shapes Demo cannot set Best Effort DataWriters
The Shapes Demo was only able to set the Reliable policy, assuming that the default would be Best Effort. The Shapes Demo was ported from isocpp to isocpp2. The default QoS values of isocpp2 differ from those of isocpp and follow the standard, which means the default for a DataWriter is now Reliable. As a result, the Shapes Demo was not able to change the default Reliable to Best Effort.
Solution: The Shapes Demo now sets the various QoS settings explicitly and no longer depends on default QoS values.
|Unused variable warning when building application using isocpp2 API
Warning at include/dcps/C++/isocpp2/dds/domain/detail/TDomainParticipantImpl.hpp:45:57
Solution: Removed unused variable.
|OSPL-9930||Potential crash in resendManager
When the configured networking service(s) have not attached to a new combination of a topic and partition yet, then any sample written to this combination are rejected (and hence retransmitted by the resendManager at a later time) until all configured networking services have successfully attached to it. When one of the rejected messages is an Unregister message, then the resend manager may crash while trying to retransmit it.
Solution: The algorithms used to handle message rejection and retransmissions have been modified to correctly handle Unregister messages now, and thus the potential crash will no longer occur.
|OSPL-9950|| When the c99 read/take operation returns no samples then calling return_loan may cause a crash.
When the c99 read/take operation does not return samples and the return_loan operation is called, a crash may occur when the same buffer is used again to read/take samples. The cause is that when no samples are read, no buffer is allocated; however, the unallocated buffer was incorrectly added to the loan administration.
Solution: When the read/take operation returns no samples, the provided buffer is no longer added to the loan administration.
|OSPL-9953 / 17653|| Including the DCPS C-API header file in a C++ program may give a compilation warning
An include file that is included by the DCPS C-API header file dds_dcps.h defines the struct os_stat and a function with the name os_stat. When included in a C++ program the name of the struct shadows the name of the function. This may cause a compilation warning.
Solution: The name of the struct is redefined to be different from the name of the corresponding function.
|OSPL-9708||Built-in topics included in all namespaces.
When using RTNetworking, all namespaces would be considered to contain the built-in topics. This could cause far too many conflicts as well as improper behaviour when merging the built-in topics.
Solution: The built-in topics are only contained in the automatically created namespace that is intended to properly merge built-in topics.
|OSPL-9768|| Userclock configuration name and reporting attribute types mixed up in the configurator.
The Userclock configuration name and reporting attribute types are mixed up in the configurator. The name is set as Boolean and the reporting attribute is set as String.
Solution: The defect is fixed and the name is now a String again and the reporting a Boolean.
|OSPL-9769|| Alignment not taking place due to unfortunate timing.
When two nodes see each other, one of them is going to be selected as the master for the other. The alignee node that does not become master waits for the master to update its state before it will acquire the data from the master. It is possible that the master raises its state before the alignee node starts waiting for the update. Effectively, this means that the alignee node will wait for an update that never comes, because it already has occurred.
Solution: The node that does not become master will always request data from its master instead of waiting for a state update.
|OSPL-9785||CATCHUP policy may incorrectly dispose recently arriving data.
When the durability service performs a catchup policy, it intends to replace its current data set with the set from its aligner. Every instance that is in its current data set but not in the aligned data set should be disposed. However, the algorithm used to insert this dispose message might incorrectly apply it to data that has arrived AFTER the request for the catchup but BEFORE the completion of this request, thus accidentally disposing data that should still be considered ALIVE.
Solution: The algorithm used to insert the DISPOSE message has now been modified to apply it only to data that is older than the time of the catchup request.
|OSPL-9208||DDSI not sending an SPDP ping at least every SPDPInterval
DDSI has a Discovery/SPDPInterval setting that is meant to set an upper bound on the SPDP ping interval, which is otherwise derived from the lease duration set in the //OpenSplice/Domain/Lease/ExpiryTime setting. The limiting only occurred when the lease duration was > 10s.
Solution: The limiting has been changed to ensure the interval never becomes larger than what is configured.
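For reference, a sketch of how such an upper bound could be configured (the value is illustrative; consult the deployment guide for the authoritative schema):

```xml
<DDSI2Service name="ddsi2">
  <Discovery>
    <!-- upper bound on the SPDP ping interval -->
    <SPDPInterval>30 s</SPDPInterval>
  </Discovery>
</DDSI2Service>
```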
|OSPL-8958||DDSI can regurgitate old T-L samples for instances that have already been unregistered
DDSI maintains a writer history cache for providing historical data for transient-local writers and for providing reliability. An instance is removed from this cache when it is unregistered by the writer, but its samples are retained until they have been acknowledged by all (reliable) readers. Already acknowledged samples that were retained because they were historical data could survive even when the instance was removed. When this happened, a late-joining reader would see some old samples reappear.
Solution: deleting an instance now also removes the already acknowledged samples from the history.
|OSPL-9097||DDSI transmit path can lock up on packet loss to one node while another node has crashed
The transmit path in DDSI could lock up when the following coincided: a successful retransmit to one remote reader, the disappearance (through loss of connectivity or a crash) of another remote reader that had not yet acknowledged all samples, all other remote readers having acknowledged all samples, and the writer having reached the maximum amount of unacknowledged data. The writer could then only be unblocked by the receipt of an acknowledgement covering a previously unacknowledged sample, which under these circumstances would never arrive because of the limit on the amount of unacknowledged data.
Solution: deleting a reader now not only drops all unacknowledged data but also clears the retransmit indicator of the writer.
|OSPL-9096||Durability service DIED message even though the durability service is still running
The d_status topic is published periodically by the durability service to inform its fellows of its status. By using a KEEP_ALL policy, the thread writing the status message and renewing the service lease could be blocked by a flow-control issue on the network, which could cause the durability service to be considered dead by the splice daemon when in fact there was no problem with the durability service.
Solution: use a KEEP_LAST 1 history QoS policy for the writer.
|OSPL-9067||Large topics are published but not received
Loss of the initial transmission of the final fragments of a large sample failed to cause retransmit requests for those fragments until new data was published by the same writer.
Solution: ensure the receiving side will also request retransmission of those fragments based on heartbeats advertising the existence of the sample without giving specifics on the number of fragments.
|OSPL-9077 / 00016820|| Potential crash in durability service during CATCHUP policy
The durability service could crash while processing a CATCHUP event. This crash was caused by the garbage collector purging old instances while the CATCHUP policy was walking through the list of instances to do some bookkeeping.
Solution: The CATCHUP policy now creates a private copy of the instance list while the garbage collector is unable to make a sweep. This private list is then used to do the bookkeeping.
|OSPL-9068 / 00016813||Catchup policy may leak away some instances
When a node that performs a catchup to the master contains an instance that the master has already purged, then the node catching up would need to purge this instance as well. It would need to do this by re-registering the instance, inserting a dispose message and then unregistering this instance again. However, the unregister step was missing, causing the instance to effectively leak away since an instance is only purged by the durability service when it is both disposed AND unregistered.
Solution: The durability service will now both dispose AND unregister the instance at the same time.
|OSPL-9081 / 00016824||Potential deadlock in the OpenSplice kernel
The OpenSplice kernel has a potential deadlock where two different code paths may claim locks in the opposite order. The deadlock occurs when one thread is reading/taking the data out of a DataReader while the participant's listener thread is processing the creation of a new group (i.e. a unique partition/topic combination) to which this Reader's Subscriber is also attached.
Solution: The locking algorithm has been modified in such a way that the participant's listener thread no longer requires to hold both locks at the same time.
|OSPL-8956||Temporary blacklisting of remote participants in DDSI2
The DDSI2 service now provides an option to temporarily block rediscovery of proxy participants. Blocking rediscovery gives the remaining processes on the node extra time to clean up. It is strongly advised that applications are written in such a way that they can handle reconnects at any time, but when issues are found, this feature can reduce the symptoms.
Solution: A new setting in the DDSI section of the configuration has been added: Internal/RediscoveryBlacklistDuration, along with an attribute Internal/RediscoveryBlacklistDuration[@enforce]. The former sets the duration (by default 10s); the latter determines whether to really wait out the full period (true) or to allow reconnections once DDSI2 has internally completed cleaning up (false, the default). It is strongly discouraged to set the duration to less than 1s.
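A hedged sketch of what this could look like in the configuration (the element placement follows the paths named above; the value is illustrative):

```xml
<DDSI2Service name="ddsi2">
  <Internal>
    <!-- block rediscovery of departed participants for 10s and
         enforce waiting out the full period before reconnection -->
    <RediscoveryBlacklistDuration enforce="true">10s</RediscoveryBlacklistDuration>
  </Internal>
</DDSI2Service>
```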
|OSPL-9071||v_groupFlushAction passes a parameter that is not fully initialized.
Valgrind reported that the v_groupFlushAction function passes a parameter that is not fully initialized. Although an uninitialized attribute of this parameter was evaluated in a subsequent function invocation, it never caused issues because the value was only used as an operand of a logical AND whose other operand was always FALSE.
Solution: All attributes of the parameter in question are now explicitly initialized.
|OSPL-9055||Potential Sample drop during delivery to a local Reader
In some cases, a dispose followed by an unregister does not result in NOT_ALIVE_DISPOSED state on a Reader residing on the same node as the Publisher. In those cases, the Reader has an end state set to NOT_ALIVE_NO_WRITERS, and reports that a sample has been Lost.
Solution: We have no clue what could cause this behaviour, but added some logging to capture the context of the erroneous sample drop. This is just a temporary measure, and will be reverted when the root cause has been found and fixed.
|OSPL-9056||Potential deadlock during early abort of an application
When an application aborts so quickly that the participant's leaseManager thread and its resendManager thread have not yet had the opportunity to get started, then the exit handler will block indefinitely waiting for these threads to exit the kernel. However, both threads are already blocked waiting to access a kernel that is already in lockdown.
Solution: The constructor of the participant will not return before both the leaseManager and resendManager threads have entered the kernel successfully.
|OSPL-8953||Potential deadlock between reader creation and durability notification
A thread that creates a new DataReader and a thread from the durability service that notifies a DataReader when it has completed its historical data alignment grab two of their locks in reverse order, causing a potential deadlock to occur.
Solution: The locking algorithm has been modified so that these two threads no longer grab both locks in reverse order.
|OSPL-8886||Durability failure to merge data after a short disconnect
When the disconnection period is shorter than twice the heartbeat period, a durability service may not have been able to determine a new master before the node is reconnected again. In that case no master conflict is generated. In case the durability service is "late" in confirming a master, it might even occur that the master has updated its namespace, but the namespace update is discarded because no confirmed master has been selected yet. As a consequence no request for data will be sent to the master, and the durability service will not be aligned.
Solution: In case a durability service receives a namespace update for a namespace for which no confirmed master is selected yet, the update is rescheduled for evaluation at a later time instead of discarding the update.
|OSPL-8948 / 16755||Race condition between durability data injection and garbage collecting of empty instances
The durability service cached instance handles when injecting a historical data set in a way that could result in the historical samples being thrown away if the instance was empty and no known writers had registered it.
Solution: the instance handle is no longer cached.
|OSPL-8971||Catchup policy may incorrectly mark unregistered instances as disposed.
When an instance is unregistered on the master node during a disconnect from another node that has specified a CATCHUP policy with that master, then upon a reconnect that unregister message will still be delivered to the formerly disconnected node. However, the reconnected node will dispose all instances for which it did not receive any valid data, so if the unregister message is the only message received for a particular instance, that instance will be disposed.
Solution: The Catchup policy is now instructed to dispose only those instances for which it received neither valid data nor an unregister message.
|OSPL-8984||DDSI handling of non-responsive readers needs improvement
When a writer is blocked for ResponsiveTimeout seconds, DDSI will declare the matching proxy readers that have not yet acknowledged all data "non-responsive" and continue with those readers downgraded to best-effort. This prevents blocking outgoing traffic indefinitely, but at the cost of breaking reliability. For historical reasons it was set to 1s to limit the damage a non-responsive reader could cause, but past improvements to the handling of built-in data in combination with DDSI (such as fully relying on DDSI discovery for deriving built-in topics) mean there is no longer a need to have such an aggressive setting by default.
Solution: The default behaviour has been changed to never declare a reader non-responsive and maintain reliability also when a remote reader is not able to make progress. The changes also eliminate some spurious warning and error messages in the log files that could occur with a longer timeout.
Version 6.6.3p4 introduced a fix for OSPL-8872, taking into account the sequence number most recently transmitted by a writer when it matched a reader, in order to force heartbeats out until all historical data has been acknowledged by the reader. The change also allowed a flag forcing the transmission of heartbeats informing readers of the availability of data to be set earlier than before, in the case where the writer had not published anything yet at the time the reader was discovered. While logically correct, this broke the determination of the unique reader that had not yet acknowledged all data in cases where such a unique reader exists. This in turn could lead to a crash.
Solution: the aforementioned flag is once again never set before a sample has been acknowledged.
|OSPL-8974||Durability conflict scheduling fails when multiple namespaces have the same policy and differ only in topic names
Durability checks for conflicts between fellows (master, native and foreign state) that may require merging data whenever it receives a "d_nameSpaces" instance. If a conflict is detected, it enqueues it for eventual resolution, but only if an equivalent conflict is not yet enqueued. Testing for equivalency is done by checking: conflict kind, roles and local and fellow namespaces. However, the name space compare function (d_nameSpaceCompare) did not take the name into account, nor the full partition+topic expressions. The consequence is that when namespaces A and B have identical policies and differ only in the topic parts of the partition/topic expressions, a conflict for namespace A would be considered the same as a conflict for namespace B. The result would be a failure to merge data in B.
Solution: The comparison now takes the name of the namespace into account. The configuration is required to have no overlap between namespaces and to have compatible namespace definitions throughout the system. The name alone is therefore sufficient.
|OSPL-8973||Additional durability tracing when verbosity is set to FINEST
Durability has been extended with additional tracing in the processing of namespace definitions received from fellows, in particular when checking for master conflicts.
|OSPL-6112||An asymmetrical disconnect may lead to an inconsistent data state.
The durability service relies on a reliable and symmetrical network topology. Every once in a while it is possible to experience temporary network hiccups resulting in a temporary asymmetrical network topology (durability service A sees B, but B does not see A). This can typically occur due to high load on one of the machines, or a networking problem. Such asymmetrical disconnects may lead to an inconsistent data state. To recover from such a situation the durability service must recognize when such an asymmetric disconnect occurs, and trigger the alignment actions to make the data state consistent again.
Solution: Asymmetric disconnect situations are detected and the correct alignment actions are triggered to recover from the inconsistent data state.
|OSPL-9136||Protobuf isocpp2 example fails to build on E500mc build
The isocpp2 protobuf example fails to build on E500mc due to the fact that the wrong compiler is referenced in the example build script.
Solution: The build script has been modified to reference the correct compiler.
|OSPL-9267 / 16945, OSPL-9270 / 44079||Coherent set create, delete, unregister, recreate causes the recreate to be lost
When an unregister message was received from a coherent writer the connection (pipeline) between the group instance and reader instance was always destroyed immediately while the registration for the group instance was not removed immediately. The destruction of the pipeline with a still valid registration could cause samples written after the unregister to be dropped.
Solution: On unregistration from a coherent writer, immediately destroy the pipeline and process the unregistration for the group instance, as is done for non-coherent writers.
|OSPL-9299|| Durability shall allow configuring a master priority
Currently it is almost impossible to control which durability service becomes master. This leads to situations in which durability instances that should not become master do become master. Furthermore, it is currently not possible to indicate that a durability instance can act as aligner but should not take up the responsibility to act as master for other instances.
To provide more control over which node becomes master for a namespace, the //OpenSplice/DurabilityService/NameSpaces/Policy[@masterPriority] attribute can be specified. This is an optional attribute that specifies a value between 0 and 255. Value 0 means that a node will never become master, and value 255 indicates that the legacy master selection will be used. The default is 255. When the masterPriority is between 1 and 254, nodes with a higher masterPriority should become master. See the deployment manual for more information. Note that there are currently a few limitations:
Mastership handover in case a better but late joining master arrives is currently unsupported.
In case of mixed deployments with legacy nodes, the nodes that support masterPriority must use masterPriority=255.
Solution: A masterPriority has been implemented on the namespace policy that specifies the eagerness of a durability service to become master for a namespace. This gives a user more control. Note that mastership handover in case a better but late joining master arrives is currently unsupported. This will be addressed in the next release (see OSPL-9358 in known issues list).
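A sketch of a namespace policy using masterPriority (the configuration path matches the one named above; the namespace name, partition expression, and other policy attribute values here are illustrative):

```xml
<DurabilityService name="durability">
  <NameSpaces>
    <NameSpace name="defaultNamespace">
      <Partition>*</Partition>
    </NameSpace>
    <!-- higher masterPriority wins; 0 = never master, 255 = legacy selection -->
    <Policy nameSpace="defaultNamespace" durability="Durable"
            alignee="Initial" aligner="true" masterPriority="100"/>
  </NameSpaces>
</DurabilityService>
```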
|OSPL-9329||Multiple concurrent topology changes may cause durability to switch to majority voting
The current implementation of the master selection algorithm in durability falls back to majority voting in case no agreement for a master can be established in a limited number (4) of election rounds. When various topology changes occur concurrently then it is likely that no agreement is reached in the maximum number of rounds. The master selection algorithm then falls back to majority voting. Because different durability services may have a different view of their world, majority voting may lead to different masters in the system and an inconsistent end state for the data on various nodes.
Solution: To decrease the likelihood that the master selection algorithm falls back to majority voting, we have increased the maximum number of rounds before the fallback occurs to 50. This may lead to a longer master selection phase in case multiple topology changes occur concurrently, but increases the likelihood that the end state is consistent.
|OSPL-9333 / 17011||DDSI2 private memory growth when unregistering T-L instances/deleting entities without any peers present
The DDSI2 service was holding on to the unregister messages for transient-local instances inside the writer history while disconnected from the rest of the world. When another node showed up, it would receive and acknowledge all these messages, but without significant effect on the receiver, as they only described the unregistering of unknown instances/entities. These unregisters would only be cleared out upon acknowledgement of a newer T-L message, or upon creation of a new entity, while another node was connected.
Solution: When no peer exists, do not retain the unregister messages in the writer history cache.
|OSPL-9335 / 17009||Spliced not inheriting priority from "ospl start"
The "ospl start" command always started spliced with a default priority in a default scheduling class (timeshare on typical platforms), independent of the actual priority of the "ospl" tool itself. This is contrary to what one would expect, and moreover made it impossible to force the threads that are not independently configurable to run at a certain priority.
Solution: The behaviour of "ospl start" has been changed so that "spliced" now inherits the priority of the "ospl" tool.
|OSPL-9365||The durability service can hang when requesting data from multiple fellows and some of them leave after having provided a partial set
When the durability service determines it needs to request samples from multiple fellows to align with them (typically after becoming a master), it greedily dedups the incoming samples to reduce memory requirements before applying the configured merge policy. When a fellow disappears after providing some but not all samples in the set, the ones already received need to be removed from the received set. The administration of the received, dedup'd set was lacking the information needed to do this correctly. This then potentially resulted in the durability service waiting forever for the set to become complete.
Solution: The durability service now annotates each sample with the set of fellows that have provided it.
|OSPL-9388 / 17030||Durability service might deadlock when the networking queue is flooded.
When the network queue is overrun by the durability service, the normal mode of operation is to sleep a bit and retry again later. However, there is a slight chance that the sending thread of the network service that needs to make room again by consuming elements in the queue will indirectly block on the sleeping thread in the durability service itself.
Solution: The network service can no longer indirectly run into a lock that is held by the durability service while the network queue is flooded.
|OSPL-9408||Memory is leaked when readers request historical data.
If a reader requests historical data, some internal administration is created. A small part of this is not freed, resulting in a memory leak. In case readers are created and destroyed in a loop, the total amount of memory that is lost can become significant.
Solution: The identified leakage areas have been fixed.
|OSPL-8017||DDSI2 did not renew a participant lease for every received message
The DDSI2 service discovers remote participants and automatically deletes them if they do not renew their leases in time. The lease renewal was tied to reception of data and of explicit lease renewal messages, and hence reception of, e.g., an acknowledgement would not lead to a lease renewal, even though it obviously requires the remote participant to be alive.
Solution: DDSI2 now renews leases regardless of the type of message.
|OSPL-9485|| Durability may mark aligned samples as duplicates while they are not and drop them
Durability combines incoming samples into a set while filtering out duplicates, but fails to take into account all fields that determine its uniqueness. This can lead to dropping samples from the set that would have affected the state of the system or result in durability never considering the set complete.
Solution: The function to compare samples to determine uniqueness has been modified to ensure all relevant attributes are taken into account.
|OSPL-9501|| Durability service may wait forever on asymmetric disconnect with a fellow
When a fellow is asymmetrically disconnected while samples are being requested it is possible that one or more of these sample requests are not received by the fellow, or that the response is lost. The durability service that sends the sample requests will be waiting for answers until all answers are received. But since the fellow was asymmetrically disconnected and requests or responses may have been lost, the durability service may wait forever for answers that will never come, thereby blocking progress of the durability service.
Solution: When a fellow is asymmetrically disconnected all pending sample requests are discarded. In that case the durability service will not wait for answers anymore and continue operation. Reappearance of the fellow may lead to new alignment actions that are handled consecutively.
|OSPL-9502||Durability not always syncing with new master
Durability services negotiate a master to align data from. It has been observed that a deceased fellow got chosen as master. It is evident that choosing a deceased fellow as master will not lead to alignment and may lead to an inconsistent data state.
Solution: The algorithm that caused the deceased master to be chosen has been modified to prevent this situation from happening.
|OSPL-9503||Durability may wait for responses from already disconnected fellow
To keep a consistent state, durability services may request data from each other. When a durability service is waiting for answers to such requests from a fellow and that fellow 'leaves', the durability service cleans up the pending requests correctly. However, when the fellow leaves while the durability service is concurrently sending out the requests, it is possible that not all requests are cleaned up correctly. In that case the durability service will keep waiting forever for data from the fellow that has left. This stalls the alignment process until the service is completely restarted.
Solution: Slight changes in the locking strategy prevent concurrent sending of requests and cleaning them up on fellow disconnection.
|OSPL-9514||Memory leak in core
Each time a DataReader is created a small amount of memory is leaked due to a missing free.
Solution: The missing free has been added.
|OSPL-9522||DDSI2 transmit path can lock up when two partially asymmetric disconnections overlap
Consider a situation where: on node A the lease of B expires, and then on node B the lease from A expires while A rediscovers B, and then B rediscovers A. Then imagine that just before the lease of A expires on B, a heartbeat by a transient-local or endpoint discovery writer on A is multicasted and received, processed and responded to by B, ACK'ing everything. Then, in the particular case where A receives that acknowledgement just after rediscovering B, A will assume that B has received and ACK'd everything, and not send heartbeats out on B's behalf.
In this situation, B will still send pre-emptive ACKs until it receives a heartbeat from A; this will trigger a retransmit of the missing data. If some of this data is lost on the network, and no other readers exist to force A to multicast heartbeats, then the DDSI specification requires B to wait with requesting a further retransmit until a heartbeat is received. (The most likely reason for this to happen is writing new data on A.)
In the particular case where the lost data concerns endpoint discovery data, full connectivity will then not be established properly. If B has missed out on the definition of a data writer, then B will not acknowledge any data writer by that writer as it doesn't know the writer, which may cause the transmit buffers on A to fill up, and hence may lock up the DDSI transmit path on A until the reader on B is declared non-responsive.
Solution: DDSI2 will now automatically re-request a retransmit after a configurable amount of time, by default 1s, but this may be disabled for strict compliance with the specification. Under normal circumstances, the retransmits will arrive much sooner, and so no additional network traffic will typically be generated by this change.
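The re-request rule can be pictured with a minimal sketch. All names here (`ProxyWriter`, `should_renack`, the field names) are illustrative, not the actual DDSI2 implementation: if a retransmit request is still outstanding after the rescheduling interval has elapsed without a heartbeat, the reader sends it again instead of waiting for the writer.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model of the reader-side rule added by this fix.
struct ProxyWriter {
    uint64_t last_nack_time = 0;     // when we last requested a retransmit
    bool     nack_outstanding = false;
};

// Returns true when the reader should autonomously re-send its ACKNACK
// instead of waiting for the writer's next heartbeat. In strict
// specification mode the reader always waits for a heartbeat.
bool should_renack(const ProxyWriter& pw, uint64_t now,
                   uint64_t resched_interval, bool strict_spec_mode) {
    if (strict_spec_mode)
        return false;
    return pw.nack_outstanding && (now - pw.last_nack_time) >= resched_interval;
}
```

Under normal circumstances the retransmit arrives well within the interval, so the timer simply never fires and no extra traffic is generated.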
|OSPL-9533||Durability with master priority 0 not retrieving data after master reconnect
The durability service clears the merge state and master of its local namespaces when it is not an aligner (or has taken a mastership poison pill) and a fellow with the same role is removed, even if that removed fellow was not the master of the namespace. Effectively this means the master is cleared even though it is still connected. Secondly, the durability service does not request the data from the new master if its master priority is 0.
Solution: Only reset the master and clear the state of a namespace if the removed fellow actually is the master of that namespace. Additionally, a durability service with master priority 0 now requests data from the master after resolving a master conflict.
|OSPL-9433 / 17161||Purging instances while merging hasn't finished may result in wrong end-state of an instance
When an instance is purged because it reached its end-state while merging, the merge may falsely reintroduce the instance. This can for example happen when an instance is contained in the merge, but while the merge is ongoing, the instance is disposed and unregistered. In that case the instance can disappear. When the merge is finished, there is no way for the middleware to tell whether the instance contained in the merge is still valid.
Solution: While there is a merge or regular alignment action ongoing, purging of instances that have reached their end-state is deferred. There is a known limitation to the current implementation w.r.t. the use of RTNetworking. Furthermore, there is a small time-frame in which this suppression may not be effectuated correctly if a node is (re)connecting at the end of merging. This will be fixed in an upcoming release and is covered in OSPL-9612.
|OSPL-9542||Raising the namespace state for a namespace while there are still pending conflicts for the namespace may cause scalability issues.
Whenever nodes (re)connect, the durability service has a potentially inconsistent state. To resolve this, the durability service generates conflicts and handles them one by one. When a conflict has been handled, the durability service can currently raise the state of one or more namespaces for which it is master. Raising the state causes the slaves to acquire data. Currently, the state is potentially raised after each conflict. In case there are pending conflicts for the same namespace, raising the state is inefficient because it causes slaves to request data multiple times. A better approach is to raise the state only after the last conflict for the namespace has been handled, so that slaves request the data once instead of multiple times. Doing this improves scalability.
Solution: The state of a namespace is now only raised when there are no more pending conflicts.
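The deferral rule can be sketched as follows; the type and function names are illustrative only, not the durability service's internal API. The master bumps a namespace's state, which triggers slaves to re-align, only once no conflicts remain queued for that namespace.

```cpp
#include <cassert>

// Illustrative model of the "raise only when quiescent" rule.
struct NameSpace {
    int state = 0;             // monotonically increasing merge state
    int pending_conflicts = 0; // conflicts still queued for this namespace
};

// Called after a conflict for 'ns' has been handled. Returns true when the
// state was actually raised (slaves will then request data exactly once).
bool maybe_raise_state(NameSpace& ns) {
    if (ns.pending_conflicts > 0)
        return false;          // defer: more conflicts still queued
    ++ns.state;
    return true;
}
```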
|OSPL-9561||When unregistration creates implicit registration the unregistration is not processed
When an unregistration message is the first message received, it is used to create a registration message. This registration message has the same write time, gid and sequence number as the unregistration, and even though the state differed it was dropped as a duplicate.
Solution: When comparing messages, the state is now also used to determine whether it is a different message.
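A minimal sketch of the corrected duplicate check, with illustrative names: write time, writer gid and sequence number alone cannot distinguish an unregistration from the registration implicitly derived from it, so the message state must take part in the comparison.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Illustrative message model; not the actual OpenSplice kernel types.
enum class State { Register, Unregister, Write, Dispose };

struct Message {
    uint64_t write_time;
    uint64_t gid;      // writer global identifier
    uint32_t seq;
    State    state;
};

// The state field now participates, so a derived registration is no
// longer considered a duplicate of the unregistration it came from.
bool is_duplicate(const Message& a, const Message& b) {
    return std::tie(a.write_time, a.gid, a.seq, a.state) ==
           std::tie(b.write_time, b.gid, b.seq, b.state);
}
```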
|OSPL-9566||Possible durability crash when injecting transactions when for one of the writers the DCPSTopic is not yet received
When injecting an EOT message via durability, a crash could happen when the list of writers in the EOT was evaluated and for one of those writers the DCPSTopic was not yet received while the DCPSPublication was. The combination of DCPSPublication and DCPSTopic was used for discovery of the writer, and the implementation assumed that the DCPSTopic was always received before the DCPSPublication; this assumption led to the crash.
Solution: When discovering the writers in the EOT now only the DCPSPublication is used.
|OSPL-9593||Overflow of internal metadata reference counts
Loading the same topic type into OpenSplice in a loop could overflow the reference counts of some internal metadata objects. This in turn could lead to freeing important metadata, eventually crashing the application.
Solution: The reference counts are now forced to a maximum value if they would otherwise overflow.
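Saturating a counter at its ceiling trades a premature free (a crash) for a deliberate leak of the pinned object. A minimal sketch of the technique, with illustrative function names rather than the actual OpenSplice code:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Saturating reference count operations: once the count reaches its
// maximum it is pinned there, so the object becomes intentionally
// immortal instead of being freed while still referenced.
uint32_t ref_inc(uint32_t count) {
    const uint32_t max = std::numeric_limits<uint32_t>::max();
    return (count == max) ? max : count + 1;   // saturate, never wrap
}

uint32_t ref_dec(uint32_t count) {
    const uint32_t max = std::numeric_limits<uint32_t>::max();
    if (count == max)
        return max;    // a saturated count can no longer reach zero
    return count - 1;
}
```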
|OSPL-9598||Deletion of DomainParticipant in ISOC++v2 API may trigger invalid handle detected error report
When an ISOCPP2 application creates 2 participants and closes the 2nd participant, the info log will contain a message like this: "invalid handle detected: result = U_RESULT_ALREADY_DELETED, Entity = 0x7ffed4c045f0 (kind = U_LISTENER)". This was caused by the fact that the explicit deletion of the Listener in question was postponed until after the deletion of its participant, which already implicitly deletes it as well. So the Listener effectively got deleted twice.
Solution: The listener in question is no longer deleted explicitly; its deletion is left to the participant, which will eventually delete it implicitly. This way the listener can never be deleted twice.
|OSPL-9621||When all data for a partition/topic combination has become complete, this new state is not always advertised
The durability service is responsible for keeping durable data sets consistent. When a durability service has retrieved all data for a particular partition/topic combination, the data set is marked as complete. In some cases the event that the data set is complete is not advertised. This may stall other durability services that wait for the data set to become complete.
Solution: When the set of data for a partition/topic combination has become complete, the completeness is advertised. This may trigger alignment from remote nodes.
|OSPL-8970 / 16949, OSPL-9105 / 16841||Semantics for instance state not clearly defined after reconnect.
The effects of a disconnect on the instance state of your data is clearly defined: all data originating from the disconnected node(s) becomes NOT_ALIVE_DISPOSED in case it was written with a writer using auto_dispose_unregistered_instances=TRUE or NOT_ALIVE_NO_WRITERS in case it was written with a writer using auto_dispose_unregistered_instances=FALSE. However, the effect of a reconnect to the previously disconnected node on these instance states was not clearly defined.
Solution: We have now specified the following behavior for a previously disconnected instance:
In case of VOLATILE data, the instance state remains the way it was (NOT_ALIVE_DISPOSED or NOT_ALIVE_NO_WRITERS) until a new update for that instance arrives, which will set the instance state back to ALIVE and the view state to NEW.
In case of TRANSIENT/PERSISTENT data, previously consumed samples may re-appear due to re-alignment with the disconnected node. However, these re-aligned samples will appear as NOT_READ, and will set the instance state back to ALIVE and the view state to NEW. Of course, any previously unseen samples will do the same thing.
Although it might seem weird that previously consumed samples re-appear as NOT_READ, this actually allows you to use standard mechanisms (e.g. a read/take with sample state NOT_READ) to select exactly those instances whose instance state has been impacted by the reconnect. (A read/take with sample state READ would just have returned all instances, including the ones whose state did not get modified as a consequence of the disconnect/reconnect cycle.)
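The selection trick can be modelled with a small sketch; the data model below is purely illustrative, not a DDS API. Because re-aligned samples come back as NOT_READ, a plain "everything NOT_READ" pass picks up both genuinely new data and the instances touched by the disconnect/reconnect cycle.

```cpp
#include <cassert>
#include <vector>

// Illustrative reader history: each sample records its instance and
// whether it has already been read by the application.
struct Sample {
    int  instance;
    bool read;
};

// Collect the instances that have at least one NOT_READ sample, i.e.
// exactly the data affected by new updates or by re-alignment.
std::vector<int> not_read_instances(const std::vector<Sample>& history) {
    std::vector<int> out;
    for (const auto& s : history)
        if (!s.read)
            out.push_back(s.instance);
    return out;
}
```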
|OSPL-9069 / 16818||DDSI2 reports an inscrutable error when presented with a topic definition it doesn't support
DDSI2 reported errors such as "handleDataReader: new_reader: error -1" for topics it doesn't support. This typically means the length of the serialised key exceeds 32 bytes (counting strings as 4 bytes), but this was not in any way expressed by the error message.
Solution: The error messages now properly identify the problem, including topic and type names.
|OSPL-9111||In case client durability is configured no historical data is retrieved when a non-volatile reader is enabled.
The reference manuals indicate that when a non-volatile reader is enabled, a request for historical data should be sent out. In case client durability is configured this request was not sent out. Consequently, no historical data is retrieved when the reader becomes enabled; only after an explicit call to wait_for_historical_data will historical data be retrieved.
Solution: When a non-volatile reader becomes enabled a request for historical data is now sent out, causing historical data to be retrieved.
|OSPL-9243 / 16947||Instances may keep track of unlimited amounts of invalid samples.
When an event occurs on a reader instance (like a dispose or unregister), an invalid sample carrying the event context is inserted in the history of this reader instance. These invalid samples may eventually be pushed or taken out of the history, but there are scenarios in which neither will occur (for example when an application repeatedly disconnects from and reconnects to the producer of the instance without any new data being added in the process, while it does not actively take the resulting invalid samples out of the reader). In such cases the reader might collect an unlimited number of invalid samples, claiming more and more resources without any restrictions.
Solution: Each instance can now only have 1 invalid sample at most and an invalid sample can only be located at the tail of the instance. Newer invalid samples will simply replace older invalid samples. However, even a removed invalid sample will still cause an increment in the generation counts of the samples following it.
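The new invariant, at most one invalid sample per instance and only at the tail, can be sketched as follows. The types and the exact replacement behaviour here are illustrative, not the actual reader administration: whenever a new sample arrives while the tail is invalid, the invalid sample is replaced, but its effect on the generation count is preserved.

```cpp
#include <cassert>
#include <vector>

// Illustrative instance history honouring the "single invalid sample at
// the tail" invariant described above.
struct Sample { bool valid; };

struct InstanceHistory {
    std::vector<Sample> samples;
    int generation = 0;

    void insert(const Sample& s) {
        if (!samples.empty() && !samples.back().valid) {
            samples.pop_back();  // replace the older invalid sample...
            ++generation;        // ...but keep its generation effect
        }
        samples.push_back(s);
    }
};
```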
|OSPL-9358|| Mastership handover.
Until now late joining nodes always slave to an existing master. With the introduction of master priorities (see //OpenSplice/DurabilityService/NameSpaces/Policy[@masterPriority]) it is possible to assign a mastership preference to nodes. In case a late joining node has a higher preference than the existing master, the late joining node should become master instead of slaving to the existing node. This requires handover of mastership.
Solution: Late joining nodes with a higher mastership preference than the current master now trigger a merge so that handover of mastership is established.
|OSPL-9437||When using a topic with history depth (default) it is possible that a group coherent transaction never becomes complete when live data and alignment data are received simultaneously.
When the durability service aligns a group coherent transaction for which samples have been pushed out of the history due to history limits and part of that same transaction is received via live communication it is possible that the transaction never becomes complete on the federation that's being aligned. When the aligned federation received the EOT via live communication and a special EOT via alignment, the internal EOT counter gets an invalid value which results in the group never being marked as complete.
Solution: Reception of EOT via live communication and alignment is now handled correctly.
|OSPL-9478||Reader not getting locally stored transactional historical data when last message was an unregistration.
When creating a reader on a federation that has open historical transactions, and the last message in such a transaction is an unregistration, the open historical transaction is not injected into the reader because the injection tries to use a non-existing pipeline.
Solution: No longer use the pipeline for injecting locally stored transactional historical data into the reader.
|OSPL-9486||Complete group coherent set not flushed when completed by publicationInfo notification on the reader
When a publicationInfo notification on the reader completed a group coherent set, the set was not flushed. If no further transactions followed, the readers or subscriber would never be notified; only after calling begin_access was the set available to the readers.
Solution: A publicationInfo notification on the reader now flushes the set to the readers.
|OSPL-9534||The reader operation take_next_instance might skip instances with invalid samples.
The reader operation take_next_instance is expected to return the next matching instance starting from the instance identified by the specified instance handle. However, in case the next matching instance contained only an invalid sample, instead of returning this invalid sample the instance would be skipped and the next matching instance would be returned instead.
Solution: The algorithm to identify the next matching instance has been corrected to no longer skip an instance that has only got an invalid sample.
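The corrected lookup can be sketched with an illustrative data model (the map, handle type and function below are not the actual reader internals): the next matching instance is simply the first instance after the given handle that holds any sample at all, including an instance whose only sample is an invalid one.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Illustrative instance store keyed by instance handle. Each bool marks
// whether the sample is valid (true) or invalid/event-only (false).
struct Instance {
    std::vector<bool> samples;
};

// Returns the handle of the next instance after 'handle' that contains
// at least one sample (valid or invalid), or -1 when none remains.
int take_next_instance(const std::map<int, Instance>& instances, int handle) {
    for (auto it = instances.upper_bound(handle); it != instances.end(); ++it)
        if (!it->second.samples.empty())   // invalid-only instances count too
            return it->first;
    return -1;
}
```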
|OSPL-9562||Fake heartbeat unregistered while never written
The splice daemon is responsible for writing fake heartbeats on receiving a DCPSParticipant from a remote federation; the writing of these heartbeats happens conditionally, based on existing heartbeats. The splice daemon, however, unconditionally unregistered its fake heartbeat. From version V6.5.0p5 onwards the DDSI and RTNetworking services write a heartbeat before forwarding any samples, so in that case the splice daemon will not write a fake heartbeat but did unregister the fake heartbeat. Unregistering the fake heartbeat led to an invalid (too high) liveliness state, which could cause disconnects to be missed at the application level.
Solution: Unregistering the fake heartbeat is now also conditional.
|OSPL-9652||When an asymmetric disconnect occurs while there are pending sample requests, alignment may not progress.
Asymmetric disconnects may occur at any time. In particular, it is possible that an asymmetric disconnect occurs while a durability service has outstanding sample requests to another durability service. In case a durability service (say A) has lost connection with another durability service (say B) but not vice versa, and B is waiting for answers to outstanding sample requests from A, then B might not receive an answer because A has lost connection to B. This may stall alignment. In such a situation B must cancel the pending sample requests to A, and re-initiate alignment as soon as bidirectional communication is established again.
Solution: When an asymmetrical disconnect is detected pending sample requests to the asymmetrically disconnected node are cancelled.
|OSPL-9676|| Creating (enabling) a data writer in a non-coherent publisher between begin/end coherent set sets transactionId on writer
Creating (enabling) a data writer in a non-coherent publisher between begin/end coherent set copies the next sequence number (1) to the transactionId on the writer, which then causes all published data to be treated as if it is part of a transaction that will never be committed. The data will be delivered because the only matching readers are non-coherent, but transient data will be retained in the transaction administration in the group.
Solution: A transactionId is no longer set when creating a writer in a non-coherent publisher.
|OSPL-9677||The durability service does not parse the //OpenSplice/Domain/GeneralWatchDog and //OpenSplice/DurabilityService/Watchdog configuration items correctly.
Watchdog threads are used to track the progress of services. Users can specify the properties of the watchdog thread using the generic //OpenSplice/Domain/GeneralWatchDog setting. Individual services like the durability service can override these values by specifying //OpenSplice/DurabilityService/Watchdog. The durability service did not parse these settings correctly, causing the specified values to be non-effective.
Solution: The parsing has been changed so that changes to these settings now take effect.
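For orientation, the two settings mentioned above sit at the following locations in the deployment XML. This is only a structural sketch derived from the element paths given above; the actual child elements of both settings are omitted here, so consult the Deployment Guide for the full schema.

```
<OpenSplice>
  <Domain>
    <!-- domain-wide default for the watchdog threads of all services -->
    <GeneralWatchDog>
      <!-- watchdog properties; see the Deployment Guide -->
    </GeneralWatchDog>
  </Domain>
  <DurabilityService>
    <!-- service-specific override; takes precedence for durability -->
    <Watchdog>
      <!-- same properties as above -->
    </Watchdog>
  </DurabilityService>
</OpenSplice>
```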
|OSPL-9701||The auto_dispose setting of a DataWriter is not always processed atomically.
When a DataWriter uses an auto_dispose_unregistered_instances setting of TRUE, the reader side would not expect an instance state for this writer to ever become NO_WRITERS. However, because the auto_dispose setting is not always handled atomically, in certain scenarios we might nevertheless end up with a reader instance whose instance state is NO_WRITERS. An example of such a scenario is when an autodisposing DataWriter explicitly disposes and then unregisters an instance. If a late joining node requests alignment between the dispose and the unregister, then the unregister message might be received first through the normal network connection, while the older dispose message arrives later as part of the alignment process. That means the instance state will first go to NO_WRITERS, and the older DISPOSE message can no longer undo this.
Solution: An unregister message from an autodisposing writer will now always be treated as a combined DISPOSE and UNREGISTER message. That means that if the unregister arrives before the explicit dispose, it will still set the instance to DISPOSED and not to NO_WRITERS.
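The resulting rule is order-independent and can be sketched in a few lines; the enum and function names are illustrative, not the OpenSplice kernel API. An unregister from a writer with auto_dispose_unregistered_instances=TRUE always yields a disposed instance, regardless of whether the explicit dispose has been seen yet.

```cpp
#include <cassert>

// Illustrative instance states, mirroring the DDS NOT_ALIVE_* kinds.
enum class InstanceState { Alive, NotAliveDisposed, NotAliveNoWriters };

// State resulting from an unregistration by the last remaining writer.
// With autodispose, the unregister is treated as DISPOSE + UNREGISTER,
// so a late-arriving explicit dispose can no longer be "undone".
InstanceState apply_unregister(bool writer_auto_disposes) {
    return writer_auto_disposes ? InstanceState::NotAliveDisposed
                                : InstanceState::NotAliveNoWriters;
}
```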
Fixed bugs and changes affecting the API in OpenSplice 6.8
| Report ID.|| Description
| OSPL-9914|| Removal of obsolete ISOCPP (version 1) from examples and documentation
ISOCPP version 1 is deprecated and replaced by ISOCPP version 2. Its examples and documentation were still present in OpenSplice, which could lead new users to mistakenly start using the wrong ISOCPP version.
Solution: The ISOCPP version 1 examples and documentation have been removed, steering new users to ISOCPP version 2.
| OSPL-9763 / 17591|| IsoCpp2 memory leak when deleting DomainParticipants.
A small listener object was not deleted when the related IsoCpp2 DomainParticipant was deleted.
Solution: The deletion of the DomainParticipant and the related listener object and thread has been re-factored.
| OSPL-9187 / 16857|| Potential memory leak for unaccessed group coherent subscribers
In order to correctly deliver coherent sets to group coherent subscribers with late joining readers, the middleware maintained the transactions that weren't yet accessed. When accessing the subscriber, the transactions were flushed. This was possible because between begin- and end-access no readers can be added to a subscriber. This however introduced a requirement to access the subscriber periodically in order to be able to reclaim the memory.
Solution: A restriction has been put on when datareaders can be created for a group-coherent subscriber. A group-coherent subscriber is always created disabled, regardless of the EntityFactory QoS on the domainparticipant. Readers can only be added to the subscriber for as long as it is not enabled. After the subscriber has been enabled by calling enable on it, mutations to the subscriber, like adding a reader to it or changing the QoS (even ones that are normally mutable) on the subscriber or its readers, are not allowed anymore. Readers can still be removed from the subscriber. If the subscriber is enabled, all its contained readers will be enabled too, regardless of the EntityFactory QoS of the subscriber. This change thus requires the addition of an explicit invocation of enable on the group coherent subscriber.
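The new lifecycle can be modelled with a small sketch. The class below is an illustrative model of the rule, not the OpenSplice API: a group-coherent subscriber starts disabled, readers may only be created while it is disabled, and enable() freezes the reader set.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative model of the group-coherent subscriber lifecycle rule.
class GroupCoherentSubscriber {
    bool enabled_ = false;                 // always created disabled
    std::vector<std::string> readers_;
public:
    void create_datareader(const std::string& topic) {
        if (enabled_)
            throw std::logic_error("reader set is fixed after enable()");
        readers_.push_back(topic);
    }
    // Freezes the reader set; contained readers are enabled as well.
    void enable() { enabled_ = true; }
    std::size_t reader_count() const { return readers_.size(); }
};
```

In application terms: create the subscriber, create all its readers, then call enable once, after which begin_access/end_access can be used as before.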
You can find more information on the fixed bugs and changes in the other OpenSplice V6.x releases on these pages:
- Fixed bugs and changes in OpenSplice 6.8.x
- Fixed bugs and changes in OpenSplice 6.7.x
- Fixed bugs and changes in OpenSplice 6.6.x
- Fixed bugs and changes in OpenSplice 6.5.x
- Fixed bugs and changes in OpenSplice 6.4.x
- Fixed bugs and changes in OpenSplice 6.3.x
- Fixed bugs and changes in OpenSplice 6.2.x
- Fixed bugs and changes in OpenSplice 6.1.x