Roadmap

I'm planning a major cleanup of the public api before making a first official release. Because some of those changes require breaking backwards compatibility, I'm going to take this opportunity to introduce many changes that have been on my todo list for a long while, but were postponed to avoid breaking backwards compatibility as much as possible.

The remainder of this (long) email is an overview of the planned changes. There are a couple of questions as well.

    Feature                                              Status
--------------------------------------------------------------
 1. Decide on a version scheme                           done
 2. Separate public and private headers                  done
 3. Improved error codes                                 done
 4. Namespacing the public api (*)                       done
 5. Improved naming consistency (**)                     done
 6. Cleanup the parser api                               todo
 7. Re-order structures to eliminate padding             todo
 8. Drop deprecated functionality                        todo
 9. Improved tools                                       todo
10. Stable numbers for the backend types
    (and merging device and parser)                      done
11. Return dc_buffer_t pointer instead of integer        todo
12. Units: hydrostatic vs density                        todo
13. Documentation                                        todo
14. Enumeration of supported models                      done
15. Generic dc_device_open function (***)                done
16. Generic dc_parser_create function                    done
17. A context object for global state                    done
18. Timezone support                                     todo
19. Support partial read/write                           todo
20. Units                                                todo
21. Improved naming consistency: open/create/new?        done
22. Add a destructor for the dive data?                  todo
23. Replace the parser interface with a dive interface   todo

(*)   high level interface only
(**)  except for the self-pointer
(***) except for the communication specific parameters

1. Decide on a version scheme

The current plan is to use a typical major.minor.micro scheme. The infrastructure to support this scheme has been in place for a long time, but it's just not being used yet.

The main question is then how to increment these numbers: based on features, or on binary compatibility? I have a personal preference for the latter. It might also be a good idea to be able to distinguish between release and development versions, for example using a "-dev" suffix.
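For illustration, a version check macro in that style might look like this (the names and values here are purely illustrative, not the final API):

#define DC_VERSION_MAJOR  0
#define DC_VERSION_MINOR  1
#define DC_VERSION_MICRO  0

/* Evaluates to true if the library version is at least major.minor.micro. */
#define DC_VERSION_CHECK(major, minor, micro) \
        (DC_VERSION_MAJOR > (major) || \
        (DC_VERSION_MAJOR == (major) && DC_VERSION_MINOR > (minor)) || \
        (DC_VERSION_MAJOR == (major) && DC_VERSION_MINOR == (minor) && \
         DC_VERSION_MICRO >= (micro)))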

Worth reading: http://apr.apache.org/versioning.html

2. Separate public and private headers

All public header files are moved to the include/libdivecomputer subdirectory. The main benefit is that the example applications can use the recommended include style (e.g. #include <libdivecomputer/xxx.h>), regardless of whether they are built inside the build tree or standalone. And as a bonus, it becomes very clear which headers are considered public and which are private.

3. Improved error codes

The device and parser error codes will be merged to a single set. At the same time a few extra error codes are introduced:

* DC_STATUS_SUCCESS     /* Success */
* DC_STATUS_UNSUPPORTED /* Unsupported operation */
* DC_STATUS_INVALIDARGS /* Invalid arguments */
* DC_STATUS_NOMEMORY    /* Out of memory */
* DC_STATUS_NODEVICE    /* Device not found */
* DC_STATUS_NOACCESS    /* Access denied */
* DC_STATUS_IO          /* Input/output error */
* DC_STATUS_TIMEOUT     /* Timeout */
* DC_STATUS_PROTOCOL    /* Protocol error */
* DC_STATUS_DATAFORMAT  /* Data format error */
* DC_STATUS_CANCELLED   /* Operation cancelled */

These are roughly identical to the current error codes, with a few modifications. The old TYPE_MISMATCH code is changed into the more general INVALIDARGS. The old ERROR is changed into the new DATAFORMAT or INVALIDARGS, depending on the context. The NODEVICE and NOACCESS codes have been added to return more fine-grained error information when opening a connection to the low-level hardware (e.g. serial/irda/usb). Note that these two new error codes are not intended to report the status of the communication with the actual dive computer!
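To illustrate, a minimal sketch of how an application could handle the merged status codes, using the dc_device_open() prototype proposed in item 15 as an example:

dc_status_t rc = dc_device_open (&device, model, devname);
if (rc != DC_STATUS_SUCCESS) {
        if (rc == DC_STATUS_NODEVICE || rc == DC_STATUS_NOACCESS) {
                /* Opening the low-level communication channel failed. */
        } else if (rc == DC_STATUS_NOMEMORY) {
                /* Out of memory. */
        } else {
                /* Handle the remaining error codes here. */
        }
}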

Are there any error codes missing?

4. Namespacing the public api

All public API (functions, structs, enums, etc) should be prefixed with "dc_" (which is short for libdivecomputer of course) to reduce the chances of conflicts with other libraries.

In recent additions this prefix is already used (e.g. dc_datetime_t and dc_buffer_t), but in older code it needs to be added. Note that this will break backwards compatibility everywhere!
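A few illustrative examples of what the renaming would look like (the exact new names may still change):

device_event_callback_t     ->  dc_event_callback_t
device_set_fingerprint ()   ->  dc_device_set_fingerprint ()
parser_set_data ()          ->  dc_parser_set_data ()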

5. Improved naming consistency

In a few places the naming of the typedefs is inconsistent. For example, there are callbacks named device_event_callback_t and dive_callback_t (the latter missing "device" in its name), and enums named device_type_t and device_event_t (the latter missing "type" in its name).

I intend to introduce a more consistent naming scheme:

dc_event_type_t;      /* An enum listing all possible types. */
dc_event_progress_t;  /* A struct corresponding to one specific type. */
dc_event_devinfo_t;
dc_event_clock_t;
dc_event_callback_t;  /* A callback function. */

Related to this issue is the question whether the callback function should receive the object pointer for which it is called (e.g. the this or self pointer). Currently the event callback has this pointer (because you need it for the fingerprint feature), while all the others don't. I want to make this consistent by either adding it to all callbacks or removing it everywhere. Technically, you can always pass the object pointer by means of the userdata parameter, but it can be convenient to have it for free (at the expense of an extra parameter in the callback function signature).
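For comparison, the two variants would look roughly like this (the signatures are purely illustrative):

/* With the object pointer: */
typedef void (*dc_event_callback_t) (dc_device_t *device,
        dc_event_type_t type, const void *data, void *userdata);

/* Without the object pointer: */
typedef void (*dc_event_callback_t) (dc_event_type_t type,
        const void *data, void *userdata);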

Does anyone have a particular preference regarding adding/removing this object pointer?

6. Cleanup the parser api

The parser api needs some major improvement.

The biggest issue here is that the parser_sample_value_t union is passed by value in the sample callback. This means we can't introduce new sample types in the future without breaking backwards compatibility (because extending the union may change its size). This can be avoided by breaking the union into individual structs and passing them by reference with a pointer. This is similar to the current event callback, so it will improve consistency too.

The downside is that the application will have to cast the pointer to the correct data type manually. It might be possible to keep the union (but still passed as a pointer of course) to eliminate the need for a manual cast, but this might not be entirely portable. I'll need to investigate this further.
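A rough sketch of what this could look like (all names are hypothetical at this point):

typedef void (*dc_sample_callback_t) (dc_sample_type_t type,
        const void *value, void *userdata);

static void
sample_cb (dc_sample_type_t type, const void *value, void *userdata)
{
        if (type == DC_SAMPLE_DEPTH) {
                const dc_sample_depth_t *depth = (const dc_sample_depth_t *) value;
                /* Use depth->value here. */
        }
}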

Another concern is the sample events. Currently we have a large list with all possible event types supported by the many dive computers. But this is starting to become a big mess, because the data associated with each type varies greatly between different dive computers, so it's nearly impossible to map this onto a single data structure. Therefore, I propose to remove the event sample type, and use the vendor type for this purpose. Then we can just document the format and pass the complete data to the application.

Note that in a later stage, we can always add back the more general purpose events (e.g. gas switches, etc), but then with an appropriate data structure for each event and not the single catch-all data structure we have now. This will need further discussion.

7. Re-order structures to eliminate padding

A few structures have unnecessary padding bytes inserted by the compiler due to alignment requirements (e.g. device_clock_t). These useless padding bytes can easily be eliminated by re-ordering the fields in the structure. This breaks backwards compatibility, but since some of the other changes will break compatibility anyway, it's a good opportunity to get rid of the padding too.
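A simplified example of the kind of re-ordering involved (the struct and its fields are made up for illustration; actual sizes depend on the platform):

/* Assuming a platform where double is 8 bytes and 8-byte aligned: */
struct before {
        unsigned int model;     /* 4 bytes + 4 bytes padding */
        double timestamp;       /* 8 bytes                   */
        unsigned int size;      /* 4 bytes + 4 bytes padding */
};                              /* sizeof == 24              */

struct after {
        double timestamp;       /* 8 bytes                   */
        unsigned int model;     /* 4 bytes                   */
        unsigned int size;      /* 4 bytes                   */
};                              /* sizeof == 16              */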

8. Drop deprecated functionality

There are a few functions that have a better alternative, and can be removed.

For example, the *_set_timestamp() functions in the reefnet and uwatec backends can be replaced with the more general device_set_fingerprint() function. An integer timestamp can still be used, as long as it's serialized correctly into the binary fingerprint.
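For example, a 32-bit timestamp could be serialized into the fingerprint buffer along these lines (the byte order is backend specific; little-endian is assumed here purely for illustration):

unsigned char fingerprint[4];

fingerprint[0] = (timestamp      ) & 0xFF;
fingerprint[1] = (timestamp >>  8) & 0xFF;
fingerprint[2] = (timestamp >> 16) & 0xFF;
fingerprint[3] = (timestamp >> 24) & 0xFF;

device_set_fingerprint (device, fingerprint, sizeof (fingerprint));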

Another candidate for removal are the *_set_maxretries() functions. I think it's better to just handle this completely internally. In the end, the backend should know best how many attempts should be made to retry a failed command.

The device_version() function is also obsolete. All backends that have a mandatory version/handshake sequence during the communication already perform it internally. Only when you need the data returned by these commands for some reason do you have to call the function explicitly. But although this function looks device independent, it is not, because you need to know the buffer length in advance. And then we could equally well turn it into a backend specific function. In cases such as the sensusultra handshake, where you can't call the function at an arbitrary time, there is already a backend specific function to retrieve the cached handshake packet. We could even go one step further and introduce a special vendor event to pass the raw version/handshake packet to the application.

Would this cause trouble for anyone?

9. Improved tools

I plan to create a new "dctool" commandline application that is more flexible than the current universal application. The idea is to have a tool that supports a number of actions:

dctool [<options>] action [<args>]

Planned actions are dump, dives, parse, extract, read and write. The major difference with the universal application is that the new tool will only perform a single action, but the output can be piped to the next action. This will require defining a common data format to pass the data through the pipe.

Note that this isn't a high priority item because it doesn't have any impact on the public api, and hence can be completed later on.

10. Stable numbers for the backend types (and merging device and parser)

To maintain backwards compatibility, the existing enum values shouldn't change when new backends are added. To achieve this, and at the same time keep backends from the same manufacturer grouped together, I propose to introduce two numbers to identify each backend: one for the vendor, and one for the device itself. This is similar to the USB VID/PID numbers. Both the VID and PID numbers can then be encoded into a single 16 or 32-bit number to make up the enum value.

Additionally, I would like to merge the device_type_t and parser_type_t enums into a single enum. Right now every device backend has a corresponding parser backend (with a few exceptions where a parser is shared by several device backends), so there is no good reason to keep two separate enums. I'm still looking for a good name for the new enum.
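A possible encoding, just to illustrate the idea (the macro, enum and values here are invented):

/* Pack a 16-bit vendor id and a 16-bit product id into one enum value. */
#define DC_BACKEND(vid, pid)  (((vid) << 16) | (pid))

typedef enum dc_family_t {
        DC_FAMILY_SUUNTO_VYPER   = DC_BACKEND (0x0001, 0x0001),
        DC_FAMILY_SUUNTO_D9      = DC_BACKEND (0x0001, 0x0002),
        DC_FAMILY_REEFNET_SENSUS = DC_BACKEND (0x0002, 0x0001),
} dc_family_t;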

11. Return dc_buffer_t pointer instead of integer

I plan to make all dc_buffer_*() functions return the dc_buffer_t pointer to indicate success or failure, instead of an integer. These functions can only fail due to out-of-memory errors, so returning NULL makes more sense, and it additionally allows chaining calls.
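For example, assuming dc_buffer_append() is one of the affected functions, the result could then be checked directly on the returned pointer (sketch only):

dc_buffer_t *buffer = dc_buffer_new (1024);

if (dc_buffer_append (buffer, data, size) == NULL) {
        /* Out of memory. */
}

Chaining would work too, provided the functions accept and propagate a NULL buffer, so a whole sequence of calls can be checked once at the end.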

12. Units: hydrostatic vs density

A few backends (e.g. reefnet, cobalt, ostc) store the depth values (d) as pressure values (P). This makes a lot of sense, considering that a depth sensor is actually a pressure sensor. But it also means a conversion is necessary, and that requires knowledge of the atmospheric pressure (Patm) and the water density (rho):

P = rho * g * d + Patm

or if you prefer:

d = (P - Patm) / (rho * g)

Since those values are not always provided by the device (atmospheric pressure can be measured, but water density can't and is thus typically a user configurable setting or simply unknown), the calibration constants have to be provided by the user.

Note that this conversion has no influence on the decompression calculations, because those are done with the pressure values. The conversion to depths is strictly for displaying to humans, who prefer to see depths :-)
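Expressed in code, the conversion is just the formula above (a minimal sketch in SI units; the function is illustrative, not a proposed API):

/* Pressure and atmospheric in Pa, density in kg/m^3, result in metres. */
static double
pressure_to_depth (double pressure, double atmospheric, double density)
{
        const double gravity = 9.80665;  /* standard gravity (m/s^2) */

        return (pressure - atmospheric) / (density * gravity);
}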

The use of the atmospheric pressure is straightforward, but for the water density there are many possibilities (density, salinity ratio, etc). In the current implementation a "hydrostatic" value is used, which is nothing more than the density multiplied by the standard gravity (and thus is equivalent to the pressure of 1m of seawater). The main reason why this was chosen is simplicity in the implementation, and the nice correspondence with the simplified formula used by divers in the field:

PRESSURE(bar) = DEPTH(m) / 10 + 1
PRESSURE(atm) = DEPTH(ft) / 33 + 1

by defining the parameters (see units.h for the constants) as:

atmospheric = BAR
hydrostatic = MSW

However I'm not convinced the use of this "hydrostatic" parameter is very intuitive, compared to using the more well-known density value. Using the density directly would basically hide the gravity constant from the public api. Do you have any preference towards using hydrostatic vs density?

Note that you can still obtain the same results as with the simplified formula, by defining the parameters as:

atmospheric = BAR
density     = MSW / GRAVITY ~= 1019.7 kg/m^3

The math for imperial units is similar, resulting in these parameters:

atmospheric = ATM
hydrostatic = FSW / FEET
density     = (FSW / FEET) / GRAVITY ~= 1027.2 kg/m^3

13. Documentation

I should take some time to write proper documentation (e.g. doxygen or something similar). But since my time is pretty limited, this has the lowest priority.

14. Enumeration of supported models

Currently there is no way to retrieve the list of supported models, and as a result every single developer has to hard-code such a list into their application. It would be much better if that list could be provided by the libdivecomputer library directly. An additional benefit is that you'll always have access to an up-to-date list.

For the actual api, I'm thinking in the direction of a function that returns a table, containing a mapping between models and backends:

typedef struct dc_model_t {
       const char *manufacturer;
       const char *name;
       device_type_t type;
       unsigned int model;
} dc_model_t;

const dc_model_t *
dc_get_supported_models (void)
{
       static const dc_model_t g_models[] = {
          {"Suunto", "Vyper", DEVICE_TYPE_SUUNTO_VYPER, 0},
          {"Suunto", "Cobra", DEVICE_TYPE_SUUNTO_VYPER, 0},
          ...
          {NULL, NULL, DEVICE_TYPE_NULL, 0}
       };

       return g_models;
}

Or some variant with output parameters, rather than a sentinel element to locate the end of the array, like this:

void
dc_get_supported_models (const dc_model_t **models, unsigned int *n)
{
       ...

       *models = g_models;
       *n = C_ARRAY_SIZE (g_models) - 1;
}

But better ideas for the enumeration api are always welcome! In particular it would be nice if we could turn the dc_model_t structure into an opaque data type.

One possibility would be an iterator style api, similar to this:

dc_iterator_t *iterator = dc_get_supported_models ();

while (dc_iterator_next (iterator)) {
     dc_model_t *model = dc_iterator_current (iterator);

     /* Do stuff here */
     dc_model_get_name (model);
     dc_model_get_manufacturer (model);
     ...
}

dc_iterator_free (iterator);

15. Generic dc_device_open function.

Once we have support for enumerating the list of supported devices, we can implement a new dc_device_open() convenience function, which would take care of the mapping from a particular model (as obtained from the enumeration) to the correct xxx_device_open() function. The main advantage would be that an application doesn't need any changes when a new backend is added. An application would simply query the list of supported devices, pass the model selected by the user to the new function, and the libdivecomputer library will automatically select the matching backend.

dc_status_t
dc_device_open (dc_device_t **device, const dc_model_t *model, const char *devname);

or passing the backend type and model number explicitly:

dc_status_t
dc_device_open (dc_device_t **device, device_type_t type, unsigned int model, const char *devname);

The only issue remaining is the devname parameter. Currently, it's only used for those backends using serial communication. But it's not unlikely that backends using some other low-level communication channel (usb, infrared, file, etc) may need different arguments. At the moment that's not the case, because usb and infrared do perform autodetection (vid/pid for usb and device name for irda), but that may change in the future.

One possibility to address that would be to include a flag in the model descriptor that indicates the type of low-level communication channel, with an associated data structure for the arguments, which is then passed to the dc_device_open() function:

typedef struct dc_params_serial_t {
       const char *devname;
} dc_params_serial_t;

typedef struct dc_params_usb_t {
       ...
} dc_params_usb_t;

dc_status_t
dc_device_open (..., const void *params);

This flag could then also be used by applications to show the correct UI elements for those parameters.

16. Generic dc_parser_create function.

For the parser we can do something similar. Currently an application needs to connect to the devinfo and clock events, store the associated data, and pass it back again to the xxx_parser_create() function.

However, if the device object stores the event data internally, we can implement a new dc_parser_create() convenience function that grabs the event data directly from the device handle and automatically does the right thing.

dc_status_t
dc_parser_create (dc_parser_t **parser, dc_device_t *device);

You'll only have to be careful not to call this new function before the events have been emitted. In practice that means at the earliest when you receive the first dive. But that's no different from the xxx_parser_create() functions of course.
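For example, the new function could be called straight from the dive callback, with the device handle passed through the userdata pointer (a sketch only, using a dive callback along the lines of the current one):

static int
dive_cb (const unsigned char *data, unsigned int size,
         const unsigned char *fingerprint, unsigned int fsize,
         void *userdata)
{
        dc_device_t *device = (dc_device_t *) userdata;
        dc_parser_t *parser = NULL;

        if (dc_parser_create (&parser, device) != DC_STATUS_SUCCESS)
                return 0;  /* Abort the download. */

        /* Set the dive data, parse it, and free the parser again. */

        return 1;  /* Continue with the next dive. */
}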

It may be possible to work around this limitation, but it's not trivial due to thread-safety issues, and I'm not sure it's worth the effort.

17. A context object for global state

As I have written before, the logging support is problematic because it relies on global state, which is obviously not thread-safe. A possible solution would be to introduce a new context object to store the global state. This approach has several benefits, but it will also require changes on the application side. Basically you'll have to create a context object first, and then pass it down to all calls that create other objects (e.g. device and parser):

dc_context_t *context = NULL;
dc_device_t *device = NULL;

dc_context_create (&context);

dc_device_open (&device, context, ...);
dc_device_foreach (device, ...);
dc_device_close (device);

dc_context_free (context);

Full thread-safe logging could be obtained by requiring one context object per thread (e.g. similar to the device and parser objects), or by making the context object itself thread-safe (but without a global state to deal with now). The context object would also make it possible to implement more complex initialization, like reading loglevels from environment variables at runtime (just to give an example).

18. Timezone support

The dc_datetime_t struct doesn't have any timezone support. Good timezone support is certainly non-trivial, but since modern dive computers often support a timezone setting, I believe we won't be able to ignore it. Full timezone support is definitely out of scope, but some basic support using a UTC offset field would be sufficient for our use case (e.g. similar to the BSD tm_gmtoff extension to struct tm).

Therefore I propose to add such a field as a placeholder. Even if it won't be used immediately, we won't have to break backwards compatibility when we actually start using it later on.
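Concretely, that could look something like this (the field name is just a placeholder):

typedef struct dc_datetime_t {
        int year, month, day;
        int hour, minute, second;
        int timezone;   /* Offset from UTC in seconds (placeholder). */
} dc_datetime_t;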

19. Support partial read/write.

In many cases, the dc_device_read() and dc_device_write() functions will split large read/write requests into multiple packets. Currently these functions either succeed or fail, but there is no way to indicate partial success. Especially for writes, it may be good to know how much data was written correctly before a failure occurred.

Therefore, I propose to add an (optional) output parameter that receives the number of bytes successfully read or written. The return code would function exactly as before, and indicate the status of the entire operation. But in the case of a partial success you receive an error code, together with the actual number of bytes.
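The extended prototypes could look like this (a sketch; only the extra 'actual' output parameter is new, and passing NULL would mean the caller is not interested):

dc_status_t
dc_device_read (dc_device_t *device, unsigned int address,
                unsigned char data[], unsigned int size,
                unsigned int *actual);

dc_status_t
dc_device_write (dc_device_t *device, unsigned int address,
                 const unsigned char data[], unsigned int size,
                 unsigned int *actual);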

20. Units

Subsurface uses an interesting approach for defining values with units. Basically they define a special struct that makes the units very explicit. Maybe we can consider this approach for libdivecomputer too?

https://github.com/torvalds/subsurface/blob/master/dive.h
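The gist of that approach is a set of tiny wrapper structs where the unit is encoded in the member name, roughly like this (simplified; see dive.h for the real definitions):

typedef struct { int mm; } depth_t;
typedef struct { int mbar; } pressure_t;
typedef struct { int mkelvin; } temperature_t;

depth_t maxdepth = { .mm = 31200 };   /* 31.2 m, with the unit visible at the point of use. */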

Note that internally libdivecomputer will always use metric units, and that won't change. The only change I may consider is using kelvin for the temperature rather than degrees Celsius. It may be slightly less convenient (for both metric and imperial developers), but it has the advantage that for all units (depth, pressure, temperature) the values are never negative.

21. Improved naming consistency: open/create/new?

Right now there are xxx_open() functions for the device handle, xxx_create() functions for the parser handle and a dc_buffer_new() function, each with their matching close(), free() and dc_buffer_free() functions.

The main difference between open() and create() is that the create() function only allocates some memory resource, while the open() function also opens a connection with the device. But other than that, it's only a different name.

For create() and new(), the difference is that create() returns an error code, while new() returns a pointer. Thus the only failure new() can return is an out-of-memory condition (e.g. a NULL pointer), which is of course sufficient for a simple memory buffer object. In theory the create() function may return other error codes too, although at the moment that is not the case. Validation of the parameters (like the model code) is postponed until parsing, and the validation of the dive data is done in the xxx_set_data() function.

Personally, I prefer the name "new" over "create". But since it's common practice for a new() function to return a pointer, not a status code, that may be confusing? Should we also change the dc_buffer_new() function to return a status code, or just make an exception here? (Note that in the original roadmap there were already plans to make all dc_buffer_xxx() functions return a self pointer, rather than an integer to indicate success or failure. So adding back a status code for the new() function would be a little weird.)

22. Add a destructor for the dive data?

The parser_set_data() function currently doesn't copy the dive data internally, which means an application needs to make sure the data remains alive for the lifetime of the parser. The main reason why this was done is to avoid extra memory copies. Typically you receive the data from the dive callback (and optionally copy it if you want to use it outside the callback function), parse it, and then destroy it again (if necessary). In this scenario an extra copy would be wasteful.

Maybe it would be a nice idea to pass a destructor function that the parser can call when it's done with the data? Then the application doesn't have to care anymore about keeping it alive long enough. Or we could just not care about the extra memory copy at all, and always make a copy?
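The destructor variant could look something like this (a hypothetical signature, just to make the idea concrete):

typedef void (*dc_free_callback_t) (void *data, void *userdata);

dc_status_t
dc_parser_set_data (dc_parser_t *parser,
                    const unsigned char *data, unsigned int size,
                    dc_free_callback_t destructor, void *userdata);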

In case we don't care about the additional memory copy, we could also consider dropping the parser_set_data() function. The purpose of this function is to be able to re-use a single parser object for multiple dives. But if we already copy the dive data, which is usually much larger than the parser object itself, then there is little reason not to allocate a new parser for every single dive. In this scenario, the dive data would be passed directly to the xxx_parser_create() function. (Note that this would also get rid of the limitation in #16, because a parser would no longer be constructed in advance.)

23. Replace the parser interface with a dive interface.

The idea is that the dive callback receives a dc_dive_t object directly. This new object is a container for the raw dive data, with the parser functionality directly integrated. With this change, applications won't have to care anymore about creating the parser, while access to the raw data remains possible.
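In that model, a dive callback might look roughly like this (all names are hypothetical at this point):

static int
dive_cb (dc_dive_t *dive, void *userdata)
{
        dc_datetime_t datetime;

        /* Parser functionality integrated into the dive object: */
        dc_dive_get_datetime (dive, &datetime);

        /* The raw data remains accessible as well: */
        /* dc_dive_get_data (dive, &data, &size); */

        return 1;  /* Continue with the next dive. */
}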

That's it for now. Feedback is welcome as usual.