For now, don't take this too seriously; it's just an idea, initially inspired by Amiga TagItem lists, that might go well or might not even be worth it...
To make a long story short: many years ago, around 2005, I decided I wanted to learn to code for the Amiga and got a lot of help with my questions from Classic, AmigaOS4 and MorphOS coders on Amiga.org. Yes, A.Org was different back then: Piru was there, Thomas, Karlos and others that some of you probably remember... I ended up not finishing the program I was coding and abandoned it completely about 2 years later. Alongside it I was also writing some associated code that is general and could be used in practically any program. One such piece is a serialization library for C, which is actually not yet in library form. You can of course write your own, but I found that when programs get big there is a lot that can be generalized, without having to write and rewrite functions for every piece of data to be serialized. Also, most of the libraries I checked that implement serialization in C either produce human-readable output (read: bloated...) or just big binaries, or had an API that was plain horrible, so I didn't look any further...
The first attempt I made was pretty horrible too. I wanted to bolt serialization onto C as if it were built in, and I ended up writing a whole bunch of very complicated macros to try to make data descriptors and declare structs at the same time. I could not get an API that was simple and did not mess with C's normal syntax at the same time. The second attempt was cool, but I stopped and gave it up for professional reasons and because IT is not my area; it was just an experiment for fun. I slowly restarted coding it some months ago and have advanced it a lot, but I would like to know if there is interest in it or if I should just keep it to myself and, if there is interest, what your opinion of it is. A short description of the ideas follows.
-- The current concept --
The first goal is very efficient code with a non-bloated data format and a simple yet powerful and flexible API, kind of like C itself: cool, but it lets you hang yourself if you're not careful. I think this makes more sense than approaches that try to be everything and end up with a horrible, difficult to understand API and bloated data.
Also, after my first attempt I concluded that it's not worth trying to build serialization into C with a macro approach that tries to add to the C syntax. A much better approach would be to do it from an IDE with metaprogramming that is able to analyze struct definitions and make descriptors for the data without the user having to intervene. This would be as if C already had serialization built in. I know there are various advanced IDEs and preprocessors that allow this, and when the library code is done the next and final step could be to program one of those so that you declare structs and, with a simple instruction/command, get all of a struct's elements included in an automatically made descriptor. Something like that anyway...
I don't think this means there shouldn't be a simple, powerful API, just that it should take C's limitations into account and concentrate on efficiency and code size, while almost-automatic serialization features should be added by a preprocessor or a tool that parses the code and is integrated with an IDE. I'll repeat a bit of this in context below.
Without further talk (by now you're probably thinking I'm nuts anyway...) here's a short description of the current API.
-- API --
To serialize a set of data you call SS_SaveData (). To scatter-load a serialized stream into memory, SS_LoadData () is used.
The prototypes for both functions are:
Code:
BOOL SS_LoadData (ULONG *DataInfo, struct Window *Window, ULONG ID, ...);
BOOL SS_SaveData (ULONG *DataInfo, struct Window *Window, ULONG ID, ...);
- DataInfo is a descriptor (also called a data info in this post) for the group of data to be serialized. The user builds it with macros (described below) that try to reduce the input needed to write descriptors as much as possible, while keeping small descriptor size and efficiency as the priority.
- Window is a pointer to a window to which error messages should be output; it can be NULL.
- ID is an identifier. There can be several IDs, forming a path to the element(s) to be serialized. In a descriptor, data can be organized in groups, each with its own sub ID, so it's possible to specify which subgroup or even which element within a descriptor is to be serialized. The ID path is terminated by NULL.
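To make the ID path idea concrete, here is a rough sketch of how a variadic function might collect such a path. This is not the library's code: collect_id_path and MAX_PATH_DEPTH are made-up names, and ULONG is a local stand-in for the AmigaOS type. The sketch terminates the path with 0UL rather than NULL, because in a variadic call the terminator has to have the same type that va_arg reads.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

typedef unsigned long ULONG;   /* stand-in for the AmigaOS type */

#define MAX_PATH_DEPTH 8       /* hypothetical limit, not from the library */

/* Collects the ID path passed as (first_id, ..., 0) into out[].
   Returns the number of IDs stored, or -1 if the path is too deep. */
static int collect_id_path(ULONG out[MAX_PATH_DEPTH], ULONG first_id, ...)
{
    va_list ap;
    int n = 0;
    ULONG id = first_id;

    assert(out != NULL);
    va_start(ap, first_id);
    while (id != 0) {
        if (n == MAX_PATH_DEPTH) {   /* no room left for another ID */
            va_end(ap);
            return -1;
        }
        out[n++] = id;
        id = va_arg(ap, ULONG);      /* next ID, or the 0 terminator */
    }
    va_end(ap);
    return n;
}
```

A call such as collect_id_path (path, 100UL, 120UL, 121UL, 0UL) would store three IDs: the root group, a subgroup and one element inside it.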
- Making a descriptor -
Currently a descriptor is made by declaring a ULONG array which is initialized by macros. At the top level there must always be a group, at least for the DataInfo passed to the serializing functions. There can be a hierarchy of groups inside each descriptor.
To declare a group you use one of the SS_GROUP() or SS_GROUPf() macros. Both are function-like macros which take an ID and a FileName as arguments. SS_GROUPf() also takes DefaultFileName as an extra argument.
- ID is an identifier for the group.
- FileName is the file name of the group data when it is serialized to a file.
- DefaultFileName is used only by SS_GROUPf() and is the name of a file containing the default data to be used in case an ID is not present when loading a stream with SS_LoadData(). When there are subgroups, the top level group's defaults file has priority over the others. The hierarchy of defaults established by declaring groups with SS_GROUPf() only makes sense when reuse is involved (reuse is described below).
Inside each group there are elements, again declared by macros for minimum input effort, each adding data to make up the group. There can also be subgroups. All groups must be terminated by a 0 (zero).
- Basic C types
The most basic group element corresponds to a simple C type variable and is included with one of the SS_EL() or SS_ELd() macros. Both take an ID and the name of the variable to be included in the data stream as arguments. SS_ELd() takes an extra argument, the default value to be used in case the ID for this element is not present when loading a stream with SS_LoadData(). This default value has the lowest priority: if there is a parent group of type SS_GROUPf and its defaults file contains both the ID of the group this element belongs to and this element's ID, the value from the file takes priority.
Example:
Imagine there is a boolean variable somewhere defining if there should be warnings in a GUI.
Code:
BOOL BasicWarnings;
#define ID_BasicWarnings 101
/* Suggestion: As above, IDs should be defined next to the variable they apply to for more readable code */
#define ID_ROOTGROUP 100
ULONG SettingsDescriptor[] = /* Contains references to all the data we want to include in this group, in this case only a BOOL variable */
{
    SS_GROUPf (ID_ROOTGROUP, "RootGroup", "RootGroupDefaultsFile")
        SS_ELd (ID_BasicWarnings, BasicWarnings, TRUE),
    0
};
/* Descriptor is done, now load every variable in root group */
if (!SS_LoadData (SettingsDescriptor, NULL, ID_ROOTGROUP, NULL))
    printf ("Error loading data\n");
…
…
/* Save every variable in root group */
if (!SS_SaveData (SettingsDescriptor, NULL, ID_ROOTGROUP, NULL))
    printf ("Error saving data\n");
…
...
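For anyone wondering what a descriptor like this could look like once the macros are expanded, here is a simplified sketch. It is an assumption for illustration only, not the real layout: the library packs everything into a ULONG array, while this sketch uses a small struct per entry so that it also compiles on systems where a pointer does not fit in a ULONG. Entry, X_GROUP, X_EL, X_END and find_element are all made-up names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry kinds for this sketch. */
enum { T_END = 0, T_GROUP, T_EL };

typedef struct {
    int           tag;    /* T_GROUP, T_EL or T_END */
    unsigned long id;     /* group or element ID */
    void         *addr;   /* variable address (T_EL only) */
    size_t        size;   /* variable size in bytes (T_EL only) */
} Entry;

#define X_GROUP(id)    { T_GROUP, (id), NULL, 0 }
#define X_EL(id, var)  { T_EL, (id), &(var), sizeof(var) }
#define X_END          { T_END, 0, NULL, 0 }

/* Searches a descriptor for an element ID. Groups may nest;
   each group is closed by one X_END entry. */
static const Entry *find_element(const Entry *d, unsigned long id)
{
    int depth = 0;

    assert(d != NULL);
    do {
        if (d->tag == T_GROUP)
            depth++;
        else if (d->tag == T_END)
            depth--;
        else if (d->id == id)
            return d;
        d++;
    } while (depth > 0);
    return NULL;   /* ID not present in this descriptor */
}
```

The point of the sketch is only that the descriptor is plain data: a loader can walk it, match IDs found in the stream and copy size bytes to addr, without any per-variable code.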
- Structures
For structs the principle is the same, with their elements being added like the basic C types. At first this might not seem possible, because to get at struct elements one has to use offsets instead of addresses. However, very elaborate macros take care of that and the input "syntax" is the same! This was an achievement that made me happy and made me consider this a bit more seriously. SS_ST() is used first to define the base address. After that SS_STNM must be defined with the struct's type name, after which the elements are added.
Example:
Suppose that in the previous example there is also a structure called TestStruct.
The new data info, which adds it to the serialized stream after BasicWarnings, would be:
Code:
struct TestStruct
{
    int a;
    char b;
    int c;
} TstSt;
/* Element IDs */
enum TestStructIDs
{
    ID_TSTST = 120,
    ID_TS_a,
    ID_TS_b,
    ID_TS_c
};
ULONG SettingsDescriptor[] =
{
    SS_GROUPf (ID_ROOTGROUP, "RootGroup", "RootGroupDefaultsFile")
        SS_ELd (ID_BasicWarnings, BasicWarnings, TRUE),
        SS_ST (ID_TSTST, TstSt)
#define SS_STNM TestStruct
            EL (ID_TS_a, a),
            ELd (ID_TS_b, b, 10), /* As an example, this element has a default value if no value is found on the input stream when loading */
            EL (ID_TS_c, c),
        0,
#undef SS_STNM
    0
};
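The offset trick can be sketched with the standard offsetof macro. The following is a guess at the general technique, not the library's actual macros: X_STNM plays the role that SS_STNM plays above, and FieldDesc, X_FIELD and field_address are made-up names.

```c
#include <assert.h>
#include <stddef.h>

struct TestStruct { int a; char b; int c; };

/* Hypothetical entry: element ID, byte offset within the struct, size. */
typedef struct {
    unsigned long id;
    size_t        offset;
    size_t        size;
} FieldDesc;

/* X_STNM names the struct type that the following X_FIELD() entries
   refer to, so each entry only has to mention the member name. */
#define X_FIELD(id, member) \
    { (id), offsetof(struct X_STNM, member), \
      sizeof(((struct X_STNM *)0)->member) }

#define X_STNM TestStruct
static const FieldDesc test_struct_fields[] = {
    X_FIELD(121, a),
    X_FIELD(122, b),
    X_FIELD(123, c),
};
#undef X_STNM

/* Resolves a field of one particular struct instance from its base address. */
static void *field_address(void *base, const FieldDesc *f)
{
    assert(base != NULL && f != NULL);
    return (char *)base + f->offset;
}
```

Because the table stores offsets rather than addresses, the same table can be applied to any variable of type struct TestStruct by passing a different base address.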
Some observations: input effort is further reduced by using reuse, see below. Also, as mentioned in the beginning, some of this could be done automatically, like generating IDs and making descriptors for whole structures when they are declared. Often, if not most of the time, only some of a struct's elements are to be included, and they would have to be listed manually anyway, even if the effort to do so could be less. And IDs are often referenced in other parts of the code, like GUI object IDs, so in many cases it's useful and even better to define them manually.
Even if such automation already existed for this library, it should be more or less clear that the final descriptor data, even if automatically generated, would still have to exist, so the efficiency/size of the final code would not be better. As I said, I found some libraries that try to accomplish the goal of less input effort by using macros so complicated that the syntax of C ends up obfuscated, and/or that try to improve the API at the expense of bigger descriptor data. That's a compromise I did not want to make.
Again, ironically, almost perfect automation can still be achieved here without compromises if part of the above code is produced automatically by a parser tool, maybe integrated with an IDE or an advanced preprocessor, and this could be the next step...
- TagItem lists
For TagItem lists the macros SS_TAGITEM() and SS_TAGITEMd() are used. Both work like SS_EL() and SS_ELd() and take an ID and the variable name of the array holding the TagItem list as arguments. SS_TAGITEMd() takes one extra argument that's the name of the default TagItem list to use for default values.
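As a reminder of what serializing a TagItem list involves, here is a minimal sketch that flattens a list into a buffer of words. It uses local stand-ins (ULONG32, struct TagItemX, X_TAG_DONE) so that it compiles without the AmigaOS headers; on the Amiga the real struct TagItem and TAG_DONE come from <utility/tagitem.h>. flatten_tags is a made-up name, and the sketch only handles the TAG_DONE terminator, not control tags like TAG_MORE or TAG_SKIP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the AmigaOS definitions (assumptions of this sketch). */
typedef uint32_t ULONG32;
struct TagItemX { ULONG32 ti_Tag; ULONG32 ti_Data; };
#define X_TAG_DONE 0u

/* Copies (tag, data) pairs into out[] up to and including the terminator.
   Returns the number of words written, or 0 if out[] is too small. */
static size_t flatten_tags(const struct TagItemX *tags, ULONG32 *out, size_t cap)
{
    size_t n = 0;

    assert(tags != NULL && out != NULL);
    for (;; tags++) {
        if (n + 2 > cap)
            return 0;                 /* buffer would overflow */
        out[n++] = tags->ti_Tag;
        out[n++] = tags->ti_Data;
        if (tags->ti_Tag == X_TAG_DONE)
            return n;                 /* terminator copied, list complete */
    }
}
```

Note that this only works as-is for tags whose ti_Data is a plain value. Where ti_Data holds a pointer, the pointed-to data would have to be serialized separately.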
- Reuse
Sometimes a data info, or part of one, would have to be repeated in different groups. To avoid this, a descriptor can be reused inside another without having to duplicate it. Even better, when reusing a descriptor, the defaults can be overridden with new ones.
It's possible to make data infos that hold actual variable addresses and are reused in various groups, or even general data infos that only define the structure elements to be included, with the actual addresses of the struct variables being different from group to group.
This is also possible for a subset of a struct's elements, i.e. declaring some of a struct's elements in a general reuse descriptor and then just pointing to it, without having to input those elements again in other struct data info definitions.
If there is interest in this (I honestly think there probably isn't, but as I said, I'm just checking with this post) I can add examples; the API to do it is pretty simple...
- Other features already implemented but not described here
Nested structs
Pointer support including pointer following
Non invasive descriptors (should be self evident...:))
- Other features not yet implemented
Arrays (already implemented TagItem lists and pointer following though, so the code is more or less structured for it), including arrays of complex nested structs with buried pointers to other arrays combined with pointer following.
Endian agnostic
Multiplatform
Versioning (probably based on IDs and callback functions)
Custom callback function that can be passed to the API, which would call it for various types of data or a certain type, which could set default data, set pointers, whatever. Some default callbacks could be included.
Open source ?
[ Edited by Jose 21.07.2015 - 21:05 ]