Gauging interest for serialization library...
  • Just looking around
    Jose
    Posts: 15 from 2014/2/12
    For now, don't take this too seriously, it's just an idea initially inspired by Amiga TagItem lists that might go well or not even be worth it...
    So to make a long story short, many years ago, around 2005, I decided I wanted to learn to code for the Amiga and got a lot of help with my questions from Classic, AmigaOS4 and MorphOS coders on Amiga.org. Yes, A.Org was different back then, Piru was there, Thomas, Karlos and others that some of you probably remember... I ended up not finishing the program I was coding and abandoned it completely about 2 years later. There was also some associated code I was writing that is general and could be used in practically any program. One piece is a serialization library for C, which is actually not yet in library form. You can of course write your own, but I found that when programs get big there is a lot that can be generalized without having to write and rewrite functions for every piece of data to be serialized. Also, most libraries I checked that implement serialization in C produce human-readable output (read: bloated...) or just big binaries, or the API was just plain horrible, so I didn't look any further...
    My first attempt was pretty horrible too. I wanted to bolt serialization onto C as if it were built in, and I ended up writing a whole bunch of very complicated macros to try to make data descriptors and declare structs at the same time. I couldn't get an API that was simple and didn't mess with C's normal syntax at the same time. The second attempt was cool, but I gave it up for professional reasons and because IT is not my area; it was just an experiment for fun. I slowly restarted coding it several months ago and have advanced it a lot, but I would like to know if there is interest in it or if I should just keep it to myself, and if there is, what your opinion of it is. A short description of the ideas follows..

    -- The current concept --
    The first goal is to have very efficient code with a non-bloated data format and a simple yet powerful and flexible API, kind of like C itself, which is cool but lets you hang yourself if you're not careful. I think this makes more sense than approaches that try to be everything and end up with a horrible, difficult-to-understand API and bloated data.
    Also, after my first attempt I concluded that it's not worth trying to build serialization into C with a macro approach that tries to extend the C syntax. A much better approach would be to do it from an IDE with metaprogramming that is able to analyze struct definitions and make descriptors for the data without the user having to intervene. This would be as if C already had serialization built in. I know there are various advanced IDEs and preprocessors that allow this, and when the library code is done the next and final step could be to program one of those so that you can declare structs and, with a simple instruction/command, get all of a struct's elements included in an automatically made descriptor. Something like that anyway...
    I don't think this means there shouldn't be a simple, powerful API, just that it should take C's limitations into account and concentrate on efficiency and code size, while almost-automatic serialization features should be added by a preprocessor or a tool that parses the code and is integrated with an IDE. I'll repeat a bit of this, more in context, below.

    Without further talk (by now you're probably thinking I'm nuts anyway...) here's a short description of the current API.

    -- API --
    To serialize a set of data you call SS_SaveData(). To scatter-load a serialized stream into memory, SS_LoadData() is used.

    The prototypes for both functions are:

    Code:

    BOOL SS_LoadData (ULONG *DataInfo, struct Window *Window, ULONG ID, ...);
    BOOL SS_SaveData (ULONG *DataInfo, struct Window *Window, ULONG ID, ...);

    - DataInfo is a descriptor (also written "data info" in this post) for the group of data to be serialized. The user builds it with macros (described below) that try to keep the input needed to write a descriptor to a minimum, while giving priority to small descriptor size and efficiency.
    - Window is a pointer to a window to which error messages should be output, can be NULL.
    - ID is an identifier. There can be several IDs, making a path to the element(s) to be serialized. In a descriptor, data can be organized in groups, each having its own sub ID, so it's possible to specify which subgroup or even which element within a descriptor is to be serialized. The ID path is terminated by a NULL.
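
    As a rough illustration of how a variadic, zero-terminated ID path like the one described above could be walked (a sketch only, not the library's code: demo_collect_path and DEMO_PATH_END are made-up names, and the ULONG typedef stands in for the one from the AmigaOS headers):

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

typedef unsigned long ULONG;    /* stand-in for the AmigaOS typedef */

#define DEMO_PATH_END 0UL       /* hypothetical, ULONG-typed path terminator */

/* Copies a zero-terminated ID path into out[]. Returns the number of IDs,
   or -1 if the path has more than max entries. */
static int demo_collect_path(ULONG *out, size_t max, ULONG first, ...)
{
    va_list ap;
    size_t n = 0;
    ULONG id = first;

    va_start(ap, first);
    while (id != DEMO_PATH_END) {
        if (n == max) {         /* path longer than the caller's buffer */
            va_end(ap);
            return -1;
        }
        out[n++] = id;
        id = va_arg(ap, ULONG); /* every variadic argument is read as a ULONG */
    }
    va_end(ap);
    return (int)n;
}

/* Usage: a path "root group -> struct -> one element". */
static void demo_path_check(void)
{
    ULONG path[8];
    int n = demo_collect_path(path, 8, 100UL, 120UL, 121UL, DEMO_PATH_END);

    assert(n == 3);
    assert(path[0] == 100UL && path[1] == 120UL && path[2] == 121UL);
}
```

    The sketch ends the path with a ULONG-typed zero rather than NULL, because each variadic argument has to be passed with the same type it is later read with by va_arg.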

    - Making a descriptor -
    Currently a descriptor is made by declaring a ULONG array which is initialized by macros. At the top level there must always be a group, at least for the DataInfo passed to the serializing functions. There can be a hierarchy of groups inside each descriptor.
    To declare a group you use one of the SS_GROUP() or SS_GROUPf() macros. Both are function-like macros which take an ID and a FileName as arguments. SS_GROUPf() also takes DefaultFileName as an extra argument.
    - ID is an identifier for the group.
    - FileName is the file name of the group data, when serialized to a file.
    - DefaultFileName is used only by SS_GROUPf() and is the name of a file containing the default data to be used in case an ID is not present when loading a stream with SS_LoadData(). When there are subgroups, the top level group's defaults file has priority over the others. The hierarchy of defaults established by declaring groups with SS_GROUPf() only makes sense when reusage is involved (reusage is described below).

    Inside each group there are elements, again declared by macros for minimum input effort, each adding data that makes up the group. There can also be subgroups. All groups must be terminated by a 0 (zero).

    - Basic C types
    The most basic group element corresponds to a variable of a simple C type and is included with one of the SS_EL() or SS_ELd() macros. Both take an ID and the name of the variable to be included in the data stream as arguments. SS_ELd() takes an extra argument, the default value to be used if the ID for this element is not present when loading a stream with SS_LoadData(). This default value has the lowest priority: if there is a parent group of type SS_GROUPf and its defaults file contains both the ID of the group this element belongs to and this element's ID, the value from the defaults file takes priority.
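
    The fallback order described above can be summed up in a few lines. This is only a sketch of the priority rule, with a made-up helper name (demo_resolve) and a NULL pointer standing for "ID not found in that source"; the real lookup code is not shown in this post:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long ULONG;    /* stand-in for the AmigaOS typedef */

/* Picks the value an element ends up with after loading.
   A NULL pointer means the ID was not found in that source. */
static ULONG demo_resolve(const ULONG *from_stream,
                          const ULONG *from_defaults_file,
                          ULONG element_default)
{
    if (from_stream != NULL)
        return *from_stream;            /* 1. value found in the loaded stream */
    if (from_defaults_file != NULL)
        return *from_defaults_file;     /* 2. value from an SS_GROUPf defaults file */
    return element_default;             /* 3. default given to SS_ELd, lowest priority */
}

/* Usage: the same element resolved with fewer and fewer sources available. */
static void demo_resolve_check(void)
{
    ULONG stream_value = 1, file_value = 2;

    assert(demo_resolve(&stream_value, &file_value, 3) == 1);
    assert(demo_resolve(NULL, &file_value, 3) == 2);
    assert(demo_resolve(NULL, NULL, 3) == 3);
}
```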

    Example:

    Imagine there is a boolean variable somewhere defining if there should be warnings in a GUI.
    Code:

    BOOL BasicWarnings;
    #define ID_BasicWarnings 101

    /* Suggestion: As above, IDs should be defined next to the variable they apply to for more readable code */
    #define ID_ROOTGROUP 100
    ULONG SettingsDescriptor[] = /* Contains references to all the data we want to include in this group, in this case only a BOOL variable */
    {
        SS_GROUPf (ID_ROOTGROUP, "RootGroup", "RootGroupDefaultsFile")
            SS_ELd (ID_BasicWarnings, BasicWarnings, TRUE),
        0
    };

    /* Descriptor is done, now load every variable in root group */
    if (!SS_LoadData (SettingsDescriptor, NULL, ID_ROOTGROUP, NULL))
        printf ("Error loading data\n");


    /* Save every variable in root group */
    if (!SS_SaveData (SettingsDescriptor, NULL, ID_ROOTGROUP, NULL))
        printf ("Error saving data\n");

    ...


    - Structures
    For structs the principle is the same, with their elements being added just like basic C types. At first this might not seem possible, because to get at struct elements one has to use offsets instead of addresses. However, some very elaborate macros take care of that, and the input “syntax” is the same! This was an achievement that made me happy and made me consider this a bit more seriously. SS_ST() is used first to define the base address. After that, SS_STNM must be defined with the struct's type name, after which the elements are added.

    Example:

    Suppose in the previous example there is also a structure called TestStruct.
    The new data info, adding it to the serialized stream after BasicWarnings, would be:
    Code:

    struct TestStruct
    {
        int a;
        char b;
        int c;
    } TstSt;
    /* Element IDs */
    enum TestStructIDs
    {
        ID_TSTST = 120,
        ID_TS_a,
        ID_TS_b,
        ID_TS_c
    };

    ULONG SettingsDescriptor[] =
    {
        SS_GROUPf (ID_ROOTGROUP, "RootGroup", "RootGroupDefaultsFile")
            SS_ELd (ID_BasicWarnings, BasicWarnings, TRUE),
            SS_ST (ID_TSTST, TstSt)
            #define SS_STNM TestStruct
                EL (ID_TS_a, a),
                ELd (ID_TS_b, b, 10), /* As an example, this element has a default value if no value is found on the input stream when loading */
                EL (ID_TS_c, c),
            0,
            #undef SS_STNM
        0
    };
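
    For readers wondering how a member macro can keep the same "syntax" while working with offsets, here is one plausible way to do it with offsetof and sizeof. It is a sketch under assumptions, not the actual macros: DEMO_EL, demo_members and demo_member_ptr are made-up names, and the entry layout (ID, offset, size) is invented for the illustration. The trick it relies on is that SS_STNM is only looked up when the macro is used, so the #define placed before the element list selects the struct type.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long ULONG;    /* stand-in for the AmigaOS typedef */

struct TestStruct { int a; char b; int c; };

enum { ID_TS_a = 121, ID_TS_b, ID_TS_c };

/* Hypothetical expansion: each member becomes three array entries,
   ID, offset inside the struct, and size of the member. */
#define DEMO_EL(id, member) \
    (ULONG)(id), \
    (ULONG)offsetof(struct SS_STNM, member), \
    (ULONG)sizeof(((struct SS_STNM *)0)->member)

#define SS_STNM TestStruct
static const ULONG demo_members[] = {
    DEMO_EL(ID_TS_a, a),
    DEMO_EL(ID_TS_b, b),
    DEMO_EL(ID_TS_c, c),
    0
};
#undef SS_STNM

/* Returns the address of the member with the given ID inside *base,
   or NULL if the ID is not in the list. */
static void *demo_member_ptr(void *base, ULONG id)
{
    const ULONG *e;

    for (e = demo_members; e[0] != 0; e += 3)
        if (e[0] == id)
            return (char *)base + e[1];
    return NULL;
}

/* Usage: the same member list works for any struct TestStruct instance. */
static void demo_member_check(void)
{
    struct TestStruct t = { 1, 'x', 3 };

    assert(demo_member_ptr(&t, ID_TS_c) == (void *)&t.c);
    assert(*(int *)demo_member_ptr(&t, ID_TS_a) == 1);
}
```

    Because only offsets are stored, one member list can be applied to different base addresses, which is the property the reusage feature described further down depends on.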

    Some observations: some input effort is further reduced by using reusage, see below. Also, as mentioned in the beginning, some of this could be done automatically, like generating IDs and making descriptors for whole structures when they are declared. Often, though, if not most of the time, only some of a struct's elements are to be included and they would have to be listed manually anyway, even if the effort to do so could be smaller. And IDs are often referenced in other parts of the code, like GUI object IDs, so in many cases it's useful and even better to define them manually.
    Even if such automation already existed for this library, it should be more or less clear that the final descriptor data, even if automatically generated, would still have to exist, so the efficiency/size of the final code would not be better. As I said, I found some libraries that try to accomplish that goal of less input effort by using macros so complicated that the syntax of C ends up obfuscated, and/or that try to improve the API at the expense of bigger descriptor data. That's a compromise I did not want to make.
    Again, ironically, almost perfect automation can still be achieved here without compromises if part of the above code is automatically produced by a parser tool, maybe integrated with an IDE or an advanced preprocessor, and this could be the next step...

    - TagItem lists
    For TagItem lists the macros SS_TAGITEM() and SS_TAGITEMd() are used. Both work like SS_EL() and SS_ELd() and take an ID and the variable name of the array holding the TagItem list as arguments. SS_TAGITEMd() takes one extra argument that's the name of the default TagItem list to use for default values.


    - Reusage
    Sometimes a data info or part of it would have to be repeated in different groups. To avoid this, a descriptor can be reused inside another without having to duplicate it. Even better, when using reusage, the defaults can be overridden with new ones.
    It's possible to make data infos with actual variable addresses that are reused in various groups, or even general data infos that only define the structure elements to be included, with the actual addresses of the struct variables differing from group to group.
    This is also possible for partial struct elements, i.e. declaring some of a struct's elements in a general reusage descriptor and then just pointing to it, without having to re-enter those elements in other struct data info definitions.

    If there is interest in this (I honestly think there probably isn't, but as I said, just checking it with this post) I can add examples, the API to do it is pretty simple...

    - Other features already implemented but not described here
    Nested structs
    Pointer support including pointer following
    Non invasive descriptors (should be self evident...:))
    - Other features not yet implemented
    Arrays (already implemented TagItem lists and pointer following though, so the code is more or less structured for it), including arrays of complex nested structs with buried pointers to other arrays combined with pointer following.
    Endian agnostic
    Multiplatform
    Versioning (probably based on IDs and callback functions)
    Custom callback function that can be passed to the API, which would call it for various types of data or a certain type, which could set default data, set pointers, whatever. Some default callbacks could be included.

    Open source ?


    [ Edited by Jose 21.07.2015 - 21:05 ]
  • »20.07.15 - 01:49
  • Just looking around
    Jose
    Posts: 15 from 2014/2/12
    There was supposed to be some indentation in the descriptors, making them very readable, but the forum's code tag screwed it up. Is there any way to preserve it? If I choose to edit the previous post I can still see the indentation in the edit window...

    [ Edited by Jose 20.07.2015 - 01:04 ]
  • »20.07.15 - 02:02
  • ASiegel
    Posts: 1372 from 2003/2/15
    From: Central Europe
    @Jose, very sorry about what happened with your post. If there is any chance you could repost the code examples, I will be happy to investigate the indentation issue for you.

    Again, very sorry. I realize this was a long post and probably took quite a while to write.
  • »21.07.15 - 16:16
  • Just looking around
    Jose
    Posts: 15 from 2014/2/12
    Hi, thanks for trying to fix the indentation. I've re-edited the post with the original text (luckily I thought it would be wise to make a backup..:)).

    The original text was saved in RTF. I copy/pasted it into the forum text editor and added the code tags manually. If I click on edit I can still see the original indentation in the example code..
  • »21.07.15 - 21:59
  • MorphOS Developer
    itix
    Posts: 1520 from 2003/2/24
    From: Finland
    Quote:

    Jose wrote:

    -- API --
    To serialize a set of data you call SS_SaveData(). To scatter-load a serialized stream into memory, SS_LoadData() is used.

    The prototypes for both functions are:

    Code:

    BOOL SS_LoadData (ULONG *DataInfo, struct Window *Window, ULONG ID, ...);
    BOOL SS_SaveData (ULONG *DataInfo, struct Window *Window, ULONG ID, ...);




    You might want to take a look at Ambient's prefspool implementation. It is not a serialization API, but it has many similarities to your idea.

    Quote:


    - DataInfo is a descriptor (also written "data info" in this post) for the group of data to be serialized. The user builds it with macros (described below) that try to keep the input needed to write a descriptor to a minimum, while giving priority to small descriptor size and efficiency.



    Instead of just having a descriptor, maybe there could be an API call to create one on the fly with the data? Something like SS_SaveDataItem(). Building descriptors sounds clunky.

    Quote:


    - Window is a pointer to a window to which error messages should be output, can be NULL.



    Passing a low-level Intuition window pointer is a bad idea IMO. For error messages it could be better to define a callback API that is called on each error?

    For more practical handling you maybe want to create a black box context for all this stuff.
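
    To make the suggestion a bit more concrete, here is a minimal sketch of an error callback combined with a context object. Every name in it (DemoErrorFn, struct DemoContext, demo_report and so on) is made up for the illustration and nothing here exists in the library being discussed; in a real design the struct would be hidden behind an opaque pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: called once for each error. */
typedef void (*DemoErrorFn)(void *user_data, int code, const char *message);

/* Hypothetical context holding everything a load/save call needs. */
struct DemoContext {
    DemoErrorFn on_error;   /* may be NULL: errors are then only counted */
    void *user_data;        /* passed back to the callback untouched */
    int error_count;
};

static void demo_context_init(struct DemoContext *ctx, DemoErrorFn fn, void *user_data)
{
    ctx->on_error = fn;
    ctx->user_data = user_data;
    ctx->error_count = 0;
}

/* What the library would call internally instead of writing to a window. */
static void demo_report(struct DemoContext *ctx, int code, const char *message)
{
    ctx->error_count++;
    if (ctx->on_error != NULL)
        ctx->on_error(ctx->user_data, code, message);
}

/* Example callback: remembers the last error code in an int owned by the caller. */
static void demo_remember_code(void *user_data, int code, const char *message)
{
    (void)message;
    *(int *)user_data = code;
}

/* Usage: the application decides what happens with each error. */
static void demo_callback_check(void)
{
    int last_code = 0;
    struct DemoContext ctx;

    demo_context_init(&ctx, demo_remember_code, &last_code);
    demo_report(&ctx, 7, "read failed");
    assert(last_code == 7);
    assert(ctx.error_count == 1);
}
```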

    Quote:


    - Making a descriptor -
    Currently a descriptor is made by declaring a ULONG array which is initialized by macros. At the top level there must always be a group, at least for the DataInfo passed to the serializing functions. There can be a hierarchy of groups inside each descriptor.
    To declare a group you use one of the SS_GROUP() or SS_GROUPf() macros. Both are function-like macros which take an ID and a FileName as arguments. SS_GROUPf() also takes DefaultFileName as an extra argument.
    - ID is an identifier for the group.
    - FileName is the file name of the group data, when serialized to a file.
    - DefaultFileName is used only by SS_GROUPf() and is the name of a file containing the default data to be used in case an ID is not present when loading a stream with SS_LoadData(). When there are subgroups, the top level group's defaults file has priority over the others. The hierarchy of defaults established by declaring groups with SS_GROUPf() only makes sense when reusage is involved (reusage is described below).



    Having full example code would help. It could also be a good idea to create a prototype to test the design, if you don't have one already?

    Quote:


    Example:

    Imagine there is a boolean variable somewhere defining if there should be warnings in a GUI.
    Code:

    BOOL BasicWarnings;
    #define ID_BasicWarnings 101

    /* Suggestion: As above, IDs should be defined next to the variable they apply to for more readable code */
    #define ID_ROOTGROUP 100
    ULONG SettingsDescriptor[] = /* Contains references to all the data we want to include in this group, in this case only a BOOL variable */
    {
        SS_GROUPf (ID_ROOTGROUP, "RootGroup", "RootGroupDefaultsFile")
            SS_ELd (ID_BasicWarnings, BasicWarnings, TRUE),
        0
    };

    /* Descriptor is done, now load every variable in root group */
    if (!SS_LoadData (SettingsDescriptor, NULL, ID_ROOTGROUP, NULL))
        printf ("Error loading data\n");


    /* Save every variable in root group */
    if (!SS_SaveData (SettingsDescriptor, NULL, ID_ROOTGROUP, NULL))
        printf ("Error saving data\n");

    ...




    Shouldn't you define the datatype and its size somewhere? For example string pointers, floats, 8/16/32/64-bit integers and so on?

    Quote:


    - Structures
    For structs the principle is the same, with their elements being added just like basic C types. At first this might not seem possible, because to get at struct elements one has to use offsets instead of addresses. However, some very elaborate macros take care of that, and the input “syntax” is the same! This was an achievement that made me happy and made me consider this a bit more seriously. SS_ST() is used first to define the base address. After that, SS_STNM must be defined with the struct's type name, after which the elements are added.

    Example:

    Suppose in the previous example there is also a structure called TestStruct.
    The new data info, adding it to the serialized stream after BasicWarnings, would be:
    Code:

    struct TestStruct
    {
        int a;
        char b;
        int c;
    } TstSt;
    /* Element IDs */
    enum TestStructIDs
    {
        ID_TSTST = 120,
        ID_TS_a,
        ID_TS_b,
        ID_TS_c
    };

    ULONG SettingsDescriptor[] =
    {
        SS_GROUPf (ID_ROOTGROUP, "RootGroup", "RootGroupDefaultsFile")
            SS_ELd (ID_BasicWarnings, BasicWarnings, TRUE),
            SS_ST (ID_TSTST, TstSt)
            #define SS_STNM TestStruct
                EL (ID_TS_a, a),
                ELd (ID_TS_b, b, 10), /* As an example, this element has a default value if no value is found on the input stream when loading */
                EL (ID_TS_c, c),
            0,
            #undef SS_STNM
        0
    };




    Macros look very non-descriptive. From a developer POV it could be easier if one could just:

    SS_ReadItem(context, ID, to_pointer, to_offset);
    SS_SaveItem(context, ID, from_pointer, from_offset);

    This is more flexible if member variables in the struct are shuffled.

    Quote:


    - TagItem lists
    For TagItem lists the macros SS_TAGITEM() and SS_TAGITEMd() are used. Both work like SS_EL() and SS_ELd() and take an ID and the variable name of the array holding the TagItem list as arguments. SS_TAGITEMd() takes one extra argument that's the name of the default TagItem list to use for default values.



    Actually, it could be nice if an entire taglist could be saved with just one call:

    SS_SaveTagList(context, CONST struct TagItem *taglist);

    Tag lists are NULL terminated, they have ID, position is already in the taglist. What is missing is datatype (32-bit integer or ptr to other datatype, like a string). Some bits from the tag ID could be allocated to this purpose.
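
    A small sketch of what "some bits from the tag ID" could look like. The layout is purely an assumption for the sake of the example (bits 24 to 27 carry the type, and all DEMO_ names are made up); it deliberately leaves bit 31 alone, since that bit is TAG_USER on AmigaOS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bits 24..27 of a 32-bit tag ID carry the data type. */
#define DEMO_TYPE_SHIFT 24
#define DEMO_TYPE_MASK  UINT32_C(0x0F000000)

enum demo_type { DEMO_T_INT32 = 1, DEMO_T_STRING = 2 };

/* Builds a tag ID that carries its own type. */
#define DEMO_TAG(type, id) \
    ((((uint32_t)(type) << DEMO_TYPE_SHIFT) & DEMO_TYPE_MASK) | \
     ((uint32_t)(id) & ~DEMO_TYPE_MASK))

/* Extracts the type field from a tag ID. */
static unsigned demo_tag_type(uint32_t tag)
{
    return (unsigned)((tag & DEMO_TYPE_MASK) >> DEMO_TYPE_SHIFT);
}

/* Returns the tag ID with the type field cleared. */
static uint32_t demo_tag_id(uint32_t tag)
{
    return tag & ~DEMO_TYPE_MASK;
}

/* Usage: a string-typed tag keeps its TAG_USER bit (bit 31) intact. */
static void demo_tag_check(void)
{
    uint32_t tag = DEMO_TAG(DEMO_T_STRING, UINT32_C(0x80000010));

    assert(demo_tag_type(tag) == DEMO_T_STRING);
    assert(demo_tag_id(tag) == UINT32_C(0x80000010));
}
```

    A saver walking the tag list could then switch on demo_tag_type() to decide whether ti_Data is a plain integer or a pointer that has to be followed.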

    Quote:


    If there is interest in this (I honestly think there probably isn't, but as I said, just checking it with this post) I can add examples, the API to do it is pretty simple...



    Probably not many users :) although developers often need to save settings in a flexible manner. Everyone is using their own system.

    MUI has this system built-in but is not flexible for user data (btw also consider how to save an array of strings!).
    1 + 1 = 3 with very large values of 1
  • »22.07.15 - 08:04
  • Just looking around
    Jose
    Posts: 15 from 2014/2/12
    @Itix
    Thanks for your comments. I'll take a look at Ambient prefspool later on before releasing it.

    Quote:

    Instead of just having a descriptor, maybe there could be an API call to create one on the fly with the data? Something like SS_SaveDataItem(). Building descriptors sounds clunky.


    I think that would cause overhead, and for many types of data it would be impossible for the library to guess how to save them anyway. Some guidance has to be given, i.e. if you don't want to save all elements of a struct you have to define which ones you want..

    Quote:

    Passing a low-level Intuition window pointer is a bad idea IMO. For error messages it could be better to define a callback API that is called on each error?

    For more practical handling you maybe want to create a black box context for all this stuff.


    I agree, I'll extend it with a callback in the future, so the user can choose how errors are handled or if he just wants them handled automatically (maybe with some options...).

    Quote:

    Having full example code would help. It could also be a good idea to create a prototype to test the design, if you don't have one already?



    I do have test code, that's how I test it, but it's a bit of a mess right now and not suitable to include in a presentation.. I'll post some later on. Referencing another descriptor is really simple, you just add something like Code:
    SS_EXT(DtInf)
    where DtInf is the name of the already defined descriptor, allowing cross references to other descriptors as if they belonged to the current one.

    Quote:

    Shouldn't you define the datatype and its size somewhere? For example string pointers, floats, 8/16/32/64-bit integers and so on?



    Aha! Now we're getting to it :) No, the macros are already doing all of that, they add data pointer / size / type automatically so the user only has to make a list of elements to be serialized using said macros.
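
    To show the general idea of a macro picking up the address and size by itself, here is a minimal sketch. It is not the library's macro: DEMO_EL, struct demo_entry and demo_find are made-up names, and the sketch stores entries in a small struct rather than in a ULONG array so that it stays valid on platforms where a pointer does not fit in 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor entry. */
struct demo_entry {
    uint32_t id;
    void *addr;     /* where the variable lives */
    size_t size;    /* how many bytes to read or write */
};

/* The user only names the ID and the variable; address and size come for free. */
#define DEMO_EL(id, var) { (id), &(var), sizeof(var) }

static int BasicWarnings;
static double Volume;

static struct demo_entry demo_settings[] = {
    DEMO_EL(101, BasicWarnings),
    DEMO_EL(102, Volume),
    { 0, NULL, 0 }              /* terminator */
};

/* Finds the entry with the given ID, or returns NULL. */
static const struct demo_entry *demo_find(uint32_t id)
{
    const struct demo_entry *e;

    for (e = demo_settings; e->id != 0; e++)
        if (e->id == id)
            return e;
    return NULL;
}

/* Usage: sizes follow the variables' types without being typed in by hand. */
static void demo_entry_check(void)
{
    assert(demo_find(101)->addr == (void *)&BasicWarnings);
    assert(demo_find(101)->size == sizeof(int));
    assert(demo_find(102)->size == sizeof(double));
}
```

    Note that sizeof alone cannot tell an int from a float of the same size; a macro that also records the type would need something extra, for example C11's _Generic, which is an assumption here and not something the post says the library uses.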

    Quote:

    Macros look very non-descriptive. From a developer POV it could be easier if one could just:

    SS_ReadItem(context, ID, to_pointer, to_offset);
    SS_SaveItem(context, ID, from_pointer, from_offset);

    This is more flexible if member variables in the struct are shuffled.



    If I get it right, you mean a group of data could be repeatedly saved to various places with a general descriptor; I have something done for that too. Descriptors can include fixed addresses or offset-based positions so that they can be used generally (that's how reusage is implemented; again, the best way is to include some code, I'll do that later...).
    Macro names are actually somewhat descriptive, but because they do a whole bunch of things at the same time there's no short descriptive name..
    EL just means "Element", it's just to include an element in a group to be (un)serialized. ELd is exactly the same but it allows a default (hence the extra "d") to be included in the descriptor, used if no ID for this data is found on the stream when loading the data back to memory from a serialized stream, or even when serializing data to a stream to be sent over the network (different descriptors can be defined for unserializing/serializing, with reusage between them so that the whole list of data doesn't have to be added again...).
    I missed the shuffled part, though?

    Quote:

    Actually, it could be nice if an entire taglist could be saved with just one call:

    SS_SaveTagList(context, CONST struct TagItem *taglist);

    Tag lists are NULL terminated, they have ID, position is already in the taglist. What is missing is datatype (32-bit integer or ptr to other datatype, like a string). Some bits from the tag ID could be allocated to this purpose.


    Agreed, especially since the whole thing was inspired by Tag lists in the first place :) I'll look into that.

    Quote:

    Probably not many users :) although developers often need to save settings in a flexible manner. Everyone is using their own system.

    MUI has this system built-in but is not flexible for user data (btw also consider how to save an array of strings!).


    I've seen lots of questions about how this is done in general C forums and there's no generally used method, so I think a multiplatform library to do it would be cool (will be ...:)). Even beyond the effort saved by not coding a specific implementation every time, this could be used to share settings between AROS and 68k/PPC without having to worry about endianness problems. My idea for endianness is to just implement most of the functions again for each platform's byte order, cumbersome but clearer/faster.
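
    For comparison, a common alternative to per-platform implementations is to fix the byte order of the stream and build values with shifts, so that the same C code runs unchanged on big- and little-endian hosts. This is a sketch of that other technique, not of the plan described above, and the function names are made up:

```c
#include <assert.h>
#include <stdint.h>

/* Writes v to out[0..3] in big-endian order, whatever the host byte order is. */
static void demo_put_u32_be(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Reads a big-endian 32-bit value from in[0..3]. */
static uint32_t demo_get_u32_be(const unsigned char *in)
{
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}

/* Usage: the round trip gives the same bytes on 68k, PPC and x86. */
static void demo_endian_check(void)
{
    unsigned char buf[4];

    demo_put_u32_be(buf, UINT32_C(0x11223344));
    assert(buf[0] == 0x11 && buf[1] == 0x22 && buf[2] == 0x33 && buf[3] == 0x44);
    assert(demo_get_u32_be(buf) == UINT32_C(0x11223344));
}
```

    The trade-off is a few shifts per value on every platform, instead of a plain memory copy on the platform whose native order matches the stream.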


    I have run into a problem recently and I'm not finding a way to solve it. I want to include a zero at the beginning of each descriptor that's used by the code as space for some maintenance. I wanted it to be added by a macro instead of by the user, who might forget it, and it's less work, but there's no way to start an initializer list with a zero, i.e. Code:
    {0, 7, 9,  2, 4...}
    and keep a "C like interface".
    For example, if I want a macro that automatically adds a zero to an initializer list, I won't be able to have the descriptors declared as a C array anymore:
    Code:
    ULONG SettingsDescriptor[] =
    {
    EL (ID_VarA, VariableA),
    EL (ID_VarB, VariableB),
    0
    };


    Code:

    /* Now declared with a macro to automatically add a zero at the start */
    /* Potential descriptor macro */
    #define SS_DEF(DefName) ULONG DefName[] = \
        { \
        0,
    #define SS_END 0}

    SS_DEF(SettingsDescriptor)
        EL (ID_VarA, VariableA),
        EL (ID_VarB, VariableB),
    SS_END;


    This last way, however, makes the declaration a bit like Reaction macros, and I would like to avoid it because it breaks normal C declarations a bit, but there seems to be no way around it.
    Before anyone suggests it: dynamic memory allocation would mean the code has to initialize itself on each call, and not having to do that is something I really like right now...

    What do you think, is this too far from normal C syntax for you? I've seen worse, but I wanted it to be perfect, at least like:
    Code:

    SS_DEF(SettingsDescriptor)=
    {
    EL (ID_VarA, VariableA),
    EL (ID_VarB, VariableB),
    SS_END
    };


    or
    Code:

    SS_DEF(SettingsDescriptor)
    {
    EL (ID_VarA, VariableA),
    EL (ID_VarB, VariableB),
    SS_END
    };


    But I couldn't come up with a macro to do this, because once the bracket is opened there's no way out of it without adding a closing bracket and declaring another variable. With that approach (the macro itself closing the bracket), the first variable would have to be a struct with said long int containing a zero and a pointer to a second variable containing the descriptor array. This is not even possible in C (forward references are not allowed in C).
    Guess I'm nitpicking here, but I wanted it to be perfect :)
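
    One more variant that might be worth weighing, assuming a C99-capable compiler is available: a variadic macro can take the whole element list as its arguments and wrap it with the leading and trailing zero. This is only a sketch with made-up names (DEMO_DEF, DEMO_EL), and the element macro is a placeholder that just emits two numbers:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long ULONG;    /* stand-in for the AmigaOS typedef */

/* Hypothetical: wraps the element list with a leading maintenance slot
   and a terminating zero. Requires C99 variadic macros. */
#define DEMO_DEF(name, ...) ULONG name[] = { 0, __VA_ARGS__, 0 }

/* Placeholder element macro, only here so the sketch compiles. */
#define DEMO_EL(id, value) (ULONG)(id), (ULONG)(value)

DEMO_DEF(SettingsDescriptor,
    DEMO_EL(1, 10),
    DEMO_EL(2, 20)
);

/* Usage: both zeros are in place without the user typing them. */
static void demo_def_check(void)
{
    size_t count = sizeof SettingsDescriptor / sizeof SettingsDescriptor[0];

    assert(count == 6);
    assert(SettingsDescriptor[0] == 0);
    assert(SettingsDescriptor[count - 1] == 0);
}
```

    It still does not look like a plain array declaration, and it has a real limitation for this design: preprocessor directives are not allowed inside macro arguments, so the #define SS_STNM / #undef SS_STNM lines used in struct descriptors could not appear inside such a list.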

    [ Edited by Jose 30.07.2015 - 02:04 ]
  • »30.07.15 - 02:43
  • Just looking around
    Jose
    Posts: 15 from 2014/2/12
    There seems to be another bug in the forums code where it's eating the backslash ('\') in my macro examples at the end. Again, it's there if I press edit..
  • »30.07.15 - 03:06