charsets library examples
  • Butterfly
    walkero
    Posts: 99 from 2006/3/1
    Hello everyone. I am looking to do some encoding conversions in my app and I would like to use charsets.library. Are there any examples anywhere of how to convert ISO-encoded text to UTF-8, and back?

    I checked the autodoc and found how to get the current system encoding, but I am a little confused by ConvertTagList().

    Any info would also be helpful.
  • »26.12.23 - 17:52
  • Butterfly
    walkero
    Posts: 99 from 2006/3/1
    @jacadcaps
    That's perfect. Thank you so much.
  • »27.12.23 - 13:00
  • Butterfly
    walkero
    Posts: 99 from 2006/3/1
    Hello all again.
    I am having a hard time converting UTF-8 text to ISO-8859-7. Here is what I am doing so far:

    Code:

    const char* to = "ISO-8859-7";
    const char* from = "UTF-8";
    size_t text_len = 0;
    APTR text = <here is the unicode text>

    char *output;
    LONG output_len = 0;

    ULONG fromMib = GetCharsetNumber(from, CSF_IANA_MIMENAME);
    if (fromMib == 0)
        fromMib = GetCharsetNumber(from, CSF_IANA_NAME);
    if (fromMib == 0)
        fromMib = GetCharsetNumber(from, CSF_IANA_ALIAS);

    ULONG toMib = GetCharsetNumber(to, CSF_IANA_MIMENAME);
    if (toMib == 0)
        toMib = GetCharsetNumber(to, CSF_IANA_NAME);
    if (toMib == 0)
        toMib = GetCharsetNumber(to, CSF_IANA_ALIAS);

    printf("DBG: %s, %ld -> %s, %ld\n", from, fromMib, to, toMib);
    output_len = GetLength(text, text_len, fromMib);
    printf("DBG: output_len %ld\n", output_len);

    LONG dstEnc = 0;
    struct TagItem tags[] = { { CST_DoNotTerminate, FALSE }, { CST_GetDestEncoding, &dstEnc }, { TAG_DONE, 0 } };

    printf("DBG: Text: %d\n%s\n", text_len, (char *)text);

    LONG result = ConvertTagList((APTR)text, text_len, (APTR)output, output_len, fromMib, toMib, tags);
    printf("DBG: dstEnc: %ld\n", dstEnc);
    if (result <= 0)
    {
        printf("DBG: failed converting from '%s' to '%s'\n", from, to);
        return 2;
    }
    printf("DBG: ~%s~\n", output);


    What I am getting at the debug output is the following:

    Code:

    DBG: UTF-8, 106 -> ISO-8859-7, 10
    DBG: output_len 32
    DBG: Text: 58
    <here is the unicode text printed>

    DBG: dstEnc: 10
    DBG: failed converting from 'UTF-8' to 'ISO-8859-7'
    DBG: ~(null)~


    Although the text is read correctly, as the length count (Text: 58) and the terminal output show, the conversion always fails.

    Also, CST_GetDestEncoding seems to return the right MIB for ISO-8859-7.

    Any ideas will be helpful.
  • »22.02.24 - 21:32
  • MorphOS Developer
    Piru
    Posts: 574 from 2003/2/24
    From: finland, the l...
    @walkero

    You don't seem to allocate the output buffer. Note that if you intend to have the string 0 terminated you need to allocate one extra char and pass that larger size to ConvertTagList, too.

    The code is clearly missing some other parts, too, so it's hard to tell what is going wrong.

    [ Edited by Piru 23.02.2024 - 11:54 ]
  • »23.02.24 - 09:47
  • Butterfly
    walkero
    Posts: 99 from 2006/3/1
    @Piru
    Thank you so much for your prompt reply.

    Quote:

    You don't seem to allocate the output buffer. Note that if you intend to have the string 0 terminated you need to allocate one extra char and pass that larger size to ConvertTagList, too.


    When you say allocate I guess you mean to allocate memory for it, right? So, I guess the ConvertTagList doesn't do that itself.


    Quote:

    The code is clear missing some other parts, too, so it's hard to tell what is going wrong.


    I think that this code has everything that is needed to do the conversion. Is there something else I need to do?

    I appreciate your help.
  • »23.02.24 - 12:05
  • MorphOS Developer
    Piru
    Posts: 574 from 2003/2/24
    From: finland, the l...
    Quote:

    When you say allocate I guess you mean to allocate memory for it, right? So, I guess the ConvertTagList doesn't do that itself.

    ConvertTagList does not allocate the buffer for you. Currently the code above uses an uninitialized pointer for the output buffer, so it cannot work. The caller must provide this buffer and its size. Typically you allocate some memory and release it afterwards.

    Quote:

    I think that this code has everything that is needed to do the conversion. Is there something else I need to do?

    text_len is always 0. That cannot work.
  • »23.02.24 - 13:11
  • Butterfly
    walkero
    Posts: 99 from 2006/3/1
    @piru
    Quote:

    ConvertTagList does not allocate the buffer for you. Currently the code above uses an uninitialized pointer for the output buffer, so it cannot work. The caller must provide this buffer and its size. Typically you allocate some memory and release it afterwards.


    I will try to set it up and get back to you.

    Quote:

    text_len is always 0. That cannot work.


    Oh yeah, you are right, that part is missing from what I posted. In my actual code text_len does get set, as the output shows:

    Code:
    DBG: Text: 58


    Thank you for your help. I will try it and get back.
  • »23.02.24 - 14:14
  • Butterfly
    walkero
    Posts: 99 from 2006/3/1
    @Piru
    That worked. The conversion succeeds now that the output buffer is allocated.

    Is it preferable to use AllocMem, AllocMemAligned or malloc? I used the latter, but I am curious which way is preferred.
  • »23.02.24 - 16:07
  • MorphOS Developer
    Piru
    Posts: 574 from 2003/2/24
    From: finland, the l...
    Quote:

    walkero wrote:
    @Piru
    Is it preferable to use AllocMem, AllocMemAligned or malloc? I used the latter, but I am curious which way is preferred.

    There is no preference, use whatever suits your application best. malloc is nice in that it will be released automatically when the application terminates.
  • »23.02.24 - 18:41
  • Butterfly
    walkero
    Posts: 99 from 2006/3/1
    @Piru
    I decided to use calloc, so that the allocated memory is cleared as well. I also decided to calculate the output length with GetByteSize() rather than GetLength() since, if I understand it correctly, it takes the target encoding into account, which is quite useful when the characters are multibyte, as in UTF-8.

    Now my app can encode text in other encodings and also open files in different encodings. Quite useful for a code editor.

    I hope to have a new release soon.

    Thank you guys again for your help.
  • »23.02.24 - 19:41
  • Priest of the Order of the Butterfly
    beworld
    Posts: 587 from 2010/2/10
    From: FRANCE
    @walkero If you have any problems with SDL2 and liteXL, tell me. I know of a bug when an SDL2 app uses the Lock function (the app crashes on exit, for example). I can send you my fixed SDL/SDK.
    iMac G5 2.1, PowerBook G4 1.5, MacMini 1.5, PowerMac G5 2.7 (died!)
    My MOS ports
  • »24.02.24 - 17:00
  • Butterfly
    walkero
    Posts: 99 from 2006/3/1
    @beworld
    Yes, I do get some crash errors in the logger on exit. I was looking into that because I was convinced it might be my crappy code, but I haven't gotten anywhere yet.

    I'd love to test your fixes. Thank you.
  • »27.02.24 - 08:32