MorphOS development & UTF-8
  • Order of the Butterfly
    Leo
    Posts: 417 from 2003/8/18
    I tried to compile a simple hello world which prints UTF-8 characters using printf():

    Code:

    #include <stdio.h>

    int main(void) {
        char ver[] = "©«",
             ver2[] = "的",
             ver3[] = "$VER: test caractères: ©«的";

        printf("ver: %s\n", ver);
        printf("ver2: %s\n", ver2);
        printf("-------\n");
        printf("的\n");
        printf("的\n");
        return 0;
    }


    The source file is saved as UTF-8, and I compiled it using gcc file.c -noixemul.

    When I run it in a MorphOS shell, here is what I get instead:

    Code:

    ver: ©«
    ver2: ç

    ...


    Is UTF-8 supported in MorphOS? What encoding should be used to save files with special characters?

    I checked and didn't find any way to specify the encoding used in the Shell or in the OS.

    The same code compiled on Linux or macOS works as expected when executed.

    [ Edited by Leo 08.07.2018 - 11:04 ]
    Nothing hurts a project more than developers not taking the time to let their community know what is going on.
  • »06.07.18 - 09:47
  • Priest of the Order of the Butterfly
    Tcheko
    Posts: 518 from 2003/2/25
    From: France
    Almost everything runs with ISO-8859-X or another 8-bit encoding (one for Russian, for example).

    You have to handle character encoding yourself before outputting anything to the shell. charsets.library offers facilities for converting between encodings.
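    To see why the shell output looks wrong, it helps to look at the raw bytes. The sketch below is plain portable C, not MorphOS-specific, and `hex_dump` is a hypothetical helper name. It dumps a string byte by byte. In UTF-8, © is C2 A9, « is C2 AB and 的 is E7 9A 84. An ISO-8859-1 display treats every byte as one character, so C2 would be expected to show up as Â and E7 as ç, with 9A and 84 falling in the non-printable control range.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the bytes of s into out as space-separated upper-case hex pairs.
 * Returns the number of bytes dumped; stops early if out is too small. */
size_t hex_dump(const char *s, char *out, size_t out_size)
{
    size_t len, i, used = 0;

    assert(s != NULL && out != NULL && out_size > 0);
    out[0] = '\0';
    len = strlen(s);

    for (i = 0; i < len; i++) {
        /* "XX" or " XX", plus room for the terminating NUL. */
        size_t need = (i == 0) ? 3 : 4;
        if (used + need > out_size)
            break;
        used += (size_t)sprintf(out + used, i ? " %02X" : "%02X",
                                (unsigned)(unsigned char)s[i]);
    }
    return i;
}
```

    Calling `hex_dump(ver, buf, sizeof buf)` on the strings from the first post shows how many bytes printf() actually sends to the shell for each visible character.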
    Whatever path you take in life, know that you will get blisters on your feet.
    -------
    I need to practice my Kung Fu.
  • »06.07.18 - 10:17
  • Acolyte of the Butterfly
    Jeckel
    Posts: 133 from 2007/3/11
    Linux uses UTF-8 system-wide as its default charset, so you can print UTF-8 chars directly, but you will have to do charset conversions for anything else.

    MorphOS (like AmigaOS) uses an 8-bit-only charset (it is given by the keyboard preferences). If you want to print UTF-8 chars you will have to use charsets.library to convert from UTF-8 to the "system" encoding (of course, not all UTF-8 chars can be converted to an 8-bit charset).
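    As an illustration of what such a conversion involves, here is a hand-rolled sketch in portable C. It is not the charsets.library API (see the autodocs for that). It assumes the system charset is ISO-8859-1, and `utf8_to_latin1` is a hypothetical name. Anything above U+00FF, such as 的, cannot be represented and is replaced with '?'.

```c
#include <assert.h>
#include <stddef.h>

/* Convert NUL-terminated UTF-8 in src to ISO-8859-1 in dst (dst_size bytes,
 * terminator included). Code points above U+00FF and malformed sequences
 * become '?'. Returns the number of characters that had to be replaced. */
size_t utf8_to_latin1(const char *src, char *dst, size_t dst_size)
{
    const unsigned char *p = (const unsigned char *)src;
    size_t out = 0, replaced = 0;

    assert(src != NULL && dst != NULL && dst_size > 0);

    while (*p != '\0' && out + 1 < dst_size) {
        unsigned char c = *p++;

        if (c < 0x80) {
            /* Plain ASCII is identical in both encodings. */
            dst[out++] = (char)c;
        } else if ((c == 0xC2 || c == 0xC3) && (*p & 0xC0) == 0x80) {
            /* U+0080..U+00FF: two-byte sequence, fits in one Latin-1 byte. */
            dst[out++] = (char)(((c & 0x03) << 6) | (*p++ & 0x3F));
        } else {
            /* Not representable or malformed: skip any continuation bytes. */
            while ((*p & 0xC0) == 0x80)
                p++;
            dst[out++] = '?';
            replaced++;
        }
    }
    dst[out] = '\0';
    return replaced;
}
```

    The return value lets the caller detect lossy output, which is the "not all UTF-8 chars can be converted" case.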
  • »06.07.18 - 10:49
  • MorphOS Developer
    jacadcaps
    Posts: 3020 from 2003/3/5
    From: Canada
    Unless you've changed the SDK defaults, you'd be compiling this with gcc2.95.3 - try with gcc5 or 6.
  • »06.07.18 - 12:30
  • Order of the Butterfly
    Leo
    Posts: 417 from 2003/8/18
    Quote:

    jacadcaps wrote:
    Unless you've changed the SDK defaults, you'd be compiling this with gcc2.95.3 - try with gcc5 or 6.


    I am using cross-compilation and if I remember correctly it's something like gcc5. Does it change anything regarding encoding?

    What are the best practices for MorphOS-only CLI apps that need to be localized: do people use the charsets library, or do they use ISO?

    What about locale catalogs: what encoding should catalog .cd files use?
  • »06.07.18 - 13:13
  • MorphOS Developer
    jacadcaps
    Posts: 3020 from 2003/3/5
    From: Canada
    Quote:

    Leo wrote:
    I am using cross-compilation and if I remember correctly it's something like gcc5. Does it change anything regarding encoding?



    No, it does not change anything. Your app should come with English and ISO-8859-1 for the built-in defaults. Or you can use Objective-C, and then UTF-8 comes for free.

    Quote:

    What are the best practices for MorphOS-only CLI apps that need to be localized: do people use the charsets library, or do they use ISO?


    English/ISO-8859-1 for the built-in language and locale/catalogs for translations.

    Quote:

    What about locale catalogs: what encoding should catalog .cd files use?


    The catalogs need to be encoded in ISO-8859-1 and .ct files encoded in the default ISO-8859-* associated with the catalog's language.

    Edit: I misread your original post. You cannot output UTF-8 text straight into the shell; that is not supported, for compatibility reasons. In that case, use either charsets.library or Objective-C to output text in the native ISO codepage.

    [ Edited by jacadcaps 06.07.2018 - 08:22 ]
  • »06.07.18 - 13:19
  • Order of the Butterfly
    Leo
    Posts: 417 from 2003/8/18
    Quote:

    jacadcaps wrote:
    Edit: I misread your original post. You cannot output UTF-8 text straight into the shell; that is not supported, for compatibility reasons. In that case, use either charsets.library or Objective-C to output text in the native ISO codepage.

    Ok, that's what I thought.

    Where can I find documentation on how to convert a UTF-8 string to the locale's codepage using charsets.library?
  • »06.07.18 - 14:07
  • Acolyte of the Butterfly
    Jeckel
    Posts: 133 from 2007/3/11
    Autodocs of charsets.library are self-explanatory. :)
  • »06.07.18 - 14:36
  • Order of the Butterfly
    Leo
    Posts: 417 from 2003/8/18
    Thank you for the answers. After some thought, I'll go with the traditional way of doing things and use ISO-8859-1, as there is no good reason to use UTF-8.
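    One way to keep ISO-8859-1 built-in strings independent of how the editor saves the source file is to spell the non-ASCII bytes as hex escapes. This is a minimal sketch, with `latin1_version` as a hypothetical name. In ISO-8859-1, è is 0xE8, © is 0xA9 and « is 0xAB. 的 has no ISO-8859-1 equivalent, so it is left out.

```c
#include <assert.h>
#include <string.h>

/* Return the version string encoded as ISO-8859-1, whatever encoding the
 * source file itself is saved in. */
const char *latin1_version(void)
{
    /* The literal is split after "\xE8" so the escape can never swallow a
     * following character that happens to be a hex digit. */
    static const char ver[] = "$VER: test caract\xE8" "res: \xA9\xAB";

    /* One byte per character: 25 characters, 25 bytes. */
    assert(strlen(ver) == sizeof ver - 1);
    return ver;
}
```

    Passing the result to printf("%s\n", ...) then sends exactly one byte per character to the shell.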

    I guess direct UTF-8 support across the whole OS is not planned (because it is not really possible without losing compatibility)?

    [ Edited by Leo 08.07.2018 - 11:05 ]
  • »08.07.18 - 12:03
  • Priest of the Order of the Butterfly
    Tcheko
    Posts: 518 from 2003/2/25
    From: France
    Quote:

    Leo wrote:
    Thank you for the answers. After some thought, I'll go with the traditional way of doing things and use ISO-8859-1, as there is no good reason to use UTF-8.

    I guess direct UTF-8 support across the whole OS is not planned (because it is not really possible without losing compatibility)?


    Nothing in the operating system is wired to support multibyte encodings in legacy library calls. For example, none of the text-related functions in graphics.library take any hint about the text encoding.
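    A small portable sketch of why that matters: a call that only receives a pointer and a byte count cannot tell 7 single-byte characters from 3 UTF-8 characters. `utf8_code_points` below is a hypothetical helper that assumes valid UTF-8 input. It counts characters by skipping continuation bytes, which is knowledge a byte-oriented API does not have.

```c
#include <assert.h>
#include <stddef.h>

/* Count code points in a NUL-terminated UTF-8 string by ignoring
 * continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF-8. */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

    For "©«的" in UTF-8 this gives 3, while strlen() on the same string reports 7 bytes, and an 8-bit text routine would treat those as 7 glyphs.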
  • »08.07.18 - 23:42