UTF-8 support in MorphOS?
  • Order of the Butterfly
    MorphDelf
    Posts: 274 from 2004/2/20
    From: Oslo, Norway
    I am just wondering whether it is possible to add UTF-8 support to MorphOS. Or is it difficult?

    Could MorphOS easily support Japanese characters etc.?

    Would be nice to get some knowledge on these topics.

    Regards,
    Michal
  • »05.03.06 - 16:34
  • Order of the Butterfly
    AmigaMancer
    Posts: 265 from 2005/8/25
    Hi, I don't know much about this issue, but UTF-8 is invaluable to me, as I use non-Latin alphabets a lot.
    I don't know how hard this can be, but my guess is that it can't be TOO hard: http://main.aminet.net/util/libs/codesets.lha
    Amiga 1200 user.
  • »05.03.06 - 16:50
  • Order of the Butterfly
    GK_LKA
    Posts: 481 from 2004/3/28
    From: Hungary
    @AmigaMancer:

    ???
    The main advantage of UTF-8 is that it handles all(?) symbols and letters of all the big languages: Latin, Cyrillic, Greek, Japanese, etc. If you use UTF-8, you can write e.g. a Japanese quote into a Russian text.

    The codesets library is only a temporary solution: it converts some charsets (incl. UTF-8) into the Amiga standard 8-bit ones.
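
    To make that concrete, here is a minimal Python sketch. It does not call codesets.library; Python's own codecs stand in for it, and the sample sentence and the helper name fits_in are made up for illustration. It shows one string mixing Cyrillic and Japanese surviving a UTF-8 round trip, while no single 8-bit codepage can hold all of it.

```python
# One string mixing Cyrillic and Japanese (illustrative sample text).
text = "Он сказал: 「こんにちは」"

utf8_bytes = text.encode("utf-8")


def fits_in(codepage: str, s: str) -> bool:
    """Return True if every character of s exists in the given codepage."""
    try:
        s.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


print(len(text), "characters,", len(utf8_bytes), "bytes in UTF-8")
print("UTF-8 round trip ok:", utf8_bytes.decode("utf-8") == text)
print("fits ISO-8859-1 (Latin):", fits_in("iso-8859-1", text))
print("fits ISO-8859-5 (Cyrillic):", fits_in("iso-8859-5", text))
```

    Converting to an 8-bit charset only works when every character happens to exist in that charset, which is why a converter alone is a stopgap rather than full Unicode support.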
    [ GK / LKA Team ]
  • »05.03.06 - 22:07
  • Order of the Butterfly
    MorphDelf
    Posts: 274 from 2004/2/20
    From: Oslo, Norway
    Nice! Thanks to all who replied.
  • »05.03.06 - 22:33
  • Order of the Butterfly
    Neko
    Posts: 301 from 2003/2/24
    From: Genesi
    Easy ways to get UTF-8 support into MorphOS

    1) hack input.device to return UTF8 codes (and multiple keystrokes) for >7bit codes

    2) Patch Text*() functions in order to properly handle UTF8 scanning and length etc.

    diskfont and bullet API already provide for Unicode codepoints in fonts (and ft2 and so on already handle it very well).

    Any app which renders using system functions and saves untransformed ASCII text will work okay, then.
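
    As a rough illustration of what "UTF8 scanning and length" in point 2 involves, here is a Python sketch. It is not the MorphOS Text*() API; the function names are hypothetical. A length routine has to step through the buffer one sequence at a time, using the lead byte to decide how many bytes each character occupies, instead of assuming one byte per character.

```python
def utf8_sequence_length(lead: int) -> int:
    """Number of bytes in the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:
        return 1  # 0xxxxxxx: plain 7-bit ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2  # 110xxxxx
    if 0xE0 <= lead <= 0xEF:
        return 3  # 1110xxxx
    if 0xF0 <= lead <= 0xF4:
        return 4  # 11110xxx
    # Continuation bytes (0x80-0xBF) and invalid leads end up here.
    raise ValueError(f"invalid UTF-8 lead byte: {lead:#04x}")


def utf8_text_length(data: bytes) -> int:
    """Count characters (code points), not bytes.

    Simplification: continuation bytes are skipped without being validated.
    """
    count = 0
    i = 0
    while i < len(data):
        i += utf8_sequence_length(data[i])
        count += 1
    if i != len(data):
        raise ValueError("truncated UTF-8 sequence at end of buffer")
    return count


sample = "日本語".encode("utf-8")
print(len(sample), "bytes,", utf8_text_length(sample), "characters")
print(utf8_text_length(b"plain ASCII"), "characters in an ASCII string")
```

    Pure 7-bit ASCII takes the one-byte branch every time, which is why existing ASCII-only text keeps working unchanged.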
    Matt Sealey, Genesi USA, Inc.
    Developer Relations
    Product Development Analyst
  • »06.03.06 - 07:42
  • Order of the Butterfly
    koan
    Posts: 303 from 2005/11/21
    From: UK
    @Neko

    I don't think it is easy or sensible to make a "quick hack" to change from single to multibyte characters.

    How can you tell the difference between combinations of "extended ASCII" characters and a UTF-8 encoded triplet? You can't, unless you have prior knowledge.

    Far better (but much more work) would be to make all MOS internals use Unicode.
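
    The ambiguity is easy to reproduce. In this small Python sketch (ISO-8859-1 is assumed as the 8-bit codepage, and the character is just an example), the same three bytes are one Japanese character under UTF-8 and three perfectly legal characters under Latin-1. Nothing in the bytes themselves says which reading was intended.

```python
# Three bytes that encode a single character in UTF-8.
data = "人".encode("utf-8")

as_utf8 = data.decode("utf-8")          # one character
as_latin1 = data.decode("iso-8859-1")   # three characters: 'äºº'

print(" ".join(f"{b:02x}" for b in data))
print(repr(as_utf8), len(as_utf8))
print(repr(as_latin1), len(as_latin1))
```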

    koan
  • »06.03.06 - 08:17
  • Order of the Butterfly
    Hawk
    Posts: 204 from 2003/12/29
    From: Tokyo - Japan
    There should be a bounty for this.

    JFKK is nice, but still I can't read most of my mail (gmail converts any Japanese encoding to UTF-8). OS4 got UTF-8, right? I read something about it in the new IBrowse release.

    About input systems, both JFKK-input and Charabia are a nightmare. I would like something like uim-anthy from Linux/GTK world.
    Pegasos II G3@600Mhz (no fan) 512MB RAM (1 slot)
    -- Maxtor 6Y120P0 120GB, 7200 rpm -- ATI Radeon 7500 - (64MB, TV-out)
    -- Minuet Slimline PC case -- MorphOS 1.4.5 + Gentoo
    EFIKA
  • »07.03.06 - 07:22
  • Order of the Butterfly
    GK_LKA
    Posts: 481 from 2004/3/28
    From: Hungary
    I don't think we really need a bounty for this... It's the job of the MOS core writers.
    [ GK / LKA Team ]
  • »07.03.06 - 07:47
  • Order of the Butterfly
    koan
    Posts: 303 from 2005/11/21
    From: UK
    @Hawk

    I don't know Gmail but they probably put a MIME tag in the head of the web page.

    My paid-for mail provider has webmail, but the dimwits put a MIME tag with ISO-8859-1 on all email you view with that method.

    Try forwarding the mail to another account that doesn't do this.
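
    A small Python sketch of why a wrong charset label does this much damage. It assumes the "MIME tag" is a Content-Type charset parameter; the helper decode_body and the sample text are made up for illustration. The bytes are identical in both cases, only the label differs.

```python
from email.message import Message

original = "日本語のメール"
body_bytes = original.encode("utf-8")   # what actually travels over the wire


def decode_body(body: bytes, content_type: str) -> str:
    """Decode body using whatever charset the Content-Type header declares."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or "us-ascii"
    return body.decode(charset)


good = decode_body(body_bytes, 'text/plain; charset="UTF-8"')
bad = decode_body(body_bytes, 'text/plain; charset="ISO-8859-1"')

print(good)
print(repr(bad))   # mojibake: one Latin-1 character per byte

# The bytes were never damaged, so undoing the wrong decode recovers the text.
print(bad.encode("iso-8859-1").decode("utf-8") == original)
```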
  • »07.03.06 - 08:20
  • Order of the Butterfly
    Neko
    Posts: 301 from 2003/2/24
    From: Genesi
    It wouldn't be a blanket quick hack.

    Everything that supports text input (MUI gadgets and so on) would have a "UTF8 Handling" flag attached to it. You can set this if you want it to handle UTF8 text natively in an application. For current versions, it would be disabled. Apps would have to be recompiled (or patched) to enable it.

    You could force that handling by putting UTF8 byte markers at the top of the string - this is a very uncommon byte sequence and not used in ASCII text. Text*() should run UTF8 functionality over text with the byte marker at the beginning of the string, and ASCII otherwise, UNLESS forced with a flag. MUI gadgets and text processing would be the same.
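
    The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not an existing MorphOS or MUI interface; choose_text_mode, strip_marker and the force_utf8 parameter are hypothetical names.

```python
import codecs
from typing import Optional


def choose_text_mode(data: bytes, force_utf8: Optional[bool] = None) -> str:
    """Pick how a text routine should interpret the buffer.

    force_utf8=True/False models the per-application flag overriding
    detection; None means "look for the UTF-8 byte marker".
    """
    if force_utf8 is not None:
        return "utf-8" if force_utf8 else "8-bit"
    return "utf-8" if data.startswith(codecs.BOM_UTF8) else "8-bit"


def strip_marker(data: bytes) -> bytes:
    """Drop the marker so it is not rendered as part of the text."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


marked = codecs.BOM_UTF8 + "日本語".encode("utf-8")
print(choose_text_mode(marked))                      # marker present
print(choose_text_mode(b"plain text"))               # no marker
print(choose_text_mode(b"plain text", force_utf8=True))
print(strip_marker(marked).decode("utf-8"))
```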

    For applications that are happy to be "coerced", there could be a system database of friendly applications. In Windows there is an "extend advance type services to all applications" checkbox. You can also disable advanced text services for individual apps on most operating systems. This would be easy to do in MUI apps (use the MUI Window ID as the key).

    In fact it would be better simply to add the functionality to MUI and forget the rest of the OS. It would make everything a hell of a lot easier for application programmers.

    I did a LOT of research on this subject but threw it away somewhere because the MorphOS guys don't give a shit about Unicode support. I tried 3 different Unicode library solutions, settled on a highly advanced one, and got no help in coercing it to work efficiently in MorphOS - as usual. It needs tighter integration into the OS than a 3rd party developer can do.
    Matt Sealey, Genesi USA, Inc.
    Developer Relations
    Product Development Analyst
  • »05.04.06 - 14:32
  • MorphOS Developer
    CISC
    Posts: 619 from 2005/8/27
    From: the land with ...
    Quote:

    I did a LOT of research on this subject but threw it away somewhere because the MorphOS guys don't give a flowers about Unicode support.


    It's not that we didn't care, it's just that this takes a lot of work to get right. And no, you can't just rely on a BOM to trigger UTF8 handling: although this might be an uncommon sequence, it's not unlikely, esp. if you take into account that the text can be in any codepage...
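
    A short Python sketch of that objection (ISO-8859-1 is assumed as the legacy codepage, and the sample string is invented): the three marker bytes are ordinary printable characters in an 8-bit codepage, so a legacy string can begin with them without being UTF-8 at all.

```python
import codecs

marker = codecs.BOM_UTF8   # bytes EF BB BF

# Read as ISO-8859-1, the marker is three ordinary printable characters.
as_latin1 = marker.decode("iso-8859-1")

# Hypothetical legacy 8-bit string that happens to open with those characters.
legacy = (as_latin1 + "Dónde?").encode("iso-8859-1")
looks_like_utf8 = legacy.startswith(marker)


def is_valid_utf8(data: bytes) -> bool:
    """Check whether the whole buffer decodes as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(repr(as_latin1))
print("marker check says UTF-8:", looks_like_utf8)
print("actually valid UTF-8:", is_valid_utf8(legacy))
```

    A marker test alone would send this buffer down the UTF-8 path even though the rest of it is not valid UTF-8.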

    Simply put, to have Unicode the app must know what it's doing; you can't force Unicode on unknowing ones.

    Additionally, to have proper Unicode support in the system as a whole, there have to be substantial changes to the way fonts are handled (something which will slow things down a lot as well). Right now fonts are only dealt with as bitmaps in a fixed codepage. This obviously won't work with Unicode, and you have to render glyphs on the fly (this is where slow comes in). You can do this already today (as previously mentioned) through ft2's bullet API, though it's quite awkward...

    Quote:

    I tried 3 different unicode library solutions, settled on a highly advanced one, and got no help in coercing it to work efficiently in MorphOS - as usual.


    Well, as usual, this was pretty much your own fault, but let's leave it at that.


    - CISC
  • »05.04.06 - 15:02
  • Order of the Butterfly
    Hawk
    Posts: 204 from 2003/12/29
    From: Tokyo - Japan
    I'm just curious:
    is the current MorphOS gcc (3.4?) UTF-8 ready?

    Will we need a new gcc if UTF-8 support were ready? (I don't give up on the thought ;)
    Pegasos II G3@600Mhz (no fan) 512MB RAM (1 slot)
    -- Maxtor 6Y120P0 120GB, 7200 rpm -- ATI Radeon 7500 - (64MB, TV-out)
    -- Minuet Slimline PC case -- MorphOS 1.4.5 + Gentoo
    EFIKA
  • »06.04.06 - 12:06
  • MorphOS Developer
    CISC
    Posts: 619 from 2005/8/27
    From: the land with ...
    Huh? What the hades are you on about?


    - CISC
  • »06.04.06 - 13:04
  • Order of the Butterfly
    Bladerunner
    Posts: 418 from 2004/2/19
    Quote:

    Huh? What the hades are you on about?


    And why the hell do you have to bring in hades, my precious little file and webserver? *lol* (sorry, just kidding, but I really couldn't resist ;) )
  • »06.04.06 - 13:34
  • Order of the Butterfly
    merko
    Posts: 328 from 2003/5/19
    CISC: I agree it's clear that apps must tell the system that they want
    unicode. Then there must be some sort of advanced caching system to
    make sure that common glyphs are not re-rendered all the time.

    bladerunner: Probably because MZ censors certain evil dwellings but
    not others. Must be very awkward if you happen to live in a certain
    Norwegian village.
  • »06.04.06 - 13:40
  • Acolyte of the Butterfly
    C64Days
    Posts: 103 from 2006/2/27
    From: Italy
    I remember using Ucode by Rev. Ken Shillito on my A1200:

    http://it.aminet.net/pub/aminet/text/show/Ucode.readme

    Too bad it was entirely in 68k asm and that the project
    saw no further updates... it correctly supported all the texts I
    tried it with (Japanese, Korean, Chinese, Russian, Greek....).
  • »06.04.06 - 15:04