Terrible blit mask performance (now solved!)
  • Order of the Butterfly
    ChrisH
    Posts: 167 from 2009/11/26
    I am finding that blitting with a mask varies from fast to horribly slow on an Efika with a Radeon, and I'd appreciate it if some people could run the following test on different hardware to confirm it:
    http://cshandley.co.uk/temp/MOS_MaskSpeedTest.lha

    Just reboot, then unpack it somewhere, then run MOS_MaskSpeedTest, and do nothing until the new window disappears (may take seconds or minutes). Then please post your results. Here are mine:

    Quote:

    Test using a 640 by 480 window, bitmap & mask.

    BltBitMapRastPort() took 1522 uS

    With a clear mask:
    BltMaskBitMapRastPort() took 12442 uS (8 times slower than no mask)

    With a set mask:
    BltMaskBitMapRastPort() took 280483 uS (184 times slower than no mask)

    With an alternating-line mask:
    BltMaskBitMapRastPort() took 148256 uS (97 times slower than no mask)

    With a chequered mask:
    BltMaskBitMapRastPort() took 146042 uS (96 times slower than no mask)

    With an alternating-thick-line mask:
    BltMaskBitMapRastPort() took 147892 uS (97 times slower than no mask)


    The first two results are about as expected; BltMaskBitMapRastPort() is around 10 times slower than BltBitMapRastPort(). This also happens on OS4, although on emulated VESA AROS it is no slower.

    But the remaining results are strange & pretty bad: when the mask contains some transparent pixels, the blit can take over 20 times *longer*, meaning nearly 200 times slower than without a mask! Such a masked blit takes up to 0.3 seconds, which is clearly ridiculous. (OS4 & AROS do not slow down at all for transparent pixels.)

    Interestingly, the speed degradation is proportional to the number of transparent pixels, suggesting that the "blit" is CPU-dependent. If other people's results confirm this, then either MOS has a bug, or I'm doing something wrong. If anyone can point out my mistake, that would be great.
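    One way to picture why the time could track the mask contents is a masked blit done pixel by pixel on the CPU. The C++ sketch below is only an assumption about what such a slow path might look like, not MorphOS's actual code; `maskedBlit()` and `BlitStats` are hypothetical names. In it, a set mask bit copies the source pixel and a clear bit leaves the destination alone, and the function counts how many source pixels it had to read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU fallback for a masked blit; NOT the MorphOS implementation.
// Pixels are 32-bit values stored row by row. The mask has one bit per pixel,
// most significant bit first, with each mask row padded to a whole byte.
// A set mask bit copies the source pixel; a clear bit leaves the destination.
struct BlitStats {
    std::size_t sourceReads = 0;  // number of source pixels fetched
};

BlitStats maskedBlit(const std::vector<std::uint32_t>& src,
                     const std::vector<std::uint8_t>& mask,
                     std::vector<std::uint32_t>& dst,
                     std::size_t width, std::size_t height) {
    const std::size_t maskBytesPerRow = (width + 7) / 8;
    assert(src.size() >= width * height);
    assert(dst.size() >= width * height);
    assert(mask.size() >= maskBytesPerRow * height);

    BlitStats stats;
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const std::uint8_t maskByte = mask[y * maskBytesPerRow + x / 8];
            const bool copySource = ((maskByte >> (7 - x % 8)) & 1u) != 0;
            if (copySource) {
                // Only this branch reads the source bitmap. If that bitmap
                // sits in video memory, these reads are the expensive part.
                dst[y * width + x] = src[y * width + x];
                ++stats.sourceReads;
            }
        }
    }
    return stats;
}
```

    For a 640 by 480 bitmap like the one in the test, this loop would read no source pixels for a clear mask, all 307200 for a set mask, and half of them (153600) for the alternating-line and chequered masks, which is the same ordering as the timings above.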

    The source code is included, but you can also read it here:
    http://cshandley.co.uk/temp/MOS_MaskSpeedTest.cpp


    [ Edited by ChrisH 24.10.2011 - 22:23 ]
    Author of the PortablE programming language.
    It is pitch black. You are likely to be eaten by a grue...
  • »22.10.11 - 15:15
  • Yokemate of Keyboards
    Andreas_Wolf
    Posts: 12079 from 2003/5/22
    From: Germany
    > Just reboot, then unpack it somewhere, then run MOS_MaskSpeedTest, and
    > do nothing until the new window disappears (may take seconds or minutes).
    > Then please post your results. Here are mine: [...]

    Mine (Mac mini, MorphOS 2.7):

    Code:
    Test using a 640 by 480 window, bitmap & mask.

    BltBitMapRastPort() took 1487 uS

    With a clear mask:
    BltMaskBitMapRastPort() took 1810 uS (1 times slower than no mask)

    With a set mask:
    BltMaskBitMapRastPort() took 155247 uS (104 times slower than no mask)

    With an alternating-line mask:
    BltMaskBitMapRastPort() took 80303 uS (54 times slower than no mask)

    With a chequered mask:
    BltMaskBitMapRastPort() took 87061 uS (59 times slower than no mask)

    With an alternating-thick-line mask:
    BltMaskBitMapRastPort() took 80292 uS (54 times slower than no mask)


    > Interestingly, the speed degradation is proportional to the number of
    > transparent pixels, suggesting that the "blit" is CPU-dependent.

    The CPU is clearly running at full load while your speed test executes.
  • »22.10.11 - 15:46
  • Order of the Butterfly
    ChrisH
    Posts: 167 from 2009/11/26
    @Andreas_Wolf
    Thanks. Not exactly the same as mine, but it does show that masked blits with transparent pixels are up to 100 times slower than without such pixels (or a mask), and that the slowdown is also proportional to the number of transparent pixels.
  • »22.10.11 - 15:52
  • Order of the Butterfly
    Motosampy
    Posts: 199 from 2004/8/14
    From: Järvenpä...
    Test using a 640 by 480 window, bitmap & mask.

    BltBitMapRastPort() took 565 uS

    With a clear mask:
    BltMaskBitMapRastPort() took 2868 uS (5 times slower than no mask)

    With a set mask:
    BltMaskBitMapRastPort() took 153361 uS (271 times slower than no mask)

    With an alternating-line mask:
    BltMaskBitMapRastPort() took 79492 uS (141 times slower than no mask)

    With a chequered mask:
    BltMaskBitMapRastPort() took 87351 uS (154 times slower than no mask)

    With an alternating-thick-line mask:
    BltMaskBitMapRastPort() took 79446 uS (140 times slower than no mask)


    PowerMac MDD G4 @ 867MHz
  • »22.10.11 - 16:27
  • MorphOS Developer
    Henes
    Posts: 507 from 2003/6/14
    @ChrisH

    Change AllocBitMap()'s friend bitmap to NULL and your blits will be around 100 times faster.
    Reading from video memory with the CPU is ultra slow.


    edit: on second thought, maybe just clearing the BMF_DISPLAYABLE flag would be a better idea.

    [ Edited by Henes 22.10.2011 - 18:22 ]
  • »22.10.11 - 17:11
  • MorphOS Developer
    itix
    Posts: 1516 from 2003/2/24
    From: Finland
    Clearing BMF_DISPLAYABLE does it.

    Here are the results from my Mac mini with the original flags:

    Quote:


    Test using a 640 by 480 window, bitmap & mask.

    BltBitMapRastPort() took 1549 uS

    With a clear mask:
    BltMaskBitMapRastPort() took 2199 uS (1 times slower than no mask)

    With a set mask:
    BltMaskBitMapRastPort() took 155381 uS (100 times slower than no mask)

    With an alternating-line mask:
    BltMaskBitMapRastPort() took 79655 uS (51 times slower than no mask)

    With a chequered mask:
    BltMaskBitMapRastPort() took 85879 uS (55 times slower than no mask)

    With an alternating-thick-line mask:
    BltMaskBitMapRastPort() took 79614 uS (51 times slower than no mask)



    And the results with BMF_DISPLAYABLE cleared:

    Quote:


    Test using a 640 by 480 window, bitmap & mask.
    BltBitMapRastPort() took 10244 uS

    With a clear mask:
    BltMaskBitMapRastPort() took 1742 uS (0 times slower than no mask)

    With a set mask:
    BltMaskBitMapRastPort() took 8719 uS (1 times slower than no mask)

    With an alternating-line mask:
    BltMaskBitMapRastPort() took 5075 uS (0 times slower than no mask)

    With a chequered mask:
    BltMaskBitMapRastPort() took 14498 uS (1 times slower than no mask)

    With an alternating-thick-line mask:
    BltMaskBitMapRastPort() took 5235 uS (1 times slower than no mask)
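    The whole-number factors in this second run are misleading, because plain BltBitMapRastPort() itself got slower (10244 uS versus 1549 uS with the original flags), so a masked blit that was actually faster still shows up as "0" or "1 times slower". The C++ sketch below recomputes the ratios using only the microsecond values quoted above; `MaskTiming`, `slowdown()` and `printSlowdowns()` are made-up helper names for this illustration and are not part of the test program.

```cpp
#include <cassert>
#include <cstdio>

// Microsecond timings copied from the run with BMF_DISPLAYABLE cleared.
struct MaskTiming {
    const char* mask;
    double microseconds;
};

constexpr double kUnmaskedUs = 10244.0;  // BltBitMapRastPort() in that run
constexpr MaskTiming kMaskedRuns[] = {
    {"clear", 1742.0},
    {"set", 8719.0},
    {"alternating-line", 5075.0},
    {"chequered", 14498.0},
    {"alternating-thick-line", 5235.0},
};

// Ratio of masked to unmasked time. A value below 1.0 means the masked
// blit finished sooner than the plain BltBitMapRastPort() call.
double slowdown(double maskedUs, double unmaskedUs) {
    assert(unmaskedUs > 0.0);
    return maskedUs / unmaskedUs;
}

// Prints each factor with two decimals instead of a whole number.
void printSlowdowns() {
    for (const MaskTiming& t : kMaskedRuns) {
        std::printf("%-24s %.2f\n", t.mask, slowdown(t.microseconds, kUnmaskedUs));
    }
}
```

    The recomputed ratios come out at roughly 0.17 (clear), 0.85 (set), 0.50 (alternating-line), 1.42 (chequered) and 0.51 (alternating-thick-line), so only the chequered mask was slower than the unmasked blit here, while the set-mask case dropped from 155381 uS to 8719 uS.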

    1 + 1 = 3 with very large values of 1
  • »22.10.11 - 17:52
  • Order of the Butterfly
    ChrisH
    Posts: 167 from 2009/11/26
    itix,
    Quote:

    Clearing BMF_DISPLAYABLE does it.

    Thanks, that does indeed fix it :-). This should hopefully mean that 'my' Shadow Of The Beast demo will now run very fast...

    EDIT: Yup, now running very smoothly :)))

    [ Edited by ChrisH 24.10.2011 - 22:24 ]
  • »24.10.11 - 18:25