Hollywood binaries working faster on G4 than on G5?
  • Paladin of the Pegasos
    koszer
    Posts: 1302 from 2004/2/8
    From: Poland
    Hi.

    Out of curiosity, I created a tiny "benchmark" program that calculates an approximation of Pi using the Monte Carlo method. The result is not particularly accurate (it uses only 1,000,000 points, and the RndF() function in Hollywood never returns exactly 1). However, it does what it's supposed to do: it loads the computer with nonsense calculations and reports the elapsed time in seconds at the end.
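
For anyone who wants to see the method itself, here is a minimal Python sketch of the same Monte Carlo estimate. It is Python rather than Hollywood so that it runs anywhere, and the function names `estimate_pi` and `standard_error` are made up for illustration; they are not part of the benchmark. It also shows why 1,000,000 points gives only two or three correct decimals: the statistical error shrinks only with the square root of the point count.

```python
import math
import random


def estimate_pi(points, seed=None):
    """Estimate Pi from `points` random samples in the unit square."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(points):
        x = rng.random()  # uniform in [0, 1), so it never returns exactly 1
        y = rng.random()
        if x * x + y * y < 1:
            inside += 1
    # The quarter circle covers Pi/4 of the unit square.
    return 4 * inside / points


def standard_error(points):
    """One-sigma statistical error of the estimate for a given sample count."""
    p = math.pi / 4  # probability that a point lands inside the quarter circle
    return 4 * math.sqrt(p * (1 - p) / points)


if __name__ == "__main__":
    n = 1_000_000
    print(f"estimate: {estimate_pi(n, seed=1):.6f}")
    print(f"expected one-sigma error for {n} points: {standard_error(n):.4f}")
```

With 1,000,000 points the expected one-sigma error is roughly 0.0016, so estimates a few thousandths away from Pi are normal. Next to that, the fact that the random function never returns exactly 1 has no measurable effect.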

    To the point: somehow it's noticeably faster on the Mac Mini. The G5 calculates the whole thing in about 24 seconds, while the G4 takes about 14 seconds. That is a huge difference, given how much faster the G5 should be.

    I have a small request for all of you: could someone with a G4 and/or G5 machine repeat these calculations? I have a sneaking suspicion that the binary generated by Hollywood is simply optimized for the G4 (presumably Andreas only has a Mac Mini), but I'd still like to make sure before I bother him about it.

    EDIT: Oops, I might have uploaded the wrong binary, one that counts 100 million points. I'll upload it again. Sorry.

    [ Edited by koszer 08.03.2025 - 18:46 ]
  • »08.03.25 - 18:19
  • Moderator
    Kronos
    Posts: 2400 from 2003/2/24
    Not sure what I should see here.

    Both on the PowerBook and G5 I get a black "Hollywood" Window that eats all CPU without end (well over 1 minute) and simply stops when hitting the close gadget.
  • »08.03.25 - 18:40
  • Paladin of the Pegasos
    koszer
    Posts: 1302 from 2004/2/8
    From: Poland
    Quote:

    Kronos wrote:
    Not sure what I should see here.

    Both on the PowerBook and G5 I get a black "Hollywood" Window that eats all CPU without end (well over 1 minute) and simply stops when hitting the close gadget.


    You're right. I'm sorry about that. I experimented earlier with a higher point count and uploaded that binary by mistake.

    Here's the proper one (it should complete the calculations in less than half a minute).
  • »08.03.25 - 18:52
  • Moderator
    Kronos
    Posts: 2400 from 2003/2/24
    O.k. did you move the numbers the other way now?

    G5 (AGP) 2.3GHz: 3.438s
    PB5.9 1.67GHz: 6.329s

    What kind of G5 are you running?
    Are you sure it runs at full clock? Maybe test something else like unpacking a big archive.

    This has been a bug with at least the iMac a few years back (which I noticed when unpacking the beta ISO vs a 1.8GHz G4).
  • »08.03.25 - 19:04
  • Paladin of the Pegasos
    koszer
    Posts: 1302 from 2004/2/8
    From: Poland
    Quote:

    Kronos wrote:
    O.k. did you move the numbers the other way now?

    G5 (AGP) 2.3GHz: 3.438s
    PB5.9 1.67GHz: 6.329s

    What kind of G5 are you running?
    Are you sure it runs at full clock? Maybe test something else like unpacking a big archive.

    This has been a bug with at least the iMac a few years back (which I noticed when unpacking the beta ISO vs a 1.8GHz G4).


    Now it's calculating only 1 million points. On my G5 QUAD @ 2,5 GHz I get about 23 seconds while I get about 14 seconds on Mac Mini @ 1,33 GHz.

    No other applications were running in the background, and I upgraded to MorphOS 3.19 not long ago. How odd...
    I'll try it from a live CD to rule out a software fault.

    EDIT: When I run the binary from a fresh MorphOS 3.19 "live USB", I still get 23.158 seconds.

    [ Edited by koszer 08.03.2025 - 19:24 ]
  • »08.03.25 - 19:18
  • Moderator
    Kronos
    Posts: 2400 from 2003/2/24
    Retried twice on the PB with today's 3.20beta out of RAM: in shell I got 5.9s and 5.8s.

    The EFIKA on public 3.18 does it in 26.7s, kinda close to your G5 numbers.

    The PB being faster than the Mini might be down to faster memory (in addition to +340MHz), but then your G5 should beat mine, as it has DDR2 4200 vs DDR(1) 3200.

    So again better check other benchmarks.

    [ Edited by Kronos 08.03.2025 - 19:36 ]
  • »08.03.25 - 19:34
  • Order of the Butterfly
    Tom01
    Posts: 184 from 2009/9/20
    The AltiVec unit of the G4 is faster than that of the G5. I don't know whether Hollywood uses AltiVec.
  • »08.03.25 - 19:34
  • Paladin of the Pegasos
    koszer
    Posts: 1302 from 2004/2/8
    From: Poland
    Quote:

    Kronos wrote:
    So again better check other benchmarks.


    Well, I just ran the dnetc benchmark on both machines. Here are the results:


    Code:
    dnetc benchmark:

    Test: OGR-NG

    [ CORE            ][ G4 result            ][ G5 result            ]
    [ KOGE 3.1 Scalar ][ 13,662,996 nodes/sec ][ 22,046,482 nodes/sec ]
    [ KOGE 3.1 Hybrid ][ 28,819,700 nodes/sec ][ 41,025,997 nodes/sec ]

    Test: RC5-72

    [ CORE           ][ G4 result           ][ G5 result           ]
    [ MH 2-pipe      ][  5,185,536 keys/sec ][  5,606,657 keys/sec ]
    [ KKS 2-pipe     ][  5,094,095 keys/sec ][  6,135,656 keys/sec ]
    [ KKS 604e       ][  5,221,290 keys/sec ][  5,599,851 keys/sec ]
    [ KKS 7400       ][ 12,125,199 keys/sec ][ 15,746,582 keys/sec ]
    [ KKS 7450       ][ 14,166,525 keys/sec ][ 18,839,114 keys/sec ]
    [ MH 1-pipe      ][  4,731,531 keys/sec ][  4,738,733 keys/sec ]
    [ MH 1-pipe 604e ][  4,689,936 keys/sec ][  4,742,533 keys/sec ]


    The G5 is clearly faster (not by a huge margin in some cases, but still). What's going on? I'm baffled.
  • »08.03.25 - 20:02
  • Moderator
    Kronos
    Posts: 2400 from 2003/2/24
    The iMac gets 23s, which is in line with your Quad.

    It kinda is a PCIe-based PowerMac in a different package, while the older, unsupported 1st and 2nd generations would be related to the AGP G5s.
  • »08.03.25 - 21:34
  • Paladin of the Pegasos
    koszer
    Posts: 1302 from 2004/2/8
    From: Poland
    Quote:

    Kronos wrote:
    The iMac gets 23s, which is in line with your Quad.

    It kinda is a PCIe-based PowerMac in a different package, while the older, unsupported 1st and 2nd generations would be related to the AGP G5s.


    Meanwhile zukow tried this binary on his PCIe G5@2,3 GHz and he got a 3.786s result.
    Weirder and weirder...
  • »08.03.25 - 21:52
  • Cocoon
    K-L
    Posts: 46 from 2020/11/17
    From: Lyon, France
    PowerMac G5 2,7 GHz: 24.582s

    Weird.
    PowerMac G5 2,7 Ghz / Radeon 9650 / MorphOS 3.15
    AmigaOne X1000 (unused ATM)
  • »09.03.25 - 07:02
  • Paladin of the Pegasos
    koszer
    Posts: 1302 from 2004/2/8
    From: Poland
    Quote:

    K-L wrote:
    PowerMac G5 2,7 GHz: 24.582s

    Weird.


    Weird indeed.
    What do we know so far?
    Code:
    Computer          | CPU clock | Result | CPU model
    Efika 5200B       | 0,39 GHz  | 26,70s | MPC5200B
    Mac Mini G4       | 1,33 GHz  | 14,02s | PPC7447A
    PowerBook G4      | 1,67 GHz  |  6,33s | PPC7447A
    iMac G5           | 2,1 GHz   | 23,00s | PPC970fx
    Power Mac G5 AGP  | 2,3 GHz   |  3,44s | PPC970fx
    Power Mac G5 PCIe | 2,3 GHz   |  3,78s | PPC970MP
    Power Mac G5 PCIe | 2,5 GHz   | 23,40s | PPC970MP
    Power Mac G5 AGP  | 2,7 GHz   | 24,58s | PPC970fx


    This just doesn't add up. Why would a PPC970fx in one machine give a result similar to a 970MP in another, yet so different from a 970fx in a third?
  • »09.03.25 - 09:19
  • Caterpillar
    DayWalker
    Posts: 24 from 2022/8/17
    I can confirm the results on a Power Mac G5 PCIe 2,5 GHz (PPC970MP): 23,36s. I can test on another G5, this time an AGP 2,3 GHz, but first I must dust it off and update it to the latest MOS.
  • »09.03.25 - 17:51
  • Paladin of the Pegasos
    koszer
    Posts: 1302 from 2004/2/8
    From: Poland
    Quote:

    DayWalker wrote:
    I can confirm the results on a Power Mac G5 PCIe 2,5 GHz (PPC970MP): 23,36s. I can test on another G5, this time an AGP 2,3 GHz, but first I must dust it off and update it to the latest MOS.


    Thank you very much. At least I know my computer isn't broken (or I'm not the only one having a broken computer, heh).

    I also got one result from an A1139 (17-inch PowerBook G4 @ 1,67 GHz): 13,67s. It's wild, as we already have one 1,67 GHz PowerBook G4 (an A1138 model, I believe), for which Kronos reported 6,33s.
  • »09.03.25 - 21:30
  • Priest of the Order of the Butterfly
    Stevo
    Posts: 902 from 2004/1/24
    From: #AmigaZeux
    For what it's worth, 20.039s here on a PowerBook G4 15" 1,67 GHz (A1138)...
    ---
    http://www.iki.fi/sintonen/logs/its_only_football.txt
  • »09.03.25 - 22:33
  • Paladin of the Pegasos
    koszer
    Posts: 1302 from 2004/2/8
    From: Poland
    Quote:

    Stevo wrote:
    For what it's worth, 20.039s here on a PowerBook G4 15" 1,67 GHz (A1138)...



    That's almost on par with the slowest G5 results. Until now, no G4 had reported over 20 seconds.
    What puzzles me is that, despite varying so much between machines, these results are strangely consistent on any one machine. For example, my G5 Quad gives me around 23 seconds no matter what tricks I use. I'll try disabling the CPU caches today.
  • »10.03.25 - 06:16
  • Yokemate of Keyboards
    jPV
    Posts: 2140 from 2003/2/24
    From: po-RNO
    I made some tests on a Mac mini 1.5GHz and a PB 1.67GHz (5,9 - A1139), and noticed quite a difference between OS versions on the very same machine, which may explain some of the results.

    Mini with 3.19: 13.56s
    Mini with 3.20: 5.23s
    PB with 3.19: 13.72s
    PB with 3.20: 5.88s

    Funny that the Mini is slightly faster than the PB, but they are quite similar anyway. I'll try to test with some G5 machine(s) later in the evening or so.

    BTW, can you make a test without any Rnd functions, using some other kind of calculation? Or share the source code for experiments?
  • »10.03.25 - 09:46
  • Paladin of the Pegasos
    koszer
    Posts: 1302 from 2004/2/8
    From: Poland
    Quote:

    jPV wrote:

    Mini with 3.19: 13.56s
    Mini with 3.20: 5.23s
    PB with 3.19: 13.72s
    PB with 3.20: 5.88s



    Interesting. Some bugfix maybe?

    Quote:


    BTW, can you make a test without any Rnd functions, using some other kind of calculation? Or share the source code for experiments?



    As a matter of fact, I did try some other calculations (pure addition in one instance) and the Mini still beat the G5 (not by the same margin as in the disputed case, but nevertheless).
    I even wrote a simple program that draws 200000 red boxes all over the screen. The G4 still beat the G5 (11 seconds vs 13 seconds). The only upside: the executable generated for Windows did much worse, taking well over 1 minute on my machine.

    As for the code - nothing to be proud of, really. Here you go:


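    ; Monte Carlo estimate of Pi: count random points that fall inside the unit quarter circle
    max_counter = 1000000
    inside = 0

    StartTimer(1)

    ; 1 To max_counter gives exactly max_counter samples (0 To max_counter ran one extra pass)
    For counter = 1 To max_counter
        x = RndF()
        y = RndF()
        If (x^2+y^2) < 1 Then inside = inside+1
    Next

    epi = 4 * inside / max_counter

    ; GetTimer() reports milliseconds, so convert to seconds
    countingTime = GetTimer(1) / 1000

    Print("The approximate value of Pi is " .. epi .. ", the calculation took " .. countingTime .. " seconds.")

    WaitLeftMouse

    End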
  • »10.03.25 - 10:41
  • Caterpillar
    ChrisC
    Posts: 23 from 2025/3/1
    Do users ever get beta access? I assume not, since I have never seen any betas of the OS available to us mere users.

    [ Edited by ChrisC 10.03.2025 - 11:30 ]
    Power Mac G5 11,2 Dual 2GHz
    ATI X1950 Pro 256MB
    2GB RAM
    500GB Storage
    MorphOS 3.19 (Licenced)
  • »10.03.25 - 12:27
  • Paladin of the Pegasos
    koszer
    Posts: 1302 from 2004/2/8
    From: Poland
    Quote:

    ChrisC wrote:
    Do the users ever get beta access?


    No, only system developers and beta-testers.

    I've tried to determine whether the CPU cache has something to do with the results.
    I typed CPU NOCACHE in the shell and ran the tests again.
    The results were identical to those reported previously.
  • »10.03.25 - 13:59
  • Moderator
    Kronos
    Posts: 2400 from 2003/2/24
    The iMac was still on a late 3.19 beta and after updating to 3.20 beta I get 4.38s.

    I guess I have to get around to putting 3.19 onto the EFIKA to see if that "competitive" result stays.....
  • »10.03.25 - 15:08
  • Paladin of the Pegasos
    koszer
    Posts: 1302 from 2004/2/8
    From: Poland
    Quote:

    Kronos wrote:
    The iMac was still on a late 3.19 beta and after updating to 3.20 beta I get 4.38s.



    You got 4,38s on the same machine that earlier gave you 23s by updating the operating system?
    Now that's what I call an update!
  • »10.03.25 - 15:18
  • Caterpillar
    DayWalker
    Posts: 24 from 2022/8/17
    So ...

    I have dusted off my PMac G5 AGP 7,2 with 2x2,3 GHz PPC970fx, updated it from MOS 3.18 to 3.19 and run the test: 24.66s. This is basically in line with the PCIe ones, since the PPC970MP at 2,5 GHz is a bit faster.

    Quote:

    DayWalker wrote:
    I can confirm the results on a Power Mac G5 PCIe 2,5 GHz (PPC970MP): 23,36s. I can test on another G5, this time an AGP 2,3 GHz, but first I must dust it off and update it to the latest MOS.
  • »10.03.25 - 15:51
  • Yokemate of Keyboards
    jPV
    Posts: 2140 from 2003/2/24
    From: po-RNO
    I made some further tests now.

    First, your Hollywood code can be optimized to run much faster, and once that is done, the results are more in line with CPU performance.

    I made a quick copy-paste of your code, and it now prints three lines:
    1) Your original code
    2) Hollywood's housekeeping disabled in the heavy calculation loop (DisableLineHook)
    3) Changed frequently used variables in the loop to Local instead of Global, which is always a good idea
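
The third change has a close analogue in other interpreted languages. Purely as an illustration, here is the same loop in Python (not Hollywood), once with module-level globals and once with locals inside a function. In CPython, too, locals are typically cheaper to access than globals. The function names are invented for the example, and the sketch says nothing about how Hollywood implements either optimization.

```python
import random
import time

# Module-level (global) state, comparable to the globals in the original script.
inside = 0
x = 0.0
y = 0.0


def count_with_globals(points, seed):
    """Run the loop while reading and writing global variables."""
    global inside, x, y
    rng = random.Random(seed)
    inside = 0
    for _ in range(points):
        x = rng.random()
        y = rng.random()
        if x * x + y * y < 1:
            inside += 1
    return inside


def count_with_locals(points, seed):
    """Run the same loop with every frequently used variable kept local."""
    rng = random.Random(seed)
    rnd = rng.random  # bind the method once instead of looking it up each pass
    hits = 0
    for _ in range(points):
        px = rnd()
        py = rnd()
        if px * px + py * py < 1:
            hits += 1
    return hits


def timed(func, points, seed):
    """Return (result, elapsed seconds) for one call."""
    start = time.perf_counter()
    result = func(points, seed)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for func in (count_with_globals, count_with_locals):
        hits, seconds = timed(func, 1_000_000, seed=1)
        print(f"{func.__name__}: {hits} hits in {seconds:.3f}s")
```

Both functions return the same count for the same seed, so any timing difference comes only from how the variables and the random function are looked up.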

    I also ran the tests on PowerMac G5 2.7GHz. Here are the results:

    Variants: 1 = original code, 2 = DisableLineHook, 3 = Local variables.

    System    | Variant | Pi estimate | Time
    Mini 3.19 | 1       | 3.141932    | 13.537s
    Mini 3.19 | 2       | 3.139372    |  2.316s
    Mini 3.19 | 3       | 3.144432    |  1.425s
    Mini 3.20 | 1       | 3.14024     |  5.221s
    Mini 3.20 | 2       | 3.1414      |  2.296s
    Mini 3.20 | 3       | 3.140884    |  1.415s
    PB 3.19   | 1       | 3.143672    | 13.321s
    PB 3.19   | 2       | 3.141008    |  2.163s
    PB 3.19   | 3       | 3.141592    |  1.349s
    PB 3.20   | 1       | 3.14244     |  5.732s
    PB 3.20   | 2       | 3.142288    |  2.194s
    PB 3.20   | 3       | 3.142924    |  1.288s
    G5 3.19   | 1       | 3.141284    | 24.216s
    G5 3.19   | 2       | 3.141228    |  1.283s
    G5 3.19   | 3       | 3.140952    |  0.726s
    G5 3.20   | 1       | 3.141824    |  2.869s
    G5 3.20   | 2       | 3.141004    |  1.258s
    G5 3.20   | 3       | 3.14116     |  0.707s


    So, it's basically only that unoptimized code on older MorphOS that is extra slow on G5 for some reason...
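
To put numbers on that, the timings above can be turned into speed-up factors between 3.19 and 3.20. The short Python sketch below only divides the figures from this post; the dictionary layout and the names are just for illustration.

```python
# Timings in seconds copied from the results above, in the order
# (original code, DisableLineHook, Local variables).
timings = {
    "Mini": {"3.19": (13.537, 2.316, 1.425), "3.20": (5.221, 2.296, 1.415)},
    "PB": {"3.19": (13.321, 2.163, 1.349), "3.20": (5.732, 2.194, 1.288)},
    "G5": {"3.19": (24.216, 1.283, 0.726), "3.20": (2.869, 1.258, 0.707)},
}
variants = ("original", "DisableLineHook", "Local variables")


def speedups(machine):
    """Return how many times faster each variant runs on 3.20 than on 3.19."""
    old = timings[machine]["3.19"]
    new = timings[machine]["3.20"]
    return {name: o / n for name, o, n in zip(variants, old, new)}


if __name__ == "__main__":
    for machine in timings:
        factors = ", ".join(f"{k}: {v:.2f}x" for k, v in speedups(machine).items())
        print(f"{machine}: {factors}")
```

The original code comes out about 2.6x (Mini), 2.3x (PB) and 8.4x (G5) faster on 3.20, while the two optimized variants stay within about 5% of their 3.19 timings.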

    Here's my executable (prints to shell): ObliczPI3
    And the code: ObliczPI3.hws.txt
  • »10.03.25 - 16:13