• Butterfly
    Posts: 80 from 2017/9/10
    STREAM on the Quad G5

    Quote:

    STREAM version $Revision: 5.10 $
    -------------------------------------------------------------
    This system uses 8 bytes per array element.
    -------------------------------------------------------------
    Array size = 10000000 (elements), Offset = 0 (elements)
    Memory per array = 76.3 MiB (= 0.1 GiB).
    Total memory required = 228.9 MiB (= 0.2 GiB).
    Each kernel will be executed 10 times.
    The *best* time for each kernel (excluding the first iteration)
    will be used to compute the reported bandwidth.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 34862 microseconds.
    (= 34862 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function    Best Rate MB/s  Avg time     Min time     Max time
    Copy:            2926.7     0.054736     0.054670     0.054815
    Scale:           2903.6     0.055172     0.055104     0.055203
    Add:             3359.6     0.071593     0.071437     0.071846
    Triad:           3377.3     0.071144     0.071063     0.071237
    -------------------------------------------------------------
    Solution Validates: avg error less than 1.000000e-13 on all three arrays
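
For anyone wondering how the "Best Rate" column relates to the "Min time" column: as far as I know, STREAM counts Copy and Scale as touching two arrays and Add and Triad as touching three, and reports MB/s with 1 MB = 10^6 bytes. Below is a rough sketch that recomputes the Quad G5 rates from the min times in the quote. The function and variable names are my own, not anything from stream.c.

```python
# Arrays read or written per element by each STREAM kernel
# (assumption based on the usual STREAM byte counting).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def stream_rate_mb_s(kernel, min_time_s, n_elements=10_000_000, bytes_per_element=8):
    """Best rate in MB/s (1 MB = 10**6 bytes) for one kernel."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * bytes_per_element * n_elements
    return 1.0e-6 * bytes_moved / min_time_s


# Min times copied from the Quad G5 run quoted in this post.
g5_min_times = {
    "Copy": 0.054670,
    "Scale": 0.055104,
    "Add": 0.071437,
    "Triad": 0.071063,
}

g5_rates = {k: stream_rate_mb_s(k, t) for k, t in g5_min_times.items()}

for kernel, rate in g5_rates.items():
    print(f"{kernel}: {rate:.1f} MB/s")
```

The recomputed values should land within rounding of the rates printed by STREAM, which also explains why Add and Triad show a higher rate despite taking longer: they move 240 MB per pass instead of 160 MB.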



    Just for comparison, STREAM on the P5040:

    Quote:


    STREAM version $Revision: 5.10 $
    -------------------------------------------------------------
    This system uses 8 bytes per array element.
    -------------------------------------------------------------
    Array size = 10000000 (elements), Offset = 0 (elements)
    Memory per array = 76.3 MiB (= 0.1 GiB).
    Total memory required = 228.9 MiB (= 0.2 GiB).
    Each kernel will be executed 10 times.
    The *best* time for each kernel (excluding the first iteration)
    will be used to compute the reported bandwidth.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 125127 microseconds.
    (= 125127 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function    Best Rate MB/s  Avg time     Min time     Max time
    Copy:            1328.9     0.120552     0.120398     0.120717
    Scale:           1233.6     0.130414     0.129701     0.134902
    Add:             1615.4     0.149091     0.148566     0.149944
    Triad:           1612.6     0.149026     0.148826     0.149314
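
To put the two runs side by side, here is a small sketch that divides the Quad G5 rates by the P5040 rates for each kernel. The numbers are copied from the two quotes in this post; the helper name `rate_ratio` is just something I made up for illustration.

```python
# Best rates in MB/s, copied from the two STREAM runs quoted in this post.
g5 = {"Copy": 2926.7, "Scale": 2903.6, "Add": 3359.6, "Triad": 3377.3}
p5040 = {"Copy": 1328.9, "Scale": 1233.6, "Add": 1615.4, "Triad": 1612.6}


def rate_ratio(a, b):
    """Per-kernel ratio of the rates in a to the rates in b."""
    return {kernel: a[kernel] / b[kernel] for kernel in a}


ratios = rate_ratio(g5, p5040)

for kernel, r in ratios.items():
    print(f"{kernel}: Quad G5 / P5040 = {r:.2f}")
```

On these figures the Quad G5 comes out at a bit over twice the P5040's memory bandwidth on every kernel. This is a single run with one array size on each machine, so treat it as a rough indication only.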

  • »15.09.17 - 15:22