X1000
  • Order of the Butterfly
    minator
    Posts: 370 from 2003/3/28
    Quote:

    ARM does have some features that hold it back (in order again, therefore zero branch prediction, etc).


    Why do you think in-order processors have no branch prediction?
    The ARM10 had it over a decade ago and that was an in-order design.
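
To make the point concrete, here is a minimal sketch of a classic 2-bit saturating-counter predictor. It is a textbook scheme chosen for illustration, not a description of what the ARM10 actually implements, and the function name and the loop pattern are hypothetical. Note that nothing in it depends on whether instructions issue in order or out of order.

```python
# Illustrative assumption: a generic 2-bit saturating-counter predictor,
# not the predictor of any specific ARM or PowerPC core.

def simulate_two_bit_predictor(outcomes, initial_state=2):
    """Count correct predictions for a sequence of branch outcomes.

    outcomes: iterable of booleans (True = branch taken).
    States 0-1 predict "not taken", states 2-3 predict "taken".
    """
    state = initial_state
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Saturating update: move toward 3 on taken, toward 0 on not taken.
        if taken:
            state = min(state + 1, 3)
        else:
            state = max(state - 1, 0)
    return correct


# A loop-closing branch: taken 9 times, then falls through; the loop runs 3 times.
loop_branch = ([True] * 9 + [False]) * 3
hits = simulate_two_bit_predictor(loop_branch)
print(f"{hits} of {len(loop_branch)} predictions correct")
```

The predictor only needs the branch history, so it can sit in front of an in-order pipeline just as well as an out-of-order one.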

    Also, in-order does not automatically mean slow. It really depends what you are running. Things like Cell and GPUs are all in-order but they're very fast processors.
    In-order becomes a problem if you are trying to run "control" code at high speed. "Data"-processing-heavy workloads (think video, image and audio processing, and even games) get relatively little advantage from out-of-order execution, so processors focused on data processing tend to be in-order or at best only mildly out-of-order.

    e.g. The new Freescale parts designed for networking appear to be "mildly" out-of-order, nothing compared to even a G5.
  • »30.06.11 - 19:39
  • Jim
  • Yokemate of Keyboards
    Posts: 4977 from 2009/1/28
    From: Delaware, USA
    Prophet? No. Cybernetic? Not really.
    But truly you ARE the king of the search engines.

    And yes, I'd never realized how much of a role out of order execution and branch prediction played in modern CPUs until you quoted those figures.

    I really had much higher expectations of the Cell BE's performance before that.

    It also points to one of the reasons most PPCs have an inherent advantage over ARM CPUs.
    "Never attribute to malice what can more readily explained by incompetence"
  • »01.07.11 - 00:05
  • Jim
  • Yokemate of Keyboards
    Posts: 4977 from 2009/1/28
    From: Delaware, USA
    >Things like Cell and GPUs are all in-order but they're very fast processors.

    Actually, I used to think that too. Until we spent some months discussing it (and even making some inquiries to IBM).
    You see, a 3.2 GHz Cell BE IS fast. But how powerful is it?
    You need to look at the benches that Andreas re-posted the link to.
    Per clock cycle, an out of order PPC is much more powerful than the Cell's core processor.
    Trust me, this has been thoroughly hashed over. Many of us had high expectations for the Cell's performance.
    But the PPE core turned out to be somewhat weak and programming the SPEs for maximum efficiency was about as easy as juggling running chainsaws.
    "Never attribute to malice what can more readily explained by incompetence"
  • »01.07.11 - 00:15
  • Yokemate of Keyboards
    Andreas_Wolf
    Posts: 12150 from 2003/5/22
    From: Germany
    > I'd never realized how much of a role out of order execution
    > and branch prediction played in modern CPUs until you quoted
    > those figures.

    The Cell PPE, while being an in-order design, does have branch prediction. See minator's reply, where he says that out-of-order execution capability is not a prerequisite for branch prediction.

    > I really had much higher expectations of the Cell BE's
    > performance before that.

    The Cell's special performance comes from the SPEs, if used properly that is. The benchmark figures I quoted take only the PPE into account, the SPEs are ignored.

    > It also points to one of the reasons most PPCs have an
    > inherent advantage over ARM CPUs.

    As it seems you missed what minator and I tried to tell you: there have been in-order ARM CPUs with branch prediction for at least a decade, and since early 2010 there're also out-of-order ARM CPUs. So what's this reason and advantage you're talking about?
  • »01.07.11 - 00:29
  • Order of the Butterfly
    minator
    Posts: 370 from 2003/3/28
    Quote:

    Per clock cycle, an out of order PPC is much more powerful than the Cell's core processor.
    Trust me, this has been thoroughly hashed over. Many of us had high expectations for the Cell's performance.


    That was sort of my point. It's weak(ish) at control code. But you'll find it'll do rather well on AltiVec stuff, probably better than any other PowerPC.

    In reality the PPE's performance was quite varied. I've seen benchmarks for it that showed it to be both out-gunning and out-gunned by a G4.
    It's designed for throughput, so you really need to program it directly for that. A lot of existing code won't do that very well.

    Quote:

    But the PPE core turned out to be somewhat weak and programming the SPEs for maximum efficiency was about as easy as juggling running chainsaws.


    That's a load of twaddle.

    Most of the people who say Cell is difficult to program have never actually programmed it.

    I spent a lot of time working with a start-up that was going to build Cell workstations. I read *everything* there was on Cell and there was a lot to read, most of which was completely missed by the press. Of the people that actually programmed Cell, pretty much no one said it was difficult; more involved, yes, but not difficult.

    Later on I got myself a PS3 and programmed it myself. Getting the code to run was complicated but that was because IBM's SDK was badly laid out. The actual coding was copy-paste of some AltiVec code and some added triple buffering. Not exactly rocket science.
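
For readers who have not met the term, the triple-buffering pattern can be sketched roughly as follows. This is a sequential Python simulation under stated assumptions, not the code described in the post: the function name, the list-based "buffers" and the per-element kernel are all hypothetical. On real hardware the load and write-back stages would be asynchronous transfers running alongside the compute stage; here they simply run one after another so that the buffer rotation is easy to follow.

```python
# Illustrative assumption: three rotating buffers stand in for three
# regions of an SPE's local store. Transfers are simulated with list copies.

def process_stream(data, chunk_size, kernel):
    """Apply kernel to every element of data, chunk by chunk, using three buffers."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    buffers = [None, None, None]
    output = []
    # Two extra steps drain the pipeline after the last chunk has been loaded.
    for step in range(len(chunks) + 2):
        load_idx, compute_idx, store_idx = step, step - 1, step - 2
        # Stage 3: write back the chunk that was computed in the previous step.
        if 0 <= store_idx < len(chunks):
            output.extend(buffers[store_idx % 3])
        # Stage 2: compute in place on the chunk that was loaded in the previous step.
        if 0 <= compute_idx < len(chunks):
            buf = compute_idx % 3
            buffers[buf] = [kernel(x) for x in buffers[buf]]
        # Stage 1: fetch the next chunk into the buffer freed by the
        # previous step's write-back.
        if load_idx < len(chunks):
            buffers[load_idx % 3] = list(chunks[load_idx])
    return output


doubled = process_stream(list(range(10)), chunk_size=4, kernel=lambda x: x * 2)
print(doubled)
```

In any single step the three stages touch three different buffers (indices `step`, `step - 1` and `step - 2` modulo 3), which is what lets the transfers overlap with the computation on hardware that supports it.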

    As far as I can tell, the whole "Cell is difficult to program" was a FUD campaign by IBM's competitors.
  • »01.07.11 - 00:48
  • Yokemate of Keyboards
    Andreas_Wolf
    Posts: 12150 from 2003/5/22
    From: Germany
    > you'll find it'll do rather well on AltiVec stuff, probably
    > better than any other PowerPC.

    I can imagine that in this regard both the in-order POWER6 and the out-of-order POWER7 surpass the Cell PPE (as well as the Xenon/XCGPU for that matter).

    > the whole "Cell is difficult to program" was a FUD campaign
    > by IBM's competitors.

    I think that also the competitors of IBM's customer/partner Sony, who are IBM customers as well (but for other products), may have had good reasons to take part in such a campaign ;-)
  • »01.07.11 - 01:05
  • Jim
  • Yokemate of Keyboards
    Posts: 4977 from 2009/1/28
    From: Delaware, USA
    Actually you've both made good points.
    But I have examined IBM's programming tutorials for maximizing the potential of the SPEs and I'd still say it resembles a juggling act.
    And trying to compare a Cell to a standard PPC becomes difficult as the efficient use/programming of the SPEs is essential to harnessing the Cell's potential.
    At least with GPUs there are some programming tools/standards.
    Are there any software tools to help code for and manage Cell use?
    "Never attribute to malice what can more readily explained by incompetence"
  • »01.07.11 - 01:15
  • Yokemate of Keyboards
    Andreas_Wolf
    Posts: 12150 from 2003/5/22
    From: Germany
    > Are there any software tools to help code for and manage Cell use?

    http://www.alphaworks.ibm.com/keywords/cell
  • »01.07.11 - 02:11
  • Jim
  • Yokemate of Keyboards
    Posts: 4977 from 2009/1/28
    From: Delaware, USA
    CellAttach looks interesting, but it's for the PowerXCell 8i processor.
    The rest of the software listed is pretty much the same stuff they had the last time I looked.
    Nothing that really integrates and coordinates SPE programming with the PPE.

    Have you looked at the programming examples? They're painful.
    "Never attribute to malice what can more readily explained by incompetence"
  • »01.07.11 - 03:02
  • Yokemate of Keyboards
    Andreas_Wolf
    Posts: 12150 from 2003/5/22
    From: Germany
    > Nothing that really integrates and coordinates SPE programming with the PPE.

    I'm not sure I even know what you mean here. I guess if such software tools exist then minator, whom I regard as an expert in all things Cell, can help you better than I ever could.

    > Have you looked at the programming examples?

    I had a quick glance at the IBM articles I referred you to some months ago but my programming experience is too shallow for a meaningful assessment as to whether properly programming the Cell including its SPEs is just "more involved" or really resembles "painful chainsaw juggling".
  • »01.07.11 - 10:11
  • Order of the Butterfly
    minator
    Posts: 370 from 2003/3/28
    Quote:

    At least with GPUs there are some programming tools/standards.
    Are there any software tools to help code for and manage Cell use?


    The same one: OpenCL
  • »01.07.11 - 19:07
  • Order of the Butterfly
    minator
    Posts: 370 from 2003/3/28
    Quote:

    > you'll find it'll do rather well on AltiVec stuff, probably
    > better than any other PowerPC.

    I can imagine that in this regard both the in-order POWER6 and the out-of-order POWER7 surpass the Cell PPE (as well as the Xenon/XCGPU for that matter).


    They're POWER, not PowerPC ;-)
  • »01.07.11 - 19:09
  • Yokemate of Keyboards
    Andreas_Wolf
    Posts: 12150 from 2003/5/22
    From: Germany
    > They're POWER, not PowerPC ;-)

    https://morph.zone/modules/newbb_plus/viewtopic.php?forum=3&topic_id=7289&start=40 ;-)
  • »01.07.11 - 19:51
  • Yokemate of Keyboards
    Andreas_Wolf
    Posts: 12150 from 2003/5/22
    From: Germany
    > The same one: OpenCL

    Interesting. Found this:

    "The framework allows for parallel programming across a number of devices such as CPUs, GPUs, and accelerators (like the Cell/B.E. SPU)."
    http://www.alphaworks.ibm.com/tech/opencl

    Seeing as Jim has expressed interest in OpenCL before this might be exactly what he's looking for.


    Edit: Found a project to create an open source implementation:

    http://sites.google.com/site/openclps3/

    [ Edited by Andreas_Wolf 01.07.2011 - 23:19 ]
  • »01.07.11 - 20:00
  • Jim
  • Yokemate of Keyboards
    Posts: 4977 from 2009/1/28
    From: Delaware, USA
    Andreas, I didn't respond immediately to this because I wanted to avoid a protracted argument over opinions.
    We both know the PPE core of the Cell is a little weak, but that will not be a problem with its successors (which will feature out of order execution pipelines).
    While I have given up on the idea of asymmetrical multi-core processors or the SPEs, their value and real world performance potential are debatable. And whether you consider the additional programming considerations necessary to harness them "more involved" or very significant, it does require consideration.
    This is where Microsoft's use of multiple identical cores on the Xenon may make more sense than the odd combination used in the Cell. Code is uniform and requires no more consideration for the architecture than that it be threaded.

    I don't mean to slight minator or any of the other Cell "experts" (although I have given you my definition for "expert"). But, while I respect IBM's engineering prowess, I'm not sure they've proven their case on the utility of this approach.
    "Never attribute to malice what can more readily explained by incompetence"
  • »01.07.11 - 20:12
  • Jim
  • Yokemate of Keyboards
    Posts: 4977 from 2009/1/28
    From: Delaware, USA
    @ minator,

    >The same one: OpenCL

    Really? I didn't know that and I think that's very cool.
    OK, I was wrong and I'm glad to know that as this is something significant.
    Suddenly CPUs with integrated GPUs and a Power design incorporating SPEs begin to have a common utility that I find VERY appealing.
    Thanks minator.

    God, I love this forum.
    I get fascinating, useful answers and information all the time.

    And yes Andreas, you were the one that pointed out the IBM programming examples that left me questioning the overall complexity of Cell programming requirements.
    I've never denied that you are the best resource on the planet for such links.
    Why do you think I value your input so much?
    "Never attribute to malice what can more readily explained by incompetence"
  • »01.07.11 - 20:21
  • Order of the Butterfly
    minator
    Posts: 370 from 2003/3/28
    Quote:

    Really? I didn't know that and I think that's very cool.
    OK, I was wrong and I'm glad to know that as this is something significant.
    Suddenly CPUs with integrated GPUs and a Power design incorporating SPEs begin to have a common utility that I find VERY appealing.
    Thanks minator.


    Of course you do know the OpenCL programming model is more complex than Cell....

    Quote:

    This is where Microsoft's use of multiple identical cores on the Xenon may make more sense than the odd combination used in the Cell. Code is uniform and requires no more consideration for the architecture than that it be threaded.


    The MS approach proved very successful at the beginning of the current console cycle because it was easier to get things up and running.

    However, to get the maximum performance out of it is no easier than Cell. To get maximum performance out of *any* processor requires detailed knowledge of its internals. The tricks you use to get that performance out of Cell are exactly the same tricks you use to get performance out of any processor.

    In fact, it might be more difficult to get maximum performance from Xenon. The SPEs are deterministic: you know exactly what data is in the local memory. You don't know what is in a cache.
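
A toy model may help show why cache contents are hard to reason about. The sketch below simulates a tiny direct-mapped cache; the sizes, the function name and the addresses are made-up assumptions and do not describe Xenon's real cache. The same access to address 0 hits or misses depending on what ran before it, whereas data placed in a software-managed local store stays there until the program overwrites it.

```python
# Illustrative assumption: a toy direct-mapped cache with 4 lines of 16 bytes.
# Real caches are larger and usually set-associative.

def direct_mapped_hits(addresses, num_lines=4, line_size=16):
    """Return a list of booleans, True where an access hits the toy cache."""
    tags = [None] * num_lines
    hits = []
    for addr in addresses:
        line_number = addr // line_size
        index = line_number % num_lines
        tag = line_number // num_lines
        hits.append(tags[index] == tag)
        tags[index] = tag  # fill the line on a miss (no change on a hit)
    return hits


# Same final access (address 0), different history, different outcome.
undisturbed = direct_mapped_hits([0, 0])[-1]
after_conflict = direct_mapped_hits([0, 64, 0])[-1]  # 64 maps to the same line as 0
print(undisturbed, after_conflict)
```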

    Even if you do manage to get maximum performance out of Xenon, you still only have half the performance of Cell.

    It's a trade-off. Cell is more complex to program (note: I never said it was simple), but in return for more work you get more performance. The design also has a lot to do with power consumption: there is no out-of-order execution because it would have required far too much power. The SPEs are very simple (i.e. they are true RISC processors) because any more complexity would make them too big and too hot.

    Would it have been better if they had made PPC SPEs? For the PS3, no. For market acceptance beyond the PS3, almost certainly yes.
  • »01.07.11 - 22:52
  • Order of the Butterfly
    minator
    Posts: 370 from 2003/3/28
    Quote:

    > They're POWER, not PowerPC ;-)

    https://morph.zone/modules/newbb_plus/viewtopic.php?forum=3&topic_id=7289&start=40 ;-)


    I was going by brand, not ISA 8-)
  • »01.07.11 - 22:55
  • Yokemate of Keyboards
    Andreas_Wolf
    Posts: 12150 from 2003/5/22
    From: Germany
    > I was going by brand, not ISA

    And here I was foolish enough to think we were having a *technical* discussion. My bad ;-)
  • »02.07.11 - 01:15
  • Yokemate of Keyboards
    Andreas_Wolf
    Posts: 12150 from 2003/5/22
    From: Germany
    > the PPE core of the Cell is a little weak, but that will not be a problem
    > with its successors (which will feature out of order execution pipelines).

    You know that Cell's successors (which ones in particular?) will feature out-of-order execution? How do you know?
  • »02.07.11 - 01:35
  • Yokemate of Keyboards
    Andreas_Wolf
    Posts: 12150 from 2003/5/22
    From: Germany
    > you are the best resource on the planet for such links.

    Planet? Wow. MorphZone would have been kudos enough, thanks ;-)
  • »02.07.11 - 02:06
  • Jim
  • Yokemate of Keyboards
    Posts: 4977 from 2009/1/28
    From: Delaware, USA
    >You know that Cell's successors (which ones in particular?) will feature out-of-order execution? How do you know?

    Just an assumption that the next generation of Power processors would feature out of order execution.

    >>you are the best resource on the planet for such links.

    >Planet? Wow. MorphZone would have been kudos enough, thanks ;-)


    Of course the whole planet. This is the world wide web after all.
    "Never attribute to malice what can more readily explained by incompetence"
  • »02.07.11 - 18:14
  • Yokemate of Keyboards
    Andreas_Wolf
    Posts: 12150 from 2003/5/22
    From: Germany
    > Just an assumption that the next generation of Power
    > processors would feature out of order execution.

    Ah okay, so you're referring to POWER8 with SPEs then. I wasn't sure if you actually meant this or rather the rumoured new Cell in the PS4, which I suspect would be another thing than the POWER8 and presumably in-order again. But specifics on the PS4's CPU are all just rumours yet anyway whereas we definitely know that POWER8 is in the works and that it will probably sport SPE technology. Like you I think POWER8 will likely feature out-of-order execution but I couldn't find any definite information on this so far.
  • »02.07.11 - 18:53
  • Jim
  • Yokemate of Keyboards
    Posts: 4977 from 2009/1/28
    From: Delaware, USA
    Your posts contained the first speculation about Power8 featuring SPEs that I'd seen, but it does look like this is going to happen.
    The PS4 is a complete unknown since Sony doesn't want the public to focus on that yet. I think they're afraid it would hurt PS3 sales.
    "Never attribute to malice what can more readily explained by incompetence"
  • »02.07.11 - 19:03
  • Order of the Butterfly
    minator
    Posts: 370 from 2003/3/28
    Quote:

    Your posts contained the first speculation about Power8 featuring SPEs that I'd seen,


    Not really, IBM have pretty much said this themselves.

    Quote:

    but it does look like this is going to happen.
    The PS4 is a complete unknown since Sony doesn't want the public to focus on that yet. I think they're afraid it would hurt PS3 sales.


    Sony are keeping quiet but there have been rumours about a 16 SPE + 2 PPE version.
    Sony have said they want to keep costs down, and given that an architecture change would be a huge cost, I take that to confirm Cell again.

    Whether the new PPE is out-of-order is another question altogether. They should be able to get the power down by now, but it's really a question of whether it provides that much benefit to the target applications.
  • »02.07.11 - 22:39