AMD’s new fight, and why we love Bulldozer

The commentary around Bulldozer, AMD’s latest processor line, is that it’s disappointing, a catastrophe, absolutely positively awful, and so on for miles and miles. It’s a shame for AMD that Ars Technica, ExtremeTech and the usual suspects show no imagination beyond their benchmarks when it comes to judging processors.

The numbers don’t lie, of course: the benchmarks show that Bulldozer’s best is slower than Intel’s by a long way, and most of those benchmarks care about the absolute amount of data processed, numbers crunched and so on. Ars Technica concludes:

AMD compromised single-threaded performance in order to allow Bulldozer to run more threads concurrently, and that trade-off simply hasn’t been worth it.

But everyone’s performance tests make an assumption: that the processor is working for one person at a time, and that this person wants it to crunch through as many numbers as possible for their benefit. That work might be rendering frames of Battlefield 3, editing a huge photo or ripping a CD. The benchmarks all sum it up the same way: however many ways a single job can be split across a processor’s cores, they’re only interested in who gets to the finish line fastest.

And Intel still wins, I get it. Even AMD admit in a recent interview that their older processors still perform better for a lot of people:

We understand our customers make purchase decisions based on how they use their PCs, and in many cases our AMD Phenom™ II processors are a great (purchase).

They are struggling to pitch themselves against Intel to gamers, power-hungry desktop PC users and the benchmark sites. They know they can’t get the same excitable press any more, not for years and another generation of processor. That’s probably why they’re temporarily giving up the fight on pure brawn, having laid off hundreds in their PR and marketing departments a couple of weeks ago.

But instead of doing one thing for one person, let’s assume that a processor is under siege from 300 warring factions, all wanting to run separate and unpredictable workloads. The benchmark that interests us is: while 299 “hostile” jobs are running, how quickly does the 300th job complete? If we vary the nature of those 299 jobs, how reliable is the performance of the 300th? The time taken by that one lonely, slow job is what I’m interested in.

For BigV, where we run a massively “multi-tenant” system, we really are planning to put on the order of 300 different customers on a single server. It’s far more important for a virtual machine to have reliable average performance than the fastest possible performance. I’ve not (yet) seen any benchmarks that test in this way, but it seems to us that more separate hardware cores must achieve this goal better than a single core shared out by software scheduling.
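The benchmark described above can be sketched as a small harness. This is a minimal sketch under loud assumptions: it runs on a POSIX host (it uses the “fork” start method), ordinary OS processes stand in for customers’ virtual machines, and `hostile_job`, `probe_job`, `time_probe` and `summarise` are hypothetical names with placeholder workloads, not anything BigV actually runs. It reports the spread and worst case of the probe’s completion time rather than only the fastest run.

```python
import multiprocessing as mp
import statistics
import time


def hostile_job(stop, seed):
    """Hypothetical noisy neighbour: spins until told to stop.

    Even seeds do pure integer arithmetic; odd seeds keep writing through
    a larger list, so the hostile load is not all of one kind.
    """
    buf = [0] * (1 << 16) if seed % 2 else None
    x = seed
    while not stop.is_set():
        if buf is None:
            for _ in range(10_000):
                x = (x * 1103515245 + 12345) % (1 << 31)
        else:
            for i in range(0, len(buf), 64):
                buf[i] += 1


def probe_job(work):
    """The lonely 300th job: a fixed, repeatable amount of work."""
    total = 0
    for i in range(work):
        total += i * i
    return total


def time_probe(n_hostile, trials=5, work=200_000):
    """Time probe_job repeatedly while n_hostile processes compete for CPU."""
    ctx = mp.get_context("fork")  # assumption: POSIX host such as Linux
    stop = ctx.Event()
    procs = [ctx.Process(target=hostile_job, args=(stop, s), daemon=True)
             for s in range(n_hostile)]
    for p in procs:
        p.start()
    try:
        times = []
        for _ in range(trials):
            start = time.perf_counter()
            probe_job(work)
            times.append(time.perf_counter() - start)
    finally:
        # Always shut the hostile load down, even if a trial fails.
        stop.set()
        for p in procs:
            p.join(timeout=5)
            if p.is_alive():
                p.terminate()
    return times


def summarise(times):
    """Mean, spread and worst case: consistency matters more than the best run."""
    mean = statistics.mean(times)
    sd = statistics.stdev(times) if len(times) > 1 else 0.0
    return {"mean": mean, "stdev": sd, "worst": max(times), "cv": sd / mean}
```

On a real host, something like `summarise(time_probe(n_hostile=299, trials=20))` run on each candidate CPU would give the comparison the post asks for: a lower `cv` (coefficient of variation) and a lower `worst` mean more consistent performance for the 300th tenant. Real virtual machines add hypervisor scheduling effects that this process-level sketch ignores.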

If that weren’t the case, I’d find it hard to understand the investment in massively parallel server systems like the 768-core Atom server or super-low-power 480-core ARM systems. Big multi-tenant systems don’t need to be the fastest; they just need to perform consistently.

For BigV, we just want more cores. The speed is almost irrelevant. As long as AMD’s performance is in the same ballpark as Intel’s, their chips will work out very nicely for us virtual-machine-mongers. And the low price of Bulldozer chips is just icing: a 32-core, 128GB system is extremely affordable and helps us keep our prices to customers low.
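To put that density in numbers, here is a back-of-the-envelope calculation. It is a rough sketch that assumes (our assumption, not a stated spec) the roughly 300 customers mentioned above share one 32-core, 128GB host evenly, and it ignores the hypervisor’s own CPU and RAM overhead; `density` is a hypothetical helper.

```python
def density(customers, cores, ram_gib):
    """Back-of-the-envelope sharing figures for one host.

    Assumption: customers share the host evenly and hypervisor
    overhead is ignored, so the RAM figure is an upper bound.
    """
    customers_per_core = customers / cores
    ram_mib_each = ram_gib * 1024 / customers
    return round(customers_per_core, 1), round(ram_mib_each)


print(density(300, 32, 128))  # prints (9.4, 437)
```

Roughly nine customers per physical core is the ratio that more cores reduce directly, whatever the speed of each core.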

AMD are gearing up for a different fight, and they don’t need press from the benchmark sites to prove them the winners. So while my gaming PC will stick with a Phenom for a while yet, BigV is going to be using an awful lot of Bulldozer chips in the near future.

5 thoughts on “AMD’s new fight, and why we love Bulldozer”

  1. Mbloch,

    If you have use-case data that paints an ideal picture of Bulldozer as a server chip compared with other options on the market, please share it. As the author of the ET story, I’m far more concerned with painting an accurate picture of BD’s total performance than with promoting a specific opinion.

  2. Hi Joel. Putting a reliable benchmark together isn’t a small amount of work. But I hope you can see where *all* the current ones fail to evaluate processors properly for a multi-tenant system.

    • Sorry, that sounds curt. I know that you *do* want to give a balanced picture, and I agree that Bulldozers look bad according to the metrics you’ve used. I’m just arguing for different ones. As we start ramping up our server purchases next year, I can see that we’d be motivated to come up with something that makes my point. And I could still be wrong – but right now I’m happy that we want to buy the lowest-possible ratio of hardware cores to customers that we can obtain, regardless of core speed.

  3. M,

    Sorry it took me a few days to get back to you on this. Let me start by saying that obviously you know the needs of your own customers and your software. I’m not arguing that BD *isn’t* the best choice for you, because I honestly don’t know.

    I’m responding primarily to your point when you say: “we want to buy the lowest-possible ratio of hardware cores to customers that we can obtain, regardless of core speed.”

    Let’s leave Bulldozer out altogether, at least to start. Consider AnandTech’s Westmere-EX benchmarks: http://www.anandtech.com/show/4285/westmereex-intels-flagship-benchmarked/5

    The results on the first graph are response times. If I understand correctly, that’s what concerns you most. The E7-4870 configuration has a total of 40 cores, the X7560 configuration 32 cores and the Opteron 6174 (Magny-Cours) configuration 48 cores.

    Compare the change in throughput between 4 and 5 tiles with the increase in response times. The 6174’s throughput goes up 12%, but its response time rises 40% in the same test. The older Xeons trade a 12% increase in throughput for an 8% rise in response time. The newest Westmeres scale easily when handling 90 vCPUs compared with 72.

    Are these results “proof” that BD is a bad fit for you? I have no idea. I’m referencing them because they demonstrate how Magny-Cours’ internal architecture simply can’t keep up with Intel’s in certain cases, due to poor internal bandwidth, high-latency caches and generally lower IPC.

    Now let’s talk Bulldozer:

    The reason I’m uncertain as to whether or not BD represents a good upgrade path for you is this: Bulldozer’s shared core logic makes the chip scale more poorly than a conventional dual-core design. This isn’t a flaw so much as a known cost of doing business. If you unify a bunch of core logic to save die space, you pay for it in total multi-threaded performance. Bulldozer’s penalty appears to be in the neighborhood of 10-20%, which is actually pretty good. (We’re talking strictly about integer performance.)

    The problem is that Interlagos’ 16 cores end up scaling more like 13-14 Magny-Cours cores. This is part of why even the better-case benchmarks show Interlagos struggling to distance itself from MC.

    Is there a point, or a number of virtual CPUs relative to physical chips, at which MC/Interlagos becomes a better option than Intel? I don’t know. I suspect it depends heavily on the software, the server architecture and the particulars of a workload. What I do know is that the gulf between the IPC and responsiveness of Intel’s high-end Xeons and AMD’s best server chips has become so large that Intel’s CPUs really *can* offer better performance, even in cases where they’re grossly outclassed on core count.

    An example from the MC launch: http://www.anandtech.com/show/2978/amd-s-12-core-magny-cours-opteron-6174-vs-intel-s-6-core-xeon/9

    Drop back a few pages and you’ll see that the new Opterons can keep up with the high-end Xeons in rendering tests, which are primarily compute-driven, but that 12 cores / 24 threads of Xeon at 2.93GHz beat 32 cores’ worth of Opteron despite being outnumbered nearly 3:1 in physical cores.

    In a lot of ways that matter — particularly when it comes to moving data in and out of caches, or sharing data between cores — MC is slow. Bulldozer is even slower in some of these areas, though it partly compensates with much larger caches.

    If I had access to the appropriate server hardware and the expertise to configure it, I’d gladly help you test the situation. As I previously noted, it’s possible that you’ve got a situation in which Interlagos really is the best choice.

    For what it’s worth, I hope you do. None of the reputable press are sitting back secretly wanting AMD to fail, regardless of what the fanboys may think. We remember the days when dual-socket CPUs went for a minimum of $1500 and even the better consumer chips were exorbitantly priced. AMD is the reason that changed, and no one wants to see it change back.

  4. It’s always been a hard sell to get people to buy lots of slow processors; look at how badly Sun did.

    Most desktop apps are not built to run in parallel, so multiple cores don’t really help them. Only on a server running multiple tasks do the cores come into their own.

    If you are building a server farm that runs virtual machines, lots of cheap and fast enough cores in one place sounds like a good idea.
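The figures quoted in the third comment can be re-derived with a little arithmetic. This sketch adds no new measurements: it only reuses the percentages and core counts given in that comment, and `rise_per_gain` and `effective_cores` are hypothetical helper names.

```python
def rise_per_gain(throughput_gain_pct, response_rise_pct):
    """Percent rise in response time paid for each 1% of extra throughput."""
    return response_rise_pct / throughput_gain_pct


def effective_cores(cores, penalty):
    """Cores' worth of throughput left after a fractional scaling penalty."""
    return cores * (1 - penalty)


# Going from 4 to 5 tiles, using the percentages quoted in the comment.
opteron_6174 = rise_per_gain(12, 40)
older_xeon = rise_per_gain(12, 8)
print(f"{opteron_6174:.2f} vs {older_xeon:.2f}")  # prints 3.33 vs 0.67

# 16 Interlagos cores with the quoted 10-20% shared-logic penalty.
worst, best = effective_cores(16, 0.20), effective_cores(16, 0.10)
print(f"{worst:.1f} to {best:.1f} effective cores")  # prints 12.8 to 14.4 effective cores

# Core counts in the rendering comparison: 32 Opteron vs 12 Xeon cores.
print(f"{32 / 12:.2f}:1")  # prints 2.67:1
```

By this measure the Opteron 6174 pays about five times as much extra response time per unit of throughput as the older Xeons, and a 10-20% penalty leaves 16 Interlagos cores doing the work of 12.8 to 14.4 cores, in line with the comment’s “13-14” estimate.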
