Intel Compiler soap, another episode of AMD fanboy idiocy

Agner Fog recently wrote a blog post on how the Intel Compiler decides which codepath to run on a given CPU, and how this affects AMD. While I respect Agner Fog, and basically agree with what he says (I’ve known about this mechanism, and its ‘side-effects’, for many years; that part is not up for debate as far as I’m concerned), I don’t agree with the way he presented it. The term ‘Cripple AMD’ is not entirely accurate, to say the least. But more on that later. Perhaps Agner wrote it that way to draw attention to the issue, in which case he has succeeded very well indeed.

I’ll first comment on the benchmarks, as I think that is the REAL problem here. Clearly the Intel Compiler is not a vendor-neutral compiler. That is common knowledge. It isn’t a bad thing in itself… but if you want to sell vendor-neutral benchmarking tools, then the LAST thing you should do is use the Intel Compiler. However, I don’t think that is Intel’s fault (unless someone can prove that Intel somehow bribed or otherwise ‘persuaded’ the developers to use their compiler). Do you convict the storekeeper who sold the rope that someone used to hang himself? While it is possible to hang yourself with a rope, that is not the main reason why rope exists. As such, there is no direct relation between selling rope and someone’s death.

Of course that’s not how AMD fanboys see it. They think Intel needs to be convicted for developing a compiler that supports their products (and obviously not the competition’s). They even think that it’s a case of Intel exploiting their powerful market position. Now that is ridiculous… The Intel Compiler is not bundled with a CPU; it is sold separately. And it doesn’t have a strong market position at all. Microsoft and gcc dominate the compiler market. Intel is just a small player, mainly interesting for scientific research where maximum performance is required, and code is only run on their own (Intel) systems. Another thing that is ridiculous is that they somehow mistake the system requirements (as in, the machine required to run the Intel Compiler) for the optimization targets.

What’s worse, they don’t seem to understand exactly WHAT the Intel Compiler does in the CPU dispatcher. They don’t understand the basics of code optimization. While it would go a little too far to explain the microarchitectural differences in detail (I will refer to the Intel Optimization Manual for that), I will give a small practical example of a ‘small challenge’ we had at the asmcommunity forum.

Okay, so it’s just a simple coding problem, and the challenge was to create the fastest routine. Now, this was all done in assembly language (with some C++ routines as reference), so no Intel Compiler was used for the code, all code was just written directly for the x86 architecture and ran ‘as is’.

Various programmers of various skill levels contributed their solutions. We put all the different solutions into a single timing framework, and ran it on various PCs. The results were something like this:

Core2 Duo:
– Plain:
1) lingo12
2) r22
3) drizz
– Extensions:
1) lingo12/Scali SSSE3
2) Scali SSE2
3) Scali MMX+SSSE3
Pentium 4:
– Plain:
1) r22
2) Scali2/C compiler
3) sysfce2/C compiler
– Extensions:
1) Scali SSE2
2) ti_mo_n
3) Scali MMX
Athlon XP:
– Plain:
1) r22
2) sysfce2-1
3) lingo12/Scali2/C compiler/drizz
– Extensions:
1) Scali MMX
2) ti_mo_n

As you can see, I made a top 3 for each CPU type (separately for ‘plain x86’ routines and routines using extensions such as MMX or SSE)… and the top 3 is different every time. Now that is the point I’m making here. Which solution is the winner? In other words, which is the most optimal routine? You cannot answer that unless you specify which CPU. The keyword here is ‘microarchitecture’.

Now this is the problem that all compilers face: what may be optimal for one CPU may be disastrous for another. E.g., lingo12’s routine is the fastest on the Core2 Duo, but it isn’t even in the top 3 on a Pentium 4. And those are both Intel CPUs, and they both support pretty much the same features.

That’s why the Intel Compiler doesn’t just check for “GenuineIntel” and whether e.g. SSE3 is supported. No, it also checks other CPUID info, such as the family number. This way it knows what microarchitecture the CPU has, and can then pick the proper code for it. Agner Fog actually mentions this in his blog, but it seems to be misunderstood and/or simply ignored by everyone who comments on it. He even points out that Intel has a rather peculiar way of numbering their CPUs, just to keep this mechanism of microarchitecture selection working.

Namely, if we start at the beginning… CPUID was introduced on the Pentium, and also added to 80486 processors of that era. The Pentium was named family 5 (Pentium meaning ‘fifth’, referring to the fact that it’s technically the 80586; Intel dropped the use of numbers in favour of names because they couldn’t trademark a number, and competitors kept using the same numbers as Intel). The 486 was family 4, and technically it would go down to the 386, 286, 186 and 8086. So originally the family number was just the CPU generation. Every time Intel introduced a new microarchitecture, the family number was increased by one.

This also held true for the Pentium Pro, which was family 6. The Pentium II and Pentium III were also family 6, as they were little more than a Pentium Pro with MMX and SSE extensions added, from a microarchitectural point-of-view. As such, the optimization rules didn’t change. When Intel introduced the Pentium 4, the family number was increased again, this time to family 15, with an extra ‘extended family’ field added to CPUID (and the Pentium 4 required vastly different code to reach optimal performance). However, when Intel introduced the Core2, they went BACK to family 6. And the Core i3/i5/i7 also still report family 6. The logic behind this move is that the Core2 and Core i3/i5/i7, while technically new microarchitectures, have performance characteristics very similar to the PPro family, and as such require pretty much the same type of optimizations. So when older software (most notably code generated by the Intel Compiler) ‘sees’ family 6, it will select the most optimal code.

But therein lies the problem: If you don’t know anything about the microarchitecture, you can’t select the most optimal codepath, because you have no idea which one that should be. The above ‘small challenge’ illustrates that perfectly. The optimal code on one microarchitecture can be disastrous on another. Take for example the Athlon XP. The “Scali MMX” routine is the fastest of all routines there. Okay, so you run it on a Pentium 4… You detect MMX, so you figure you should select “Scali MMX”, as that is the fastest routine for MMX-supporting CPUs. Well, the joke’s on you. “Scali MMX” is second-to-last on the Pentium 4. And it’s not just the Pentium 4… on the Core2 Duo, “Scali MMX” is well in the lower regions as well.

So as you see, instructionset features don’t say anything about performance; picking a codepath based on them is a gamble. Currently Intel doesn’t take that gamble at all, and just plays it safe. If Intel were to take a gamble on a microarchitecture that it doesn’t know, and it turned out to be a bad choice (such as the above “Scali MMX” example), people would still cry foul. My suggestion to Intel is that they ask their x86 licensees to pick the codepath that the Intel Compiler should run for their microarchitecture, and get it in writing. That way, Intel shows their good intentions, it will *probably* pan out well for end-users in practice most of the time (the current Phenom microarchitecture is not that different from Core2/Core i7), and if it fails, then it isn’t Intel’s fault, as they just do what was agreed on.

The irony of this is that the Pentium 4 itself suffered quite severely from this microarchitectural optimization problem. When it debuted, most code was optimized for the Pentium/PPro. The Pentium 4 didn’t handle that very well; its x87 implementation in particular was relatively weak. When code was recompiled for the P4, avoiding certain performance hazards and making use of new features like SSE2 to work around shortcomings such as the weak x87, it could make quite significant leaps in performance. This is also partly the reason why it was always a strong performer in 3d rendering and video processing: these tasks benefited greatly from SSE2, and the developers in those particular markets were quick to adopt the new technology and recompile/optimize their applications for the new microarchitecture.

To conclude… there may be ‘tainted’ benchmarks, such as PCMark05, compiled with the Intel Compiler, which puts non-Intel CPUs at a disadvantage… but I would like to point out that the shoe has been on the other foot as well. A few years ago, a benchmark called ScienceMark emerged in the review world. Historically it was one of the few benchmarks that Athlons performed well in… Look at these results for example. An Athlon64 FX-62 about as fast as a Core2 Duo X6800? Amazing; no other benchmark shows results even remotely similar…

The plot thickens when you realize that some of ScienceMark’s developers are/were AMD employees.
(eg ‘redpriest’, as he himself says here: “Full disclosure: I am an engineer that works for AMD (in CPUs and not in graphics)”).

It appears that ScienceMark 2.0 happens to implement synthetic benchmark routines that seem to perform well on Athlon microarchitectures, and not that well on Intel microarchitectures (even the ones that are generally considered to be superior overall by quite a margin, such as Core2 vs Athlon64). Given the link to AMD, is that a coincidence?

PS: I don’t use the Intel Compiler myself. Partly because of issues like these, and partly because I don’t see why I should spend money on another compiler when Microsoft’s compiler and gcc do a fine job as well. However, I do think it’s Intel’s right to only support optimizations for their own CPUs. That is also what they advertise. I don’t like to see any government or organization forcing regulations on Intel in this case, because it would be incredibly arbitrary. It’s bad enough that anti-trust laws and regulations have such an arbitrary character (it seems that companies sue Microsoft simply because they know Microsoft can’t win anyway because of its size, and AMD did the same with Intel), but in this case, Intel doesn’t even have a considerable market share, so it isn’t even an anti-trust case. It would mean that any company could sue any competitor, regardless of market share. Whoever has the best spin doctors will convince the judge and jury (who, after all, are just laymen).


5 Responses to Intel Compiler soap, another episode of AMD fanboy idiocy

  1. AzureSky says:

    First, the stuff about ScienceMark is not germane to this issue (yes, I know this is an old post, but Intel is still doing this crap today)

    Yes, I am an AMD fan, but I (like most AMD fans I know) wouldn’t complain if Intel did what I would consider the honourable thing and implemented the same optimizations for comparable AMD/VIA/etc. chips that they use for their own, even if there were RARE cases of the code running A LOT slower. In those cases the developer would simply have to compile another version for non-Intel users or use a different optimization path (either for specific CPUs, or just do it for everybody)

    Intel doesn’t do this for the reasons you seem to think. It’s not about stability or the fact that optimized code in rare cases runs like ass on other chips; it is purely to keep their numbers in benchmarks up. Intel has, at least in the past, given discounts or given away copies of their compilers to some companies that made benchmarks. I don’t have a problem with that; I do have a problem with them implying that their compiler is fair to all CPUs when it clearly runs non-Intels on a far from optimal codepath.

    I have tested a lot of compiles and have stripped the bias checker from a lot of apps, and have yet to find a compile that was slower or wouldn’t run using SSE2, on anything from my Socket 754 Athlon 64 all the way through my most recent system, a Phenom II 1055T. SSE3 runs great on my X6 in fact, a good bit faster than even SSE2 with some apps (one great test is the Ogg Vorbis encoders on the Hydrogen Audio forums; give them a look)

    Intel doesn’t care what we as geeks think, really; they care that they stay at the top of as many benches as they can.

    • Scali says:

      Haha, you must be joking. What a load of nonsense. Firstly, you apparently haven’t understood my blog or my stance. Intel isn’t ‘doing’ anything. They just build a compiler to optimize code for their CPUs. As I said, it’s the developers’ fault if they choose to use this compiler for a benchmark that is meant for more than just Intel processors (and even then, it takes certain command-line switches to make the compiler generate the automatic codepath selection. You can also compile static codepaths and force them to run on any CPU, regardless of make or model).

      Secondly, Intel has not even remotely claimed or even implied that their compiler is fair to all CPUs. It is clearly marketed as a compiler that optimizes for Intel CPUs, nothing more. Aside from that, anyone with half a brain should be able to figure out that a compiler written by a CPU manufacturer is meant only for their own CPUs (which is actually common practice in the world of CPU manufacturers; AMD is one of the exceptions to the rule. The whole situation of a CPU manufacturer not having their own instructionset and instead licensing it from their main competitor is a tad strange anyway. Most other instructionsets are not licensed from an actual CPU manufacturer, and as such, the licensing company does not compete with its licensees. For example, IBM has its own compilers: is that unfair? HP has their own compilers: is that unfair? Others, such as Sun and Digital, used to have their own compilers as well).

      Thirdly, I never said anything about stability.
      And your claim that Intel needs this compiler to stay ahead in benchmarks is just ridiculous. As I said, the Intel compiler is a very small player. Most software, including benchmarks, is compiled with MSVC or GCC. Regardless of the compiler used, Intel has a healthy lead over AMD in terms of performance.

      Lastly, it’s mostly envy from the AMD crowd. AMD doesn’t have the resources to develop their own compiler. I’m sure that if they did, they would focus solely on their own products as well, and not waste precious resources on trying to write the most optimal compiler for their competitor’s products as well. That doesn’t make any business sense, in two ways: they have to spend more resources AND their competitor’s products will become MORE competitive.

      And by the way, the ScienceMark thing *is* germane. It’s a clear-cut example of AMD influencing benchmark code in order to make their products appear better. It is a degree worse than what Intel is doing. Intel merely makes a compiler available that optimizes for their products. AMD (re)writes the code of a benchmark itself, while ScienceMark tries to give off the impression of an independent, unbiased benchmark.

  2. Andrew says:

    AMD was also involved in Sysmark and they are now discrediting it because their Bulldozer and Llano CPUs are not competitive on that benchmark.

    This really speaks volumes about AMD and their marketing FUD.

  3. falco says:

    The biggest, and only problem is that Intel is lying about this. If you can read then you should understand this:

    “When I started testing Intel’s compiler several years ago, I soon found out that it had a biased CPU dispatcher. Back in January 2007 I complained to Intel about the unfair CPU dispatcher. I had a long correspondence with Intel engineers about the issue, where they kept denying the problem and I kept providing more evidence. They said that:
    The CPU dispatch, coupled with optimizations, is designed to optimize performance across Intel and AMD processors to give the best results. This is clearly our goal and with one exception we believe we are there now. The one exception is that our 9.x compilers do not support SSE3 on AMD processors because of the timing of the release of AMD processors vs. our compiler (our compiler was developed before AMD supported SSE3). The future 10.x compilers, which enter beta this quarter and release around the middle of the year, will address this now that we’ve had time to tune and adjust to the new AMD processors.

    Sounds nice, but the truth is that the CPU dispatcher didn’t support SSE or SSE2 or any higher SSE in AMD processors and still doesn’t today (Intel compiler version 11.1.054). I have later found out that others have made similar complaints to Intel and got similarly useless answers (link link). ”
    Denying that this is at least problematic is no wiser than what the AMD fanboys say, and it makes you a de facto Intel fanboy, ruining your credibility…

    • Scali says:

      Denying what? I pointed out that the Intel dispatcher doesn’t do anything much for AMD CPUs other than running some fallback x86 path.
      Are you somehow holding me responsible for some anecdote about some Intel employee claiming it does more than that?
      I don’t see why they would make that claim.
      Or the claim of ‘Intel fanboy’ for that matter. I hate all x86 equally, as I have mentioned many times. Not my fault that people like you don’t know there’s more in this world than x86 or Intel/AMD.
