Today AMD has (paper-)launched their new Radeon 7970 card. Its GPU is based on the new ‘Graphics Core Next’ architecture. A few months ago, some information on this architecture became available, and made it clear that AMD was going for something completely different than before. More than that, their next architecture would be remarkably similar to nVidia’s Fermi in many ways.
At the time I said it could go either way:
It will be interesting to see AMD’s upcoming architecture. In a way they have to start completely from scratch. Not only are they completely redesigning their GPU for the first time in years, they also need to write completely new compilers to optimize for this different approach. nVidia has a head start here. That could give them the advantage, but on the other hand, it might not say much. Take for example AMD’s Athlon64 architecture: there was no doubt that the integrated memory controller had its advantages in theory. Intel moved to an integrated memory controller much later (just like a single-die quadcore). However, once Intel made that move, they immediately took quite a lead over AMD.
So now that it’s here, let’s see how it has turned out. Let’s just go by Anandtech’s review. The first impression is quite good: it outperforms all other single-GPU solutions in pretty much every scenario. More to the point, it also seems to do well in tessellation and GPGPU tasks, two scenarios where AMD’s older architecture was relatively weak.
To be honest though, I’m not entirely convinced about the tessellation part. It may be faster than the previous generation, but it still isn’t a truly parallel implementation: it has only two fixed tessellation units, like the earlier series. Take this chart:
This still shows the same problem as we got going from the 5000-series to the 6000-series. The baseline here is the 6000-series, which has an exponential performance dropoff, as we know. A truly parallel implementation such as nVidia’s PolyMorph engine has a more linear scaling characteristic. If the 7000-series scaled that way, you would see the red line going up compared to the yellow line as the tessellation factor increases. Instead, we see that above factor 16, the red line is more or less horizontal, and even slopes slightly downward. So the dropoff is the same: exponential in nature. (And why does AMD only show up to factor 31 anyway? OpenGL and Direct3D go up to 64, and I’d like to know how it performs over the ENTIRE range.)
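To make the reasoning about the relative chart concrete, here is a minimal Python sketch. All three curves are invented for illustration and are not measurements of any card; the function names, the decay constant and the 1.7x speedup are assumptions. It only shows that a card with the same dropoff shape as the baseline produces a flat relative line, while a card with a gentler dropoff produces a line that keeps rising with the tessellation factor.

```python
# All curves here are invented for illustration; they are not measurements.

def baseline(factor):
    """Hypothetical baseline card: throughput decays exponentially."""
    return 100.0 * 0.93 ** factor

def faster_same_shape(factor):
    """Hypothetical newer card: 1.7x faster, but with the same dropoff."""
    return 1.7 * baseline(factor)

def gentler_dropoff(factor):
    """Hypothetical card whose throughput falls off only linearly."""
    return 100.0 - 1.2 * factor

def relative(card, reference, factors):
    """Throughput of `card` as a multiple of `reference`, per factor."""
    return [card(f) / reference(f) for f in factors]

FACTORS = [1, 2, 4, 8, 16, 31, 64]

flat = relative(faster_same_shape, baseline, FACTORS)
rising = relative(gentler_dropoff, baseline, FACTORS)

for f, a, b in zip(FACTORS, flat, rising):
    print(f"factor {f:2d}: same shape {a:5.2f}x   gentler dropoff {b:6.2f}x")
```

In this toy model the "same shape" column stays constant across all factors, which is what a horizontal line in a relative chart corresponds to, while the "gentler dropoff" column grows as the factor increases.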
The lower tessellation factors may appear to show the scaling that we expect, but the picture is somewhat distorted, since the 6000-series itself has quite a spike in the low regions compared to the 5000-series:
The graph of the 7000-series is more or less the inverse of the spikes we see here. So that would make the 7000-series probably about equal in scaling characteristic to the 5000-series.
Therefore it is rather sad that most reviewers seemed to use only games or tessellation benchmarks with relatively low tessellation settings. I would love to see how the 7000-series holds up in the Endless City demo. What reviewers should have done was to drive the GPU into the tessellation wall, and see how well it keeps up. Can this GPU keep up with Fermi even at 64x settings? Or is the dropoff still too heavy?
So I have to agree with Anandtech’s conclusion:
At the same time the 7970 is not the 5870. The 5870 relative to both NVIDIA and AMD’s previous generation video cards was faster on a percentage basis. It was more clearly a next-generation card, and DX11 only helped to seal the deal. Meanwhile if you look at straight averages the 7970 is only around 15-25% faster than the GTX 580 in our tests, with its advantage being highly game dependent. It always wins at 2560 and 1920, but there are some cases where it’s not much of a win.
On the one hand, the 7970 is hands-down the best single-GPU card available today. So AMD got the new architecture (and drivers) quite right on the first take. On the other hand, it has to be. It is the only card that takes advantage of the latest technology, with a 4.31 billion transistor GPU at 28 nm, and 3 GB of very fast memory. What would Fermi look like if it were shrunk to 28 nm, scaled up from 3 billion to 4.31 billion transistors, coupled with 3 GB of 5.5 GHz GDDR5 instead of 1.5 GB of 4 GHz GDDR5, and given PCI-e 3.0 support? It would likely be 15-25% faster than the current GTX 580 as well, if not more. However, that is just theory, because as far as we know, we won’t be seeing a high-end 28 nm GPU from nVidia anytime soon. They are not going for a die-shrunk Fermi, but for a new architecture. And that might just mean that AMD has perfect timing: its main competitor will be the GTX 580 for quite a while yet, and the Radeon 7970 does not have much of a problem with that. By the time nVidia’s new high-end 28 nm GPUs arrive, the 7970 may no longer be the fastest, but AMD will probably follow up with a refresh around that time.
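For reference, the spec gaps in the hypothetical "scaled-up Fermi" comparison above can be worked out with a few lines of Python. The figures are the ones quoted in this post; the variable names are my own, and the ratios are raw resource ratios only. They do not translate directly into frame rates.

```python
# Figures as quoted in the text; the names are illustrative only.
gtx580 = {"transistors_bn": 3.0, "mem_clock_ghz": 4.0, "mem_size_gb": 1.5}
hd7970 = {"transistors_bn": 4.31, "mem_clock_ghz": 5.5, "mem_size_gb": 3.0}

# Ratio of each 7970 figure to the corresponding GTX 580 figure.
ratios = {key: hd7970[key] / gtx580[key] for key in gtx580}

for key, value in ratios.items():
    print(f"{key}: {value:.3f}x")
```

So the 7970 has roughly 44% more transistors and a 37.5% higher memory clock to work with, against a measured lead of 15-25% in Anandtech's averages.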