GeForce GTX1060: nVidia brings Pascal to the masses

Right, we can be short about the GTX1060… It does exactly what you’d expect: it scales down Pascal as we know it from the GTX1080 and GTX1070 to a smaller, cheaper chip, aimed at the mainstream market. The card is functionally identical, apart from the missing SLI connector.

But let’s compare it to the competition, the RX480. And as this is a technical blog, I will disregard price. Instead, I will concentrate on the technical features and specs.

RX480:
Die size: 230 mm²
Process: GloFo 14 nm FinFET
Transistor count: 5.7 billion
TFLOPS: 5.1
Memory bandwidth: 256 GB/s
Memory bus: 256-bit
Memory size: 4/8 GB
TDP: 150W
DirectX Feature level: 12_0

GTX1060:
Die size: 200 mm²
Process: TSMC 16 nm FinFET
Transistor count: 4.4 billion
TFLOPS: 3.8
Memory bandwidth: 192 GB/s
Memory bus: 192-bit
Memory size: 6 GB
TDP: 120W
DirectX Feature level: 12_1

And well, if we go just by these numbers, the Radeon RX480 looks like a sure winner. On paper it all looks very strong. You’d almost think it’s a slightly more high-end card, given the higher TDP, the larger die, the higher transistor count, the higher TFLOPS rating, more memory and more bandwidth (most specs are roughly 30% higher than the GTX1060’s). In fact, the memory specs are identical to those of the GTX1070, as is the TDP.
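
As a quick sanity check: the TFLOPS and bandwidth figures follow directly from shader count, clock speed and memory configuration. Here is a minimal sketch, assuming reference base clocks of roughly 1120 MHz for the RX480 and 1506 MHz for the GTX1060, and 8 Gbps GDDR5 on both (boost clocks give somewhat higher TFLOPS figures):

```cpp
#include <cstdio>

// Theoretical single-precision throughput: shaders * clock * 2 (an FMA counts as 2 FLOPs).
static double tflops(int shaders, double clockMHz) {
    return shaders * clockMHz * 2.0 / 1e6;
}

// Theoretical bandwidth: bus width (bits) * data rate (Gbps) / 8 bits per byte.
static double bandwidthGBs(int busBits, double gbps) {
    return busBits * gbps / 8.0;
}

int main() {
    // Assumed reference figures; not official numbers from either vendor.
    printf("RX480  : %.1f TFLOPS, %.0f GB/s\n",
           tflops(2304, 1120.0), bandwidthGBs(256, 8.0));
    printf("GTX1060: %.1f TFLOPS, %.0f GB/s\n",
           tflops(1280, 1506.0), bandwidthGBs(192, 8.0));
    return 0;
}
```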

But that is exactly where Pascal shines: thanks to the excellent efficiency of this architecture, the GTX1060 is as fast as or faster than the RX480 in pretty much every benchmark you care to throw at it. If this were to come to a price war, nVidia would easily win it: their GPU is smaller, their PCB can be simpler because of the narrower memory interface and the lower power consumption, and they can use a smaller/cheaper cooler because there is less heat to dissipate. So the cost of building a GTX1060 will be lower than that of an RX480.

Anyway, speaking of benchmarks…

Time Spy

FutureMark recently released a new benchmark called Time Spy, which uses DirectX 12 and makes use of that dreaded async compute functionality. As you may know, this was one of the points that AMD marketed heavily in their DX12 campaign, to the point where a lot of people thought that:

  1. AMD was the only one supporting the feature
  2. Async compute is the *only* new feature in DX12
  3. All gains that DX12 brings come from using async compute (rather than from the redesign of the API itself to reduce validation, implicit synchronization and other things that hurt efficiency and add CPU overhead)

Now, the problem is… Time Spy actually showed that GTX10x0 cards gained performance when async compute was enabled! No surprise to me, of course, as I already explained earlier that nVidia can do async compute as well. But many people were convinced that nVidia could not do async compute at all, not even on Pascal. In fact, they seemed to believe that nVidia hardware could not process anything in parallel, period. And if you take that as absolute truth, then you have to ‘explain’ this by FutureMark/nVidia cheating in Time Spy!

Well, of course FutureMark and nVidia are not cheating, so FutureMark revised their excellent Technical Guide to deal with the criticisms, and also published an additional press release regarding the ‘criticism’.

This gives a great overview of how the DX12 API works with async compute, and how FutureMark made use of this feature to boost performance.

And if you want to know more about the hardware-side, then AnandTech has just published an excellent in-depth review of the GTX1070/1080, and they dive deep into how nVidia performs asynchronous compute and fine-grained pre-emption.

I was going to write something about that myself, but I think Ryan Smith did an excellent job, and I don’t have anything to add to that. TL;DR: nVidia could indeed do async compute, even on Maxwell v2. The scheduling was not very flexible however, which made it difficult to tune your workload to get proper gains. If you got it wrong, you could suffer considerable performance hits instead. Therefore nVidia decided not to run async code in parallel by default, but just to serialize it. The plan may have been to ‘whitelist’ games that are properly optimized and do show gains. We see that even in DOOM, the async compute path is not enabled yet on Pascal. But the hardware certainly is capable of it, to a certain extent, as I have also said before. The question is: will anyone ever optimize for Maxwell v2, now that Pascal has arrived?

Update: AMD has put a blog-post online talking about how happy they are with Time Spy, and how well it pushes their hardware with async compute: http://radeon.com/radeon-wins-3dmark-dx12

I suppose we can say that AMD has given Time Spy its official seal-of-approval (publicly, that is. They already approved it within the FutureMark BDP of course).


AMD’s Polaris debuts in Radeon RX480: I told you so

In a recent blogpost, after dealing with the nasty antics of a deluded AMD fanboy, I already discussed what we should and should not expect from AMD’s upcoming Radeon RX480.

Today, the NDA was lifted, and reviews are appearing everywhere on the internet. Cards are also becoming available in shops, and street prices are becoming known. I will keep this blogpost very short, because I really can’t be bothered:

I told you so. I told you:

  1. If AMD rates the cards at 150W TDP, they are not magically going to be significantly below that. They will be in the same range of power as the GTX970 and GTX1070.
  2. If AMD makes a comparison against the GTX970 and GTX980 in some slides, then that is apparently what they think they will be targeting.
  3. If AMD does not mention anything about DX12_1 or other fancy new features, it won’t have any such things.
  4. You only go for an aggressive pricing strategy if you don’t have anything else in the way of a unique selling point.

And indeed, all of this rings true. Well, with 3. there is a tiny little surprise: AMD does actually make some vague claims about a ‘foveated rendering’ feature. But at this point it is not entirely clear what it does, how developers should use it, let alone how it performs.

So, all this shows just how good nVidia’s Maxwell really is. As I said, AMD is one step behind, because they missed the refresh cycle that nVidia did with Maxwell. And this becomes painfully clear now: even though AMD moved to 14 nm FinFET, their architecture is so much less efficient that they can only now match the performance-per-watt that Maxwell achieved at 28 nm. Pascal is on a completely different level. Aside from that, Maxwell already has the DX12_1 featureset.

All this adds up to Polaris being too little, too late, which has become a time-honoured AMD tradition by now. At first it was only the CPU department, but lately the GPU department appears to have been reduced to the same state.

So what do you do? You undercut the prices of the competition. Another time-honoured AMD tradition. This is all well and good for the short term. But nVidia is going to launch those GTX1050/1060 cards eventually (and rumour has it that it will be sooner rather than later), and then nVidia will have the full Pascal efficiency at its disposal to compete with AMD on price. This is similar to the situation in the CPU department, where Intel’s CPUs are considerably more efficient, so Intel can reach the same performance/price levels with much smaller CPUs, which are cheaper to produce. So AMD is always on the losing end of a price war.

Sadly, the street prices are currently considerably higher than what AMD promised us a few weeks ago. So even that is not really working out for them.

Right, I think that’s enough for today. We’ll probably pick this up again soon when the GTX1060 surfaces.


GameWorks vs GPUOpen: closed vs open does not work the way you think it does

I often read people claiming that GameWorks is unfair to AMD, because they don’t get access to the source code and therefore cannot optimize for it. I cringe every time I read this, because it is wrong on so many levels. So I decided to write a blog about it, to explain how it REALLY works.

The first obvious mistake is that although GPUOpen itself may be open source, the games that use it are not. What this means is that when a game decides to use an open source library, the code is basically ‘frozen in time’ as soon as they build the binaries, which you eventually install when you want to play the game. So even though you may have the source code for the effect framework, what are you going to do with it? You do not have the ability to modify the game code (e.g. DRM and/or anti-cheat will prevent you from doing this). So if the game happens to be unoptimized for a given GPU, there is nothing you can do about it, even if you do have the source.

The second obvious mistake is the assumption that you need the source code to see what an effect does. This is not the case. In fact, in the old days, GPU vendors generally did not have access to the source code anyway (it might sound crazy, but back then game developers actually developed the whole renderer and engine themselves; hardware suppliers supplied the hardware). These days, vendors have developer relations programs and tend to work with developers more closely, which in some cases also involves getting access to source code (sometimes to the point where the GPU vendor actually does some or a lot of the hard work for them). But certainly not always (especially when the game is under the banner of the competing GPU vendor, such as Gaming Evolved or The Way It’s Meant To Be Played).

So, assuming you don’t get access to the source code, is there nothing you can do? Well no, on the contrary. In most cases, games and effect frameworks (even GameWorks) generally just perform standard Direct3D or OpenGL API calls. There are various game development tools available to analyze D3D or OpenGL code. For example, there is Visual Studio Graphics Diagnostics: https://msdn.microsoft.com/en-us/library/hh873207.aspx

Basically, every game developer already has the tools to study which API calls are made, which shaders are run, which textures and geometry are used, how long every call takes etc. Since AMD and nVidia develop these Direct3D and OpenGL drivers themselves, they can include even more debugging and analysis options into their driver, if they so choose.

So in short, it is basically quite trivial for a GPU vendor to analyze a game, find the bottlenecks, and then optimize the driver for a specific game or effect (as stated, you cannot modify the game, so you have to modify the driver, even if you had the source code). The source code isn’t even very helpful here, because you want to find the bottlenecks, and it’s much easier to run the code through an analysis tool than it is to study the source and try to deduce which parts will be the biggest bottlenecks on your hardware.
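
To give an idea of the kind of information such tools work with: you can measure how long the GPU spends on any range of API calls with plain D3D11 timestamp queries, which is essentially what a frame analyzer does under the hood. Below is a minimal sketch; the helper function and its name are hypothetical, and it assumes you already have a valid device, an immediate context, and some draw/dispatch calls to bracket:

```cpp
#include <d3d11.h>
#include <cstdio>
#include <functional>
#pragma comment(lib, "d3d11.lib")

// Measures how long the GPU spends on whatever API calls 'workload' issues.
// Illustrative helper only; assumes a valid device and immediate context.
void MeasureGpuRange(ID3D11Device* device, ID3D11DeviceContext* ctx,
                     const std::function<void()>& workload)
{
    D3D11_QUERY_DESC desc = {};
    ID3D11Query *disjoint = nullptr, *tsBegin = nullptr, *tsEnd = nullptr;

    desc.Query = D3D11_QUERY_TIMESTAMP_DISJOINT;
    device->CreateQuery(&desc, &disjoint);
    desc.Query = D3D11_QUERY_TIMESTAMP;
    device->CreateQuery(&desc, &tsBegin);
    device->CreateQuery(&desc, &tsEnd);

    ctx->Begin(disjoint);
    ctx->End(tsBegin);        // timestamp before the calls we want to measure
    workload();               // the draw/dispatch calls under investigation
    ctx->End(tsEnd);          // timestamp after
    ctx->End(disjoint);

    // Spin until the results are available (fine for a debugging tool).
    D3D11_QUERY_DATA_TIMESTAMP_DISJOINT dj = {};
    while (ctx->GetData(disjoint, &dj, sizeof(dj), 0) == S_FALSE) {}
    UINT64 t0 = 0, t1 = 0;
    while (ctx->GetData(tsBegin, &t0, sizeof(t0), 0) == S_FALSE) {}
    while (ctx->GetData(tsEnd, &t1, sizeof(t1), 0) == S_FALSE) {}

    if (!dj.Disjoint)  // results are only valid if the GPU clock did not glitch
        printf("GPU time: %.3f ms\n",
               double(t1 - t0) * 1000.0 / double(dj.Frequency));

    tsEnd->Release(); tsBegin->Release(); disjoint->Release();
}
```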

The only time GPU vendors actually want/need access to the source code is when they want to make fundamental changes to how a game works, either to improve performance, or to fix some bug. But even then, they don’t literally need access to the source code, they need the developer to change the code for them and release a patch to their users. Sometimes this requires taking the developer by the hand through the source code, and making sure they change what needs to be changed.

So the next time you hear someone claiming that GameWorks is unfair because AMD can’t optimize for it, please tell them they’re wrong, and explain why.

 


The damage that AMD marketing does

Some of you may have seen the actions of a user who goes by the name of Redneckerz on a recent blogpost of mine. That guy posts one wall of text after the next, full of anti-nVidia rhetoric, shameless AMD promotion, and an endless slew of personal attacks and fallacies.

He even tries to school me on what I may or may not post on my own blog, and how I should conduct myself. Which effectively comes down to me having to post *his* opinions. I mean, really? This is a *personal* blog. Which means that it is about the topics that *I* want to discuss, and I will give *my* opinion on them. You don’t have to agree with that, and that is fine. You don’t have to visit my blog if you don’t like to read what I have to say on a given topic. In fact, I even allow people to comment on my blogs, and they are free to express their disagreements.

But there are limits. You can express your disagreement once, twice, perhaps even three times. But at some point, when I’ve already given several warnings that we are not going to ‘discuss’ this further and that we keep things on-topic, you just have to stop. If not, I will make you stop by removing (parts of) your comments that are off-limits. After all, nobody is waiting for people to endlessly spew the same insults and keep making the same demands. It’s just a lot of noise that prevents other people from having a pleasant discussion (and before you call me a hypocrite: I may delete the umpteenth repeat of a given post, but I left the earlier ones alone, so it’s not like I don’t allow you to express your views at all).

In fact, I think even without the insults, the endless walls of text that Redneckerz produces are annoying enough. He keeps repeating himself everywhere. And that is not just my opinion. Literally all other commenters on that item have expressed their disapproval of Redneckerz’ posting style (which is more than a little ironic, given the fact that at least part of Redneckerz’ agenda is to try and paint my posting style as annoying and unwanted).

Speaking of the feedback from other users: they also called him out on having an agenda, namely promoting AMD. Which seems highly likely, given the sheer amount of posts he fires off, and the fact that their content is solely about promoting AMD and discrediting nVidia.

The main question that arose was whether he is just a brainwashed victim of AMD’s marketing, or whether AMD is actually compensating him for the work he puts in. Now, as you can tell from the start of the ‘conversation’, this was not my first brush with Redneckerz. I had encountered him on another forum some time ago, and things went mostly the same way. He attacked me in various topics where I contributed, in very much the same way as here: an endless stream of replies with walls of text and poorly conceived ideas. At some point he would even respond to other people, mentioning my name and speculating about what my reply would have been. However, I have not had contact with him since, and Redneckerz just came to my blog out of the blue and started posting like a maniac here. One can only speculate what triggered him to do that at this moment (is it a coincidence that both nVidia and AMD are in the process of launching their new 16 nm GPU lineups?)

Now, if Redneckerz was just a random forum user, we could leave it at that. But in fact, he is an editor for a Dutch gaming website, Gamed.nl: http://www.gamed.nl/editors/215202

That makes him a member of the press, so the plot thickens… I contacted that website to inform them that one of their editors had gone rampant on my blog and other forums, and that they might want to take action, because it’s not exactly good publicity for their site either. I got some nonsensical response about how they were not responsible for what their editors post on other sites. So I replied that this isn’t about who is responsible, but that what they could do is talk some sense into him, for the benefit of us all.

Again, they were hiding behind the ‘no responsibility’ guise. So basically they support his conduct. Perhaps they are in on the same pro-AMD thing that he is, whatever that is exactly.

I’ve already talked about that before, in general terms, in my blog related to the release of DirectX 12: about how the general public is being played by AMD, developers and journalists. Things like Mantle, async compute, HBM, how AMD allegedly has an advantage in games because they supply console APUs, and whatnot. This nonsense has become so omnipresent that people think it is actually reality. Even though benchmarks and sales figures prove the opposite (e.g. nVidia’s GTX960 and GTX970 are the most popular cards among Steam users by a wide margin: http://store.steampowered.com/hwsurvey/videocard/).

Just like we have to listen to people claiming that Polaris is going to save AMD. Really? The writing is already on the wall: AMD’s promotional material showed us slides with two all-important bits of information:

[AMD promotional slides: Radeon RX 480 (Polaris)]

First, we see them compare against the GeForce GTX970/980. Secondly, we see them stating a TDP of 150W. So the performance target will probably be between the GTX970 and GTX980 (and the TFLOPS rating also indicates that ballpark), and the power envelope will be around 150W. They didn’t just put these numbers on there at random. The low-ball price tag is also a tell-tale sign. AMD is not a charitable organization. They’re in this business to make money. They don’t sell their cards at $199 to make us happy. They sell them at $199 because they’ve done the maths, and $199 will be their sweet spot for regaining marketshare while still making enough profit, desperately trying to keep people from buying more of those GTX960/970/980 cards until AMD gets their new cards on the market. If they had a killer architecture, they’d charge a premium, because they could get away with it. nVidia should have little trouble matching that price/performance target with their upcoming 1050/1060.

Which matches exactly with how I described the situation AMD is in: they are one ‘refresh’ behind nVidia, architecture-wise, since they ‘skipped’ Maxwell, where nVidia concentrated on maximizing performance per watt while still stuck at 28 nm. I said that it would be too risky for AMD to do the shrink to 14 nm and, at the same time, also do a major architectural overhaul. So it would be unlikely for AMD to completely close the gap that nVidia had opened with Maxwell. And that appears to be what we see with Polaris. When I said it, I was accused of being overly negative towards AMD. In fact, Kyle Bennett of HardOCP said basically the same thing, and he was also met by a lot of pro-AMD people who attacked him. After AMD released their information on Polaris, however, things went a bit quiet on that side. We’ll have to wait for the actual release and reviews at the end of this month, but the first signs don’t point to AMD having an answer to match Pascal.

The sad part is that it always has to go this way. You can’t say anything about AMD without tons of people attacking you, even if it’s the truth. Remember John Fruehe? Really guys, I’m trying to do everyone a favour by giving reliable technical info instead of marketing BS. I can do that because I actually have a professional background in the field, and have a good hands-on understanding of CPU internals, GPU internals, rendering algorithms and APIs. Not because I’m being paid to peddle someone’s products, no matter how good or bad they are.

In fact, a lot of the comments I make aren’t so much about AMD’s products themselves, but rather about their inflated and skewed representation in the media.


Commander Keen 4: now in 16-colour composite CGA!

Perhaps you have already heard it through the grapevine, but just to be sure, I’d like to mention that VileR (of 8088 MPH fame) has patched the Keen 4 code and redone all the graphics to make the most of the 16-colour composite CGA mode.

It started with a tweet from John Carmack, after he had seen the video from The 8-Bit Guy covering CGA graphics (and featuring 8088 MPH):

I never knew about composite CGA optimizations!

So the original Keens had graphics designed for 4-colour RGBI mode only. Well, challenge accepted!

VileR has documented everything quite nicely on his blog. And you can find the download links and some discussion in this Vogons thread. So enjoy reading all about it!

I will leave you with some captures from my own PCs.

First my IBM 5160 with a new-style CGA card:

Then my Commodore PC10-III (8088 at 9.54 MHz) with its onboard Paradise PVC4:

And finally, the PC10-III with my ATi Small Wonder card:

The graphics were originally done for old-style CGA. As you can see, the new-style CGA is more saturated, but still acceptable.

The Paradise PVC4 is very saturated as well, and the colours are slightly off, but still somewhat recognizable.

The ATi Small Wonder is completely off however.


Thought: Neutral and objective versus politically correct.

I suppose most people don’t understand the difference between being neutral/objective and being politically correct. A neutral and objective observer can still be critical about what he observes. A politically correct observer cannot voice any criticism.

I suppose you could say that being neutral/objective is being at the origin, while being politically correct is being exactly halfway between the two extremes. When one extreme is larger than the other, the two are not the same. For example, if the extremes sit at -1 and +3, the origin is at 0, but the halfway point is at +1.

My aim is to be as neutral and objective as possible. I have no desire whatsoever to be politically correct though. I voice whatever criticism I see fit.


nVidia’s GeForce GTX 1080, and the enigma that is DirectX 12

As you are probably aware by now, nVidia has released its new Pascal architecture, in the form of the GTX 1080, the ‘mainstream’ version of the architecture, codenamed GP104. nVidia had already presented the Tesla variant of the high-end chip earlier, codenamed GP100 (which has HBM2 memory). When they did that, they also published a whitepaper on the architecture.

It’s quite obvious that this is a big leap in performance. Then again, that was to be expected, given that GPUs are finally moving from 28 nm to 14/16 nm process technology. Aside from that, we have the new HBM2 and GDDR5X technologies to increase memory bandwidth. But you can find all about that on the usual benchmark sites.

I would like to talk about the features instead. And although Pascal doesn’t improve dramatically over Maxwell v2 in the feature-department, there are a few things worth mentioning.

A cool trick that Pascal can do is ‘Simultaneous Multi-Projection’, which basically boils down to being able to render the same geometry with multiple different projections, i.e. from multiple viewports, in a single pass. Sadly I have not found any information yet on how you would actually implement this in terms of shaders and API state, but I suspect it will be similar to the old geometry-shader functionality, where you could feed the same geometry to your shaders multiple times with different view/projection matrices, which allowed you to render a scene to a cubemap in a single pass, for example. Since the implementation of the geometry shader was not very efficient, this never caught on. This time however, nVidia is showcasing the performance gains for VR usage and such, so apparently the new approach is all about efficiency.
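
For reference, this is roughly what that old geometry-shader trick looked like: the GS re-emits every triangle six times, once per cube face, and routes each copy to the corresponding render-target slice via SV_RenderTargetArrayIndex. The sketch below is only an illustration (the constant-buffer layout and names are made up, and the C++ side does nothing more than verify that the shader compiles); it is not how nVidia implements Simultaneous Multi-Projection.

```cpp
#include <windows.h>
#include <d3dcompiler.h>
#include <cstdio>
#include <cstring>
#pragma comment(lib, "d3dcompiler.lib")

// The classic single-pass render-to-cubemap geometry shader (HLSL source).
static const char* kGS = R"(
cbuffer FaceMatrices : register(b0)
{
    float4x4 viewProj[6];            // one view-projection matrix per cube face
};
struct VSOut { float4 worldPos : POSITION; };
struct GSOut { float4 pos : SV_Position; uint face : SV_RenderTargetArrayIndex; };

[maxvertexcount(18)]                 // 6 faces * 3 vertices
void main(triangle VSOut tri[3], inout TriangleStream<GSOut> stream)
{
    for (uint f = 0; f < 6; ++f)     // re-emit the triangle once per face
    {
        for (uint v = 0; v < 3; ++v)
        {
            GSOut o;
            o.pos  = mul(tri[v].worldPos, viewProj[f]);
            o.face = f;              // routes this copy to render-target slice f
            stream.Append(o);
        }
        stream.RestartStrip();
    }
}
)";

int main()
{
    ID3DBlob *code = nullptr, *errors = nullptr;
    HRESULT hr = D3DCompile(kGS, strlen(kGS), "cubemap_gs", nullptr, nullptr,
                            "main", "gs_5_0", 0, 0, &code, &errors);
    if (SUCCEEDED(hr))
        printf("GS compiled: %zu bytes of bytecode\n", code->GetBufferSize());
    else
        printf("GS failed: %s\n",
               errors ? (const char*)errors->GetBufferPointer() : "unknown error");
    return 0;
}
```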

Secondly, there is conservative rasterization. Maxwell v2 was the first architecture to give us this new rendering feature, although it only supported tier 1. Pascal bumps this up to tier 2. And there we have the first ‘enigma’ of DirectX 12: for some reason hardly anyone is talking about this cool new rendering feature. It can bump up the level of visual realism another notch, because it allows you to do volumetric rendering on the GPU in a more efficient way (which means more dynamic/physically accurate lighting and fewer pre-baked lightmaps). Yet nobody cares.
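
If you want to see which tier your own GPU exposes, it is a simple feature-support query in D3D12. A minimal sketch, using the default adapter and next to no error handling:

```cpp
#include <d3d12.h>
#include <cstdio>
#pragma comment(lib, "d3d12.lib")

int main()
{
    // Create a device on the default adapter.
    ID3D12Device* device = nullptr;
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0,
                                 IID_PPV_ARGS(&device))))
        return 1;

    // The general options structure reports the conservative rasterization tier.
    D3D12_FEATURE_DATA_D3D12_OPTIONS opts = {};
    device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS, &opts, sizeof(opts));

    switch (opts.ConservativeRasterizationTier)
    {
    case D3D12_CONSERVATIVE_RASTERIZATION_TIER_NOT_SUPPORTED:
        printf("Conservative rasterization: not supported\n"); break;
    case D3D12_CONSERVATIVE_RASTERIZATION_TIER_1:
        printf("Conservative rasterization: tier 1\n"); break;  // e.g. Maxwell v2
    case D3D12_CONSERVATIVE_RASTERIZATION_TIER_2:
        printf("Conservative rasterization: tier 2\n"); break;  // e.g. Pascal
    default:
        printf("Conservative rasterization: tier 3 or higher\n"); break;
    }

    device->Release();
    return 0;
}
```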

Lastly, we have to mention Asynchronous Compute Shaders obviously. There’s no getting around that one, I’m afraid. This is the second ‘enigma’ of DirectX 12: for some reason everyone is talking about this one. I personally do not care about this feature much (and neither do various other developers. Note how they also point out that it can even make performance worse if it is not tuned properly, yes also on AMD hardware. Starting to see what I meant earlier?). It may or may not make your mix of rendering/compute tasks run faster/more efficiently, but that’s about it. It does not dramatically improve performance, nor does it allow you to render things in a new way/use more advanced algorithms, like some other new features of DirectX 12. So I’m puzzled why the internet pretty much equates this particular feature with ‘DX12’, and ignores everything else.

If you want to know what it is (and what it isn’t), I will direct you to Microsoft’s official documentation of the feature on MSDN. I suppose in a nutshell you can think of it as multi-threading for shaders. Now, shaders tend to be presented as ‘threaded’ anyway, but GPUs had their own flavour of ‘threads’, which was more related to SIMD/MIMD, where they viewed a piece of SIMD/MIMD code as a set of ‘scalar threads’ (where all threads in a block share the same program counter, so they all run the same instruction at the same time). The way asynchronous shaders work in DX12 is more like how threads are handled on a CPU, where each thread has its own context, and the system can switch contexts at any given time, and determine the order in which contexts/threads are switched in a number of ways.

Then it is also no surprise that Microsoft’s examples here include synchronization primitives that we also know from the CPU side, such as barriers/fences. After all, the nature of asynchronous execution implies that you do not know exactly when which piece of code is running, or at what time a given point in the code will be reached.

The underlying idea is basically the same as that of threading on the CPU: Instead of the GPU spending all its time on rendering, and then spending all its time on compute, you can now start a ‘background thread’ of compute-work while the GPU is rendering in the foreground. Or variations on that theme, such as temporarily halting one thread, so that another thread can use more resources to finish its job sooner (a ‘priority boost’).
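
In D3D12 terms, that ‘background thread’ of compute work comes down to creating a second command queue of the compute type next to the usual direct (graphics) queue, and ordering the two with a fence wherever one depends on the other. The sketch below shows only that plumbing, as one assumption of how you might set it up; the actual command lists with draw/dispatch work are omitted, and how much really overlaps is entirely up to the driver and hardware.

```cpp
#include <d3d12.h>
#include <cstdio>
#pragma comment(lib, "d3d12.lib")

int main()
{
    ID3D12Device* device = nullptr;
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0,
                                 IID_PPV_ARGS(&device))))
        return 1;

    // The usual graphics queue...
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    ID3D12CommandQueue* gfxQueue = nullptr;
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue));

    // ...plus a separate compute queue for the 'background' work.
    D3D12_COMMAND_QUEUE_DESC compDesc = {};
    compDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    ID3D12CommandQueue* compQueue = nullptr;
    device->CreateCommandQueue(&compDesc, IID_PPV_ARGS(&compQueue));

    // A fence to order the two queues where needed.
    ID3D12Fence* fence = nullptr;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    // (Command lists with the actual rendering and compute work would be
    //  recorded and submitted to each queue here via ExecuteCommandLists.)

    // The compute queue signals the fence when its batch is done...
    compQueue->Signal(fence, 1);
    // ...and the graphics queue waits (on the GPU, not the CPU) until that
    // value is reached, before it consumes the compute results.
    gfxQueue->Wait(fence, 1);

    printf("Direct and compute queues created, ordered by a fence\n");

    fence->Release(); compQueue->Release(); gfxQueue->Release(); device->Release();
    return 0;
}
```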

Now, here is where the confusion seems to start. Namely, most people seem to think that there is only one possible scenario and therefore only one way to approach this problem. But, getting back to the analogy with CPUs and threading, it should be obvious that there are various ways to execute multiple threads. We have multi-CPU systems, multi-core CPUs, then there are technologies such as SMT/HyperThreading, and of course there is still the good old timeslicing, that we have used since the dawn of time, in order to execute multiple threads/asynchronous workloads on a system with a single CPU with a single core. I wrote an article on that some years ago, you might want to give it a look.

Different approaches in hardware and software will have different advantages and disadvantages. And in some cases, different approaches may yield similar results in practice. For example, in the CPU world we see AMD competing with many cores with relatively low performance per core, while Intel uses fewer cores with more performance per core. In various scenarios, Intel’s quadcores compete with AMD’s octacores. So there is more than one road that leads to Rome.

Getting back to the Pascal whitepaper, nVidia writes the following:

Compute Preemption is another important new hardware and software feature added to GP100 that allows compute tasks to be preempted at instruction-level granularity, rather than thread block granularity as in prior Maxwell and Kepler GPU architectures. Compute Preemption prevents long-running applications from either monopolizing the system (preventing other applications from running) or timing out. Programmers no longer need to modify their long-running applications to play nicely with other GPU applications. With Compute Preemption in GP100, applications can run as long as needed to process large datasets or wait for various conditions to occur, while scheduled alongside other tasks. For example, both interactive graphics tasks and interactive debuggers can run in concert with long-running compute tasks.

So that is the way nVidia approaches multiple workloads. They have very high granularity in when they are able to switch between workloads. This approach bears similarities to time-slicing, and perhaps also SMT, as in being able to switch between contexts down to the instruction-level. This should lend itself very well for low-latency type scenarios, with a mostly serial nature. Scheduling can be done just-in-time.

Edit: Recent developments cause me to clarify the above statement. I did not mean to imply that nVidia has an entirely serial nature, and only a single task is run at a time. I thought that it was common knowledge that nVidia has been able to run multiple concurrent compute tasks on their hardware for years now (introduced on Kepler as ‘HyperQ’). However, it seems that many people are now somehow convinced that nVidia’s hardware can only run one task at a time (really? never tried to run two or more windowed 3D applications at the same time? You should try it sometime, you’ll find it works just fine! Add some compute-enabled stuff, and still, it works fine). I am strictly speaking about the scheduling of the tasks here. Because, as you probably know from CPUs, even though you may have multiple cores, you will generally have more processes/threads than you have cores, and some processes/threads will go idle, waiting for some event to occur. So periodically these processes/threads have to be switched, so they all receive processing time, and idle time is minimized. What I am saying here deals with the approach that nVidia and AMD take in handling this.

AMD, on the other hand, seems to approach it more like a ‘multi-core’ system, where you have multiple ‘asynchronous compute engines’ or ACEs (up to 8 currently), each of which processes its own queues of work. This is nice for inherently parallel/concurrent workloads, but is less flexible in terms of scheduling. It’s more of a fire-and-forget approach: once you drop your workload into the queue of a given ACE, it will be executed by that ACE, regardless of what the others are doing. So scheduling seems to be more ahead-of-time (at the high level; the ACEs take care of interleaving the code at the lower level, much like how out-of-order execution works on a conventional CPU).
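
To make that contrast concrete with a CPU analogy (and it is only an analogy; it says nothing about how the actual GPU hardware is built): the sketch below contrasts handing whole batches to independent worker queues up front with a single loop that interleaves small slices of every task and can switch between them at any point.

```cpp
#include <thread>
#include <queue>
#include <vector>
#include <functional>
#include <cstdio>

// Style 1: 'fire and forget' -- each engine owns a queue and drains it
// independently of the others (the split is decided ahead of time by
// whoever enqueues the work).
struct Engine {
    std::queue<std::function<void()>> work;
    void run() { while (!work.empty()) { work.front()(); work.pop(); } }
};

// Style 2: one fine-grained scheduler that interleaves small slices of
// every task and can switch between them at any point (just in time).
void interleaved(std::vector<std::function<bool()>> tasks) {
    bool progress = true;
    while (progress) {
        progress = false;
        for (auto& step : tasks)
            progress |= step();      // each call runs one small slice of a task
    }
}

int main() {
    Engine e0, e1;                   // two 'engines', each with its own queue
    e0.work.push([]{ printf("engine0: graphics batch\n"); });
    e1.work.push([]{ printf("engine1: compute batch\n"); });
    std::thread t0(&Engine::run, &e0), t1(&Engine::run, &e1);
    t0.join(); t1.join();

    int a = 3, b = 2;                // two tasks, sliced into single steps
    interleaved({
        [&]{ if (!a) return false; printf("slice of task A\n"); --a; return true; },
        [&]{ if (!b) return false; printf("slice of task B\n"); --b; return true; }
    });
    return 0;
}
```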

Sadly, neither vendor gives any actual details on how they fill and process their queues, so we can only guess at the exact scheduling algorithms and parameters. And until we have a decent collection of software making use of this feature, it’s very difficult to say which approach will be best suited for the real-world. And even then, the situation may arise, where there are two equally valid workloads in widespread use, where one workload favours one architecture, and the other workload favours the other, so there is not a single answer to what the best architecture will be in practice.

Oh, and one final note on the “Founders Edition” cards. People seem to just call them ‘reference’ cards, and complain that they are expensive. However, these “Founders Edition” cards have an advanced cooler with a vapor chamber system. So it is quite a high-end cooling solution (previously, nVidia only used vapor chambers on the high-end, such as the Titan and 980Ti, not the regular 980 and 970). In most cases, a ‘reference’ card is just a basic card, with a simple cooler that is ‘good enough’, but not very expensive. Third-party designs are generally more expensive, and allow for better cooling/overclocking. The reference card is generally the cheapest option on the market.

In this case however, nVidia has opened up the possibility for third-party designs to come up with cheaper coolers, and deliver cheaper cards with the same performance, but possibly less overclocking potential. At the same time, it will be more difficult for third-party designs to deliver better cooling than the reference cooler, at a similar price. Aside from that, nVidia also claims that the whole card design is a ‘premium’ design, using high-quality components and plenty of headroom for overclocking.

So the “Founders Edition” is a ‘reference card’, but not as we know it. It’s not a case of “this is the simplest/cheapest way to make a reliable videocard, and OEMs can take it from there and improve on it”. Also, some people seem to think that nVidia sells these cards directly, under their own brand, but as far as I know, it’s the OEMs that build and sell these cards under the “Founders Edition” label. For example, the MSI one, or the Inno3D one. These can be ordered directly from the nVidia site.
