Running nVidia’s Endless City tessellation demo on Radeons

Updated 28-11-2010: Patch now includes support for 32-bit and 64-bit binaries.

When I read about nVidia’s Endless City demo, it appeared that it would only run on nVidia hardware. I found that a bit strange, as it uses only DirectX 11 tessellation, and as such shouldn’t require nVidia hardware specifically.

nVidia’s Endless City Tessellation Demo

When you try to run it on other hardware, you get this messagebox:

NVIDIA Endless City: GTX 580 DX11 Tech-Demo

Cuda-enabled GPU? That piqued my interest, so I decided to poke around the code a bit, and see what it would be using Cuda for. After a quick look through the code, it seemed that while it did initialize Cuda, it didn’t appear to actually do anything with it. I decided to disable the initialization code, and see if it still worked, to make sure I didn’t overlook anything. And indeed, everything seemed to run just fine. My guess is that nVidia just used a stock graphics demo framework, and Cuda is always enabled by default. They just didn’t bother to take it out.

Anyway, if it doesn’t require Cuda (and I didn’t spot any other nVidia-specific things either), then it should be able to run on any DirectX 11 hardware. However, as I checked the code to see what could trigger the above messagebox, I also spotted a specific “nvidia” vendor string check. I decided to disable this as well.

Now it should work on any DirectX 11 hardware. I have prepared the following zip file for everyone to run the Endless City on their hardware: PatchEndlessCity.zip

I’ll explain what it is and what it does exactly. I have created a dummy NvCuda.dll, because that was the easiest way to make the demo run on systems without the actual DLL. It simply exists to stop Windows from complaining about a missing file, and doesn’t actually contain any code. You can put it anywhere in your path, but I would suggest placing it in the folder where the demo binaries are installed (probably something like “C:\Program Files\NVIDIA Corporation\NVIDIA Demos\Endless City\bin”). Pick the NvCuda.dll from the x86 folder in the zip file if you want to run the 32-bit version, and pick NvCuda.dll from the x64 folder if you want to run the 64-bit version. Since they have the same file name, they obviously cannot both be in the same folder at the same time. Normally the 32-bit version is installed in \Windows\SysWow64 and the 64-bit version in \Windows\System32, in which case Windows will automatically handle the search order correctly.
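The copy step described above could be scripted roughly as follows. This is only an illustrative Python sketch: the function name install_dummy_dll and the idea of an extracted zip folder are my own assumptions, and only the x86/x64 folder names and the NvCuda.dll file name come from the zip itself.

```python
import shutil
from pathlib import Path

# Folder names inside the zip, as described in the post.
ARCH_FOLDER = {32: "x86", 64: "x64"}

def install_dummy_dll(zip_root: Path, demo_bin: Path, bits: int) -> Path:
    """Copy the dummy NvCuda.dll for the chosen bitness into the demo folder.

    zip_root is wherever the zip was extracted (hypothetical location);
    demo_bin is the folder that holds the demo binaries.
    """
    if bits not in ARCH_FOLDER:
        raise ValueError("bits must be 32 or 64")
    source = zip_root / ARCH_FOLDER[bits] / "NvCuda.dll"
    if not source.is_file():
        raise FileNotFoundError(source)
    # Only one NvCuda.dll can live in the folder, so this overwrites any
    # copy placed there earlier for the other bitness.
    return Path(shutil.copy2(source, demo_bin / "NvCuda.dll"))
```

Calling it with bits=64 sets the folder up for EndlessCity64.exe, and calling it again with bits=32 swaps in the 32-bit DLL. Writing into Program Files will probably need an elevated prompt.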

The file PatchEndlessCity.exe should also be copied to the folder in which the demo binaries are installed. When you execute it (you might need to run it as administrator, because it needs to write to the EndlessCity.exe and EndlessCity64.exe files in Program Files), it will remove the above-mentioned vendor-check code. After you have run this patch, EndlessCity.exe and EndlessCity64.exe should work on any DirectX 11 hardware. If you have problems running the patch, make sure you have the Visual C++ 2010 redistributable for x86 installed.

For those who want to know the exact details of the patch:

Offsets 0x000C3A1E through 0x000C3A29 in EndlessCity.exe are overwritten with nop instructions (0x90) to disable the respective code. Likewise, in EndlessCity64.exe, offsets 0x000EE705 through 0x000EE71F are overwritten with nop (0x90). So if you don’t trust the patch, you could perform the patch manually with a hex editor.
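If you would rather script the manual patch than use a hex editor, a minimal Python sketch could look like the one below. It is not the code of PatchEndlessCity.exe. It assumes that “through” means both end offsets are included (12 bytes in the 32-bit binary, 27 bytes in the 64-bit one), that the offsets are plain file offsets, and that it is run in the demo’s bin folder against the original, unpatched binaries. The helper names and the .bak backup are my own additions.

```python
import shutil
from pathlib import Path

NOP = 0x90  # x86 single-byte no-op instruction

# Offsets as given in the post; treating "through" as inclusive is an assumption.
PATCHES = {
    "EndlessCity.exe": (0x000C3A1E, 0x000C3A29),
    "EndlessCity64.exe": (0x000EE705, 0x000EE71F),
}

def nop_range(data: bytearray, first: int, last: int) -> None:
    """Overwrite data[first..last] (inclusive) with NOP bytes, in place."""
    if not 0 <= first <= last < len(data):
        raise ValueError("patch range lies outside the file")
    data[first:last + 1] = bytes([NOP]) * (last - first + 1)

def patch_file(path: Path, first: int, last: int) -> None:
    """Keep a .bak copy of the file, then NOP out the given range."""
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    data = bytearray(path.read_bytes())
    nop_range(data, first, last)
    path.write_bytes(data)

if __name__ == "__main__":
    for name, (first, last) in PATCHES.items():
        target = Path(name)
        if target.exists():
            patch_file(target, first, last)
            print(f"patched {name}")
        else:
            print(f"skipped {name} (not found)")
```

The sketch does not check which bytes it is replacing, so it should only be pointed at the exact binaries these offsets were taken from.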

I’d love to hear what framerates people will get on their Radeons. And it will be especially interesting to test this demo on the upcoming Radeon 6900 series, to see how much AMD managed to improve their tessellation.


35 Responses to Running nVidia’s Endless City tessellation demo on Radeons

  1. Mark Davis says:

    So I tried the demo and was getting 12 fps with tessellation, 20 fps with no tessellation, @930/1150

    @default clocks: 10FPS w/ tessellation, 20 w/out

    Specs:
    q9550 @ 3.35
    5gb DDR2 @ 740mhz (4-4-4-13)
    HD 5850

  2. 3p says:

    I’m getting 10 fps @1920×1080 resolution with HD 5850.

  3. VadimDee says:

    Thanks dude

  4. Plablublalbu says:

    13-15 fps with tessellation on, 15-19 fps with tessellation off (GTS 450)

  5. fp says:

    Fraps showed 12/20 fps w/wo tessellation on Radeon HD 6870 def. clock @1920×1200

  6. John Mautari says:

    It works like a charm!

    Thank you for this!

    Running on HD 5970 @ 1920×1200

  7. Pingback: Running Nvidia's Endless City tessellation demo on 5xxx/6xxx Radeon hardware

  8. ShadowLink says:

    Runs on my GTX580 at 60fps with tessellation 🙂

  9. Pingback: AMD Radeon 6900 series: Much ado about nothing… | Scali's blog

  10. Troll says:

    People really need to learn things. First off, the Endless City is an nVidia ONLY benchmark, i.e. it’s optimized for nVidia GPUs. And because people like to run it on AMD GPUs you will get poor performance, because it’s not even optimized; hell, I’d go as far as to say it’s unoptimized for AMD GPUs.

    And the final fact: AMD GPUs are actually good GPUs. The fact that all the benchmarks are mainly optimized for nVidia to make AMD look bad is kinda funny. And on another note: look what AMD got first, tessellation control in their drivers.

    • Scali says:

      It’s too easy to dismiss benchmarks just because your favourite brand doesn’t perform the way you like to see.
      People scream ‘optimization’ without even having a clue about what it means. If you disable tessellation, this benchmark gets perfectly normal framerates on AMD hardware. That is, the relative performance of different GeForce and Radeon cards is pretty much the same as what you see in most games and benchmarks. So although the benchmark was never meant to run on AMD hardware, there’s nothing strange about the general rendering engine used in this demo. It’s just standard DX11 code.

      As for the tessellation… Do you even know what tessellation looks like in DX11? It’s just a set of shaders. That’s all there is to this benchmark: a set of standard DX11 HLSL tessellation shaders. There’s not too much room for optimization there. Certainly not enough for AMD to magically clear the huge performance gap with nVidia here. And it is pretty obvious that AMD’s newer hardware with improved tessellation scales quite well (and CrossFire setups work nicely as well, since they also effectively have a dual tessellator). The real problem is not in how optimal the code is, but rather in the fact that AMD’s hardware cannot handle the level of tessellation used in this tessellation benchmark (and various others for that matter). The only ‘optimization’ to fix that, is to use less tessellation (the tessellation equivalent to using lower resolution textures, so to speak), which isn’t optimization of code, but just reducing the workload. Clearly this will make nVidia’s hardware run faster as well… The only thing that can make AMD perform equal is for their cards to run with less tessellation detail than nVidia cards. And that’s the kind of ‘optimizations’ that AMD is planning, aka driver cheats to trade image quality for performance. The developer has full control of tessellation detail in DX11 and OpenGL 4.0. Why on earth would a driver have to have a control for this? And people actually think this is a GOOD thing? “Yay, we can now hack our games so that they render less detail than they’re supposed to!”

      So it’s just a hardware deficiency. Standard DX11 or OpenGL 4.0 tessellation code is simply handled much more efficiently by nVidia’s current tessellation architecture than AMD’s. That has been established as solid fact by now. nVidia’s cards even grossly outperform AMD’s cards in the DX11 tessellation demos that AMD made at the introduction of the 5000-series, long before nVidia even had a DX11 card on the market. Surely you’re not going to claim that even THOSE demos are optimized for nVidia hardware?

  11. Tester says:

    Runs ~24 fps in 1920×1080, 6950@6970, fullhd, Catalyst 11.1a. Not bad at all.

  12. madmodmike says:

    Is it possible to do something similar (ATI patches) with older demos, like MadModMike, Nalu, Luna and so on?

    • Scali says:

      Most older nVidia demos use OpenGL with special nVidia driver extensions. You can’t just patch those, you’d need to write an emulation layer that translates the nVidia extensions to ATi-compatible code. Although technically it’s not impossible, it’s a lot more work than I’m willing to put in.
      If a demo only uses Direct3D code (and nothing extra, such as Cuda), such as this one, then it can be patched very easily.
      But as far as I know, all the demos you mentioned are OpenGL.

  13. Lars Erik Realfsen says:

    What about the HumanHead demo?

    • Scali says:

      The human head demo is included as a sample in the DirectX SDK.

      Edit: Or at least, I thought it was.
      There’s the SparseMorphTargets one, which is similar. Perhaps I’m mistaken, or it used to be in the SDK, but not anymore (might have morphed into the SparseMorphTargets sample?)

  14. komar says:

    thank you so much , i really enjoy seeing this demo runin’ on amd hardware .

    30 fps maxed out (tess on ) with 5970 (900/1200) .

    it’s low fps but smooth runnin’ demo with cat 11.6 .

  15. komar says:

    any way u can turn off amd tess optimisation using the catalyst tab .

    but clearly amd hardware is weaker on such demanding hardware demo … it is a fact . POINT.

  16. Jack Johnson says:

    It’s because nVidia has more tessellator units than AMD. You can have 264x tessellation while AMD has only 64x. I have a 6970 PCS+ and all games work fine, but those are exceptions.

  17. Stewe says:

    Nvidia fog demo runs on 6600 but could i run it on 6770?

  18. Pingback: AMD Radeon 7970: Graphics Core Next debuts | Scali's blog

  19. QUINTIX says:

    I’m not sure why “AMD optimized” tessellation is so offensive to you… mipmapping for tessellation is not a bad idea. There is little point in plotting as many triangles as pixels, if not more, in a given area.

    If the triangles had a minimum surface area of 8 pixels each on both platforms, I am sure AMD would better match nVidia across all tessellation factors. After all, AMD/ATI has been doing fixed-function tessellation a lot longer than nVidia (going back to the R200), and nVidia does have a history of creating brute-force solutions/engaging in benchmarketing (and driver cheats) (more so than ATI, with which I am sure you will disagree), which is no doubt why they were able to do tessellation “so well” out of the blue.

    Anyways, the demo runs like a slideshow on my hybrid crossfire system:
    radeon 6670 with 200mhz DDR3 (1.6ghz bus) (not GDDR3),
    A8-3850 apu (radeon 6550) with 233mhz DDR3 (1.86 ghz bus)
    I do not have fraps installed on my desktop yet, so I cannot give exact numbers.

    What a disappointment; my thought was that fixed-function tessellation was supposed to be a means of getting more out of finite resources (specifically memory bandwidth), like mip-mapping or texture and z-buffer compression. Even the Radeon 4250 (R600 based) in my laptop has a fixed-function tessellator. It seems that game devs and (platform specific) benchmarketing PR folk are treating tessellation as a high end feature. If they did not and took “full control of tessellation” there would be no need to “optimize” at the driver level.

    • Scali says:

      “Offensive” is not the word. But “driver cheat” is.
      Not sure what you are talking about with mipmapping. This is just a hard clamp on the maximum tessellation level (even though most, if not all, tessellation benchmarks and games use an adaptive tessellation algorithm anyway).

      And in most cases the triangles indeed *are* 8 pixels or larger. Don’t let AMD tell you anything different. Run the software in wireframe mode and you can see the individual triangles.

      nVidia’s tessellation is better because they made it a parallel solution, as it should have been. Pretty obvious, don’t you think? You take 1 triangle/patch, and subdivide it into N triangles. How exactly are you going to handle these N triangles if you try to squeeze them through a serial pipeline, handling only 1 triangle at a time? Exactly, you don’t. Which is AMD’s problem. The rest is just smoke and mirrors from AMD, trying to hide this fact.

      As a result, you can’t reap the full benefits of tessellation on AMD hardware. Endless City is showing these benefits. As formulated by Microsoft: http://msdn.microsoft.com/en-us/library/windows/desktop/ff476340(v=VS.85).aspx#tessellation_benefits
      Tessellation:
      •Saves lots of memory and bandwidth, which allows an application to render higher detailed surfaces from low-resolution models. The tessellation technique implemented in the Direct3D 11 pipeline also supports displacement mapping, which can produce stunning amounts of surface detail.
      •Supports scalable-rendering techniques, such as continuous or view dependent levels-of-detail which can be calculated on the fly.
      •Improves performance by performing expensive computations at lower frequency (doing calculations on a lower-detail model). This could include blending calculations using blend shapes or morph targets for realistic animation or physics calculations for collision detection or soft body dynamics.

      • QUINTIX says:

        Those wireframes are awfully thick in the demo, maybe it’s because (moves closer a bit)
        OHH!!! They *are not* more than 8 pixels a pop. You are bright enough. This should have been obvious to you.

      • Scali says:

        Since you can actually SEE the wires in most places, the triangles are not that small:

        What people like caveman-jim still don’t get about tessellation…


        In most places the triangles are *considerably* larger than 8 pixels:


        And as said, at the edges, you may get triangles smaller than 8 pixels anyway, even if you are not using tessellation at all. That’s just what happens when triangles get nearly perpendicular to the screen.

        Besides, this is all just a red herring.
        After all, ONE of the DX11 architectures can render these triangles at decent speeds.
        Ironically it is also the DX11 architecture that is slightly less efficient at rendering small triangles:

        Proof of what I said in my previous blog, from an unexpected place

        So yes, I’m bright enough. Are you?

  20. Pingback: Oldskool demo fixing | Scali's blog

  21. Pingback: Random thoughts on programming, culture and such | Scali's OpenBlog™

  22. Remov Cain says:

    The link for the patch is offline.

    • Scali says:

      Yes, sorry about that. The server is temporarily unavailable. It should be back at some point in the future, but I cannot give an exact time at the moment.

  23. ru884d says:

    With my 7970 CrossFire setup, I get a steady 60 fps with vsync on, but lots of flickering going on. If I jump to desktop then back in, it stops flickering but only works with 1 card, which is about 35 fps. That’s with both 32-bit and 64-bit versions. It would be great to run CrossFire with no flickering. Any ideas?

    • Scali says:

      I’m afraid not… SLI and CrossFire are notorious for having driver issues… and in this case I think it’s unlikely that AMD will fix the drivers to support this application properly.

  24. LukeTeo says:

    Demo works on Intel HD 4000 🙂
    4 fps with 1366×768 tessellation on
    2 fps with 1920×1080 tessellation on

  25. Stiletto says:

    I’d be interested in a copy of this patch when it’s available again.
    Interestingly, here are some other patches:
    http://extreme.pcgameshardware.de/grafikkarten/235895-amd-patch-fuer-alle-public-dx10-11-nvidia-techdemos-ilan.html

  26. Pingback: AMD tries to do more damage-control for tessellation | Scali's OpenBlog™
