An interesting read was posted by a spokes person of Oxide, basically their DX12 benchmark of the Beta Ashes of the Singularity runs a notch better at specific AMD graphics cards. Is seems Nvidia is not supporting DX12 Asynchronous Compute/Shaders and actually asked the dev to disable it (
content updated).
So the breaking story of the day obviously is the news that Nvidia tried to have certain features for the Oxide dev DX12 benchmark Ashes of Singularity disabled, the story continues though. The developer already posted a reply as how and what. Currently is seems that Nvidias Maxwell architecture (Series 900 cards) does not really support Asynchronous compute in DX12 at a proper hardware level. Meanwhile AMD is obviously jumping onto this a being HUGE and they quickly prepared a PDF slide presentation with their take on the importance of all this. Normally I'd share add the slides into a news item, but this is 41 page of content slides, hence I made it available as separate download.
In short, here's the thing, everybody expected NVIDIA Maxwell architecture to have full DX12 support, as it now turns out, that is not the case. AMD offers support on their Fury and Hawaii/Grenada/Tonga (GCN 1.2) architecture for DX12 asynchronous compute shaders. The rather startling news is that Nvidia's Maxwell architecture, and yeah that would be the entire 900 range does not support it. I can think of numerous scenarios as to where asynchronous shaders would help.
Asynchronous shaders (async compute) Normally with say DirectX 11 multi-threaded graphics are handled in a single queue, which is then scheduled in a set order - that's synchronously. Tasks that found themselves in different queues can now (with DX12 / Vulkan / Mantle) be scheduled independently in a prioritized order, which is asynchronous. That brings in several advantages, the biggest being less latency and thus faster rendered frames and response times while you are utilizing the GPU much better.
While this is thorny stuff to explain in a quick one page post, AMD is trying to do so (explain this) with a set of screenshots corresponding to a road with traffic lights. In the past instructions always where handled by the order that they arrived. So they arrive and get queued up. However with DX12 and the combo of GCN GPU architecture instructions can be handled and prioritized separately starting with DX12. Meaning more important tasks and data-sets can be prioritized.
With Asynchronous Shaders three different queues are now available. The graphics queue (rendering), a compute queue (physics, lighting, post-processing effects) and then there is a copy queue (data transfers). Tasks from each or any of these queues can be scheduled independently. All graphics cards based on GCN architecture can now handle multiple command instructions and data flows simultaneously which is managed by compute-engines called ACEs. Each queue can pass instructions without the need to wait on other tasks. That will keep your GPU 100% active as the work-flow is prioritized and thus always available.
Asynchronous shaders will deliver a performance increase up-to 46% in a demo shown with their LiquidVR-SDK dev kit. How big of a difference this is going to be for your 'normal' gaming experience, remains to be seen of course. But every little bump in perf is welcomed of course.
So what happened ? NVIDIA wanted the Asynchronous Compute Shaders feature level disabled by the dev (Oxide) for their hardware as it ran worse. Even though the driver exposed it as being available.
Async DX12 shader support is a uniform API level feature, part of DX12. It allows software to better distribute task intensive data. NVIDIA driver reports back that "Maxwell" GPUs supports the feature, however Oxide Games made a benchmark to showcase the power of DX 12 and ran into anomalies. When they enable async shader support the result for Maxwell based product was in their own words "unmitigated disaster". With communication back and forth with NVIDIA trying to fix things Xoide learned that "Maxwell" architecture really doesn't really support async shaders at a tier 1 level, the NVIDIA driver might report it, but it apparently isn't working at a low-level hardware. At that stage NVIDIA started pressuring Oxide to remove parts of its code that use the feature altogether, they claim Oxide:
Oxide: Nvidia GPU's do not support DX12 Asynchronous Compute/Shaders.
The interest in this subject is higher then we thought. The primary evolution of the benchmark is for our own internal testing, so it's pretty important that it be representative of the gameplay. To keep things clean, I'm not going to make very many comments on the concept of bias and fairness, as it can completely go down a rat hole.
Certainly I could see how one might see that we are working closer with one hardware vendor then the other, but the numbers don't really bare that out. Since we've started, I think we've had about 3 site visits from NVidia, 3 from AMD, and 2 from Intel ( and 0 from Microsoft, but they never come visit anyone ;(). Nvidia was actually a far more active collaborator over the summer then AMD was, If you judged from email traffic and code-checkins, you'd draw the conclusion we were working closer with Nvidia rather than AMD wink.gif As you've pointed out, there does exist a marketing agreement between Stardock (our publisher) for Ashes with AMD. But this is typical of almost every major PC game I've ever worked on (Civ 5 had a marketing agreement with NVidia, for example). Without getting into the specifics, I believe the primary goal of AMD is to promote D3D12 titles as they have also lined up a few other D3D12 games.
If you use this metric, however, given Nvidia's promotions with Unreal (and integration with Gameworks) you'd have to say that every Unreal game is biased, not to mention virtually every game that's commonly used as a benchmark since most of them have a promotion agreement with someone. Certainly, one might argue that Unreal being an engine with many titles should give it particular weight, and I wouldn't disagree. However, Ashes is not the only game being developed with Nitrous. It is also being used in several additional titles right now, the only announced one being the Star Control reboot. (Which I am super excited about! But that's a completely other topic wink.gif).
Personally, I think one could just as easily make the claim that we were biased toward Nvidia as the only 'vendor' specific code is for Nvidia where we had to shutdown async compute. By vendor specific, I mean a case where we look at the Vendor ID and make changes to our rendering path. Curiously, their driver reported this feature was functional but attempting to use it was an unmitigated disaster in terms of performance and conformance so we shut it down on their hardware. As far as I know, Maxwell doesn't really have Async Compute so I don't know why their driver was trying to expose that. The only other thing that is different between them is that Nvidia does fall into Tier 2 class binding hardware instead of Tier 3 like AMD which requires a little bit more CPU overhead in D3D12, but I don't think it ended up being very significant. This isn't a vendor specific path, as it's responding to capabilities the driver reports.
From our perspective, one of the surprising things about the results is just how good Nvidia's DX11 perf is. But that's a very recent development, with huge CPU perf improvements over the last month. Still, DX12 CPU overhead is still far far better on Nvidia, and we haven't even tuned it as much as DX11. The other surprise is that of the min frame times having the 290X beat out the 980 Ti (as reported on Ars Techinica). Unlike DX11, minimum frame times are mostly an application controlled feature so I was expecting it to be close to identical. This would appear to be GPU side variance, rather then software variance. We'll have to dig into this one.
I suspect that one thing that is helping AMD on GPU performance is D3D12 exposes Async Compute, which D3D11 did not. Ashes uses a modest amount of it, which gave us a noticeable perf improvement. It was mostly opportunistic where we just took a few compute tasks we were already doing and made them asynchronous, Ashes really isn't a poster-child for advanced GCN features.
Our use of Async Compute, however, pales with comparisons to some of the things which the console guys are starting to do. Most of those haven't made their way to the PC yet, but I've heard of developers getting 30% GPU performance by using Async Compute. Too early to tell, of course, but it could end being pretty disruptive in a year or so as these GCN built and optimized engines start coming to the PC. I don't think Unreal titles will show this very much though, so likely we'll have to wait to see. Has anyone profiled Ark yet?
In the end, I think everyone has to give AMD alot of credit for not objecting to our collaborative effort with Nvidia even though the game had a marketing deal with them. They never once complained about it, and it certainly would have been within their right to do so. (Complain, anyway, we would have still done it, wink.gif)
--
P.S. There is no war of words between us and Nvidia. Nvidia made some incorrect statements, and at this point they will not dispute our position if you ask their PR. That is, they are not disputing anything in our blog. I believe the initial confusion was because Nvidia PR was putting pressure on us to disable certain settings in the benchmark, when we refused, I think they took it a little too personally.
AFAIK, Maxwell doesn't support Async Compute, at least not natively. We disabled it at the request of Nvidia, as it was much slower to try to use it then to not. Weather or not Async Compute is better or not is subjective, but it definitely does buy some performance on AMD's hardware. Whether it is the right architectural decision for Maxwell, or is even relevant to it's scheduler is hard to say.
Findings like these raise valid questions. Like - What does the NVIDIA driver really report? NVIDIA drivers report support DirectX 12 feature-level 12_1 in Windows, but how real is that ? It that true hardware support or as Oxide claims, semi support with a trick or two ? Meanwhile this stuff is pure gold for AMD, and they have jumped onto it as their GCN 1.2 architecture does support the feature properly. How important it will be in future games remains to be seen and as such is a trivial question. Regardless it's good news for AMD as it really is a pretty big DX12 API feature if you ask me. The updated slide slide deck from AMD can be download at the link below. Obviously AMD is plastering this DX12 feature level all over the web to their advantage, please do take the perf slides with a grain of salt.
Download AMD slide deck
Bookmarks