DirectX 12 Agility SDK 1.716.0-preview Explained

Sun 02 Feb 2025

On January 30th, 2025, Microsoft released new versions of the DirectX 12 Agility SDK: 1.615.0 (D3D12SDKVersion = 615) and 1.716.0-preview (D3D12SDKVersion = 716). The main article announcing this release is: AgilitySDK 1.716.0-preview and 1.615-retail. Files are available to download from DirectX 12 Agility SDK Downloads, as always, in the form of .nupkg files (which are really ZIP archives).

I can see several interesting additions in the new SDK, so in this article I am going to describe them and delve into details of some of them. This way, I aim to consolidate information that is scattered across multiple Microsoft pages and provide links to all of them. The article is intended for advanced programmers who use DirectX 12 and are interested in the latest developments of the API and its surrounding ecosystem, including features that are currently in preview mode and will be included in future retail versions.

Shader hash bypass

This is the only feature added to both the retail and preview versions of the new SDK. The article announcing it is: Agility SDK 1.716.0-preview & 1.615-retail: Shader hash bypass. A more extensive article explaining this feature is available here: Validator Hashing.

The problem:

If you use DirectX 12, you most likely know that shaders are compiled in two stages. First, the source code in HLSL (High-Level Shading Language) is compiled using the Microsoft DXC compiler into an intermediate binary code. This often happens offline when the application is built. The intermediate form is commonly referred to as DXBC (as the container format and the first 4 bytes of the file) or DXIL (as the intermediate language of the shader code, somewhat similar to SPIR-V or LLVM IR). This intermediate code is then passed to a DirectX 12 function that creates a Pipeline State Object (PSO), such as ID3D12Device::CreateGraphicsPipelineState. During this step, the second stage of compilation occurs within the graphics driver, converting the intermediate code into machine code (ISA) specific to the GPU. I described this process in more detail in my article Shapes and forms of DX12 root signatures, specifically in the "Shader Compilation" section.

What you may not know is that the intermediate compiled shader blob is digitally signed by the DXC compiler using a hash embedded within it. This hash is then validated during PSO creation, and the function fails if the hash doesn’t match. Moreover, despite the DXC compiler being open source and hosted on github.com/microsoft/DirectXShaderCompiler, the signing process is handled by a separate library, "dxil.dll", which is not open source.
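To make the container layout more concrete, here is a minimal C++ sketch that checks for the "DXBC" magic and copies out the digest that follows it. It assumes that the digest is 16 bytes long and starts immediately after the 4-byte magic. The function names are my own and do not come from any Microsoft header; this is an illustration, not the validation logic used by the runtime.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed layout: 4-byte "DXBC" magic followed directly by a 16-byte digest.
constexpr std::size_t kMagicSize = 4;
constexpr std::size_t kDigestSize = 16;

// Returns true if the blob starts with the 4-byte "DXBC" container magic.
inline bool HasDxbcMagic(const std::vector<std::uint8_t>& blob) {
    static const char kMagic[kMagicSize] = {'D', 'X', 'B', 'C'};
    return blob.size() >= kMagicSize &&
           std::memcmp(blob.data(), kMagic, kMagicSize) == 0;
}

// Copies the digest stored after the magic into `outDigest`.
// Returns false if the blob is not a DXBC container or is too short.
inline bool ReadContainerDigest(const std::vector<std::uint8_t>& blob,
                                std::array<std::uint8_t, kDigestSize>& outDigest) {
    if (!HasDxbcMagic(blob) || blob.size() < kMagicSize + kDigestSize) {
        return false;
    }
    // Invariant after the checks above: the digest bytes are in range.
    assert(blob.size() >= kMagicSize + kDigestSize);
    std::memcpy(outDigest.data(), blob.data() + kMagicSize, kDigestSize);
    return true;
}
```

A tool that generates or post-processes shader blobs could use a helper like this to inspect which digest a given file carries before passing it to PSO creation.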

If you only use the DXC compiler provided by Microsoft, you may never encounter any issues with this. I first noticed this problem when I accidentally used "dxc.exe" from the Vulkan SDK instead of the Windows SDK to compile my shaders. This happened because the Vulkan SDK appeared first in my "PATH" environment variable. My shaders compiled successfully, but since the closed-source "dxil.dll" library is not distributed with the Vulkan SDK, they were not signed. As a result, I couldn’t create PSO objects from them. As the ecosystem of graphics APIs continues to grow, this could also become a problem for libraries and tools that aim to generate DXIL code directly, bypassing the HLSL source code and DXC compiler. Some developers have even reverse-engineered the signing algorithm to overcome this obstacle, as described by Stephen Gutekanst / Hexops in this article: Building the DirectX shader compiler better than Microsoft?.

The solution:

With this new SDK release, Microsoft has made two significant changes:

Technologies that generate DXIL shader code can now use either of these methods to produce a valid shader.

The capability to check whether this new feature is supported is exposed through D3D12_FEATURE_DATA_BYTECODE_BYPASS_HASH_SUPPORTED::Supported. However, it appears to be implemented entirely at the level of the Microsoft DirectX runtime rather than the graphics driver, as it returns TRUE on every system I tested.

One caveat is that "dxil.dll" not only signs the shader but also performs some form of validation. Microsoft didn’t want to leave developers without the ability to validate their shaders when using the bypass hash. To address this, they have now integrated the validation code into the D3D Debug Layer, allowing shaders to be validated as they are passed to the PSO creation function.

Tight alignment of resources

This feature is only available in the preview SDK version. The article announcing it is: Agility SDK 1.716.0-preview: Tight Alignment of Resources.

The problem:

This one is particularly interesting to me, as I develop the D3D12 Memory Allocator and Vulkan Memory Allocator libraries, which focus on GPU memory management. In DirectX 12, buffers require alignment to 64 KB, which can be problematic and lead to significant memory waste when creating a large number of very small buffers. I previously discussed this issue in my older article: Secrets of Direct3D 12: Resource Alignment.
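To get a feeling for the scale of the waste, here is a small C++ sketch of the arithmetic. It simplifies things by assuming that every buffer occupies a whole aligned slot. The 64 KB value comes from the paragraph above, while the 256-byte "tight" alignment is purely a hypothetical number chosen for comparison, not something a driver is guaranteed to report.

```cpp
#include <cassert>
#include <cstdint>

// Rounds `value` up to the next multiple of `alignment` (must be a power of two).
constexpr std::uint64_t AlignUp(std::uint64_t value, std::uint64_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Total bytes consumed when `count` buffers of `size` bytes each
// are placed at offsets that are multiples of `alignment`.
inline std::uint64_t TotalFootprint(std::uint64_t count,
                                    std::uint64_t size,
                                    std::uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return count * AlignUp(size, alignment);
}

constexpr std::uint64_t kDefaultBufferAlignment = 64 * 1024;  // 64 KB, as described above
constexpr std::uint64_t kHypotheticalTightAlignment = 256;    // assumption, for illustration only
```

With these definitions, TotalFootprint(1000, 256, kDefaultBufferAlignment) gives 65,536,000 bytes, while TotalFootprint(1000, 256, kHypotheticalTightAlignment) gives 256,000 bytes: the same thousand 256-byte buffers take 256 times more memory under the 64 KB rule.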

The solution:

This is one of many features that the Vulkan API got right, and Microsoft is now aligning DirectX 12 in the same direction. In Vulkan, developers need to query the required size and alignment of each resource using functions like vkGetBufferMemoryRequirements, and the driver can return a small alignment if supported. For more details, you can refer to my older article: Differences in memory management between Direct3D 12 and Vulkan. Microsoft is now finally allowing buffers in DirectX 12 to support smaller alignments by introducing the following new API elements:

I have already implemented support for this new feature in the D3D12MA library. Since this is a preview feature, I’ve done so on a separate branch for now. You can find it here: D3D12MemoryAllocator branch resource-tight-alignment.

This feature requires support from the graphics driver, and as of today, no drivers support it yet. The announcement article mentions that AMD plans to release a supporting driver in early February, while other GPU vendors are also interested and will support it in an "upcoming driver" or at some indefinite point in the future - similar to other preview features described below.

However, testing is possible right now using the software (CPU) implementation of DirectX 12 called WARP. Here’s how you can set it up:

Microsoft has also shared a sample application to test this feature: DirectX-Graphics-Samples - HelloTightAlignment.

Application specific driver state

This feature is only available in the preview SDK version. The article announcing it is: Agility SDK 1.716.0-preview: Application Specific Driver State. It is intended for capture-replay tools rather than general usage in applications.

The problem:

A graphics API like Direct3D or Vulkan serves as a standardized contract between a game, game engine, or other graphics application, and the graphics driver. In an ideal world, every application that correctly uses the API would work seamlessly with any driver that correctly implements the API. However, we know that software is far from perfect and often contains bugs, which can exist on either side of the API: in the application or in the graphics driver.

It’s no secret that graphics drivers often detect specific popular or problematic games and applications to apply tailored settings to them. These settings might include tweaks to the DirectX 12 driver or the shader compiler, for example. Such adjustments can improve performance in cases where default heuristics are not optimal for a particular application or shader, or they can provide workarounds for known bugs.

For the driver to detect a specific application, it would be helpful to pass some form of application identification. Vulkan includes this functionality in its core API through the VkApplicationInfo structure, where developers can provide the application name, engine name, application version, and engine version. DirectX 12, however, lacks this feature. The AMD GPU Services (AGS) library adds this capability with the AGSDX12ExtensionParams structure, but this is specific to AMD and not universally adopted by all applications.

Because of this limitation, DirectX 12 drivers must rely on detecting applications solely by their .exe file name. This can cause issues with capture-replay tools such as PIX on Windows, RenderDoc or GFXReconstruct. These tools attempt to replay the same sequence of DirectX 12 calls but use a different executable name, which means driver workarounds are not applied.
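The failure mode can be modeled with a toy C++ sketch. Everything here is hypothetical: real drivers do not expose their profile tables, and the DriverProfile structure and function names exist only for this illustration. The point is just that a lookup keyed by the executable file name misses as soon as the same workload runs under a different process name.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-application settings a driver might keep.
struct DriverProfile {
    bool shaderCompilerWorkaround = false;
};

// Strips the directory part of a path, handling both '/' and '\\' separators.
inline std::string ExeFileName(const std::string& path) {
    const std::string::size_type pos = path.find_last_of("/\\");
    return pos == std::string::npos ? path : path.substr(pos + 1);
}

// Returns the profile registered for the executable,
// or a default-constructed profile if the name is unknown.
// Note: the comparison is case-sensitive, which is a simplification.
inline DriverProfile LookUpProfile(const std::map<std::string, DriverProfile>& table,
                                   const std::string& exePath) {
    assert(!exePath.empty());
    const auto it = table.find(ExeFileName(exePath));
    return it != table.end() ? it->second : DriverProfile{};
}
```

If the table contains an entry for "ThatGreatGame.exe", a lookup for a path ending in "WinPixEngineHost.exe" falls back to the default profile, so the replayed workload runs without the workaround the original game relied on.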

Interestingly, there is a workaround for PIX that you can try if you encounter issues opening or analyzing a capture:

  1. Rename the "WinPixEngineHost.exe" file to match the name of the original application, such as "ThatGreatGame.exe".
  2. Create a file system link called "WinPixEngineHost.exe" pointing to that new file: mklink WinPixEngineHost.exe ThatGreatGame.exe
  3. Launch PIX, open the capture, and start the Analysis.

This way, PIX will use "WinPixEngineHost.exe" to launch the DirectX 12 workload, but the driver will see the original executable name. This ensures that the app-specific profile is applied, which may resolve the issue.

The solution:

With this new SDK release, Microsoft introduces an API to retrieve and apply an "application-specific driver state." This state will take the form of an opaque blob of binary data. With this feature and a supporting driver, capture-replay tools will hopefully be able to instruct the driver to apply the same app-specific profile and workarounds when replaying a recorded graphics workload as it would for the original application - even if the executable file name of the replay tool is different. This means that workarounds like the one described above will no longer be necessary.

The support for this feature can be queried using D3D12_FEATURE_DATA_APPLICATION_SPECIFIC_DRIVER_STATE::Supported. Since this feature is intended for tools rather than typical graphics applications, I won’t delve into further details here.

Recreate at GPUVA

This feature is only available in the preview SDK version. The article announcing it is: Agility SDK 1.716.0-preview: Recreate At GPUVA. It is intended for capture-replay tools rather than general usage in applications.

The problem:

Graphics APIs are gradually moving toward the use of free-form pointers, known as GPU Virtual Addresses (GPUVA). If such pointers are embedded in buffers, capture-replay tools may struggle to replay the workload accurately, as the addresses of the resources may differ in subsequent runs. Microsoft mentions that in PIX, they intercept the indirect argument buffer used for ExecuteIndirect to patch these pointers, but this approach may not always be fully reliable.

The solution:

With this new SDK release, Microsoft introduces an API to retrieve the address of a resource and to request the creation of a new resource at a specific address. To ensure that no other resources are assigned to the intended address beforehand, there will also be an option to reserve a list of GPUVA address ranges before creating a Direct3D 12 device.
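The idea of reserving address ranges up front and later recreating resources at captured addresses can be sketched with a toy model in C++. This is not the Direct3D 12 API: the ToyAddressSpace class, its methods, and the starting address are all made up for illustration, and a real implementation lives inside the runtime and the driver.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Half-open address range [begin, end).
struct Range {
    std::uint64_t begin;
    std::uint64_t end;
};

inline bool Overlaps(const Range& a, const Range& b) {
    return a.begin < b.end && b.begin < a.end;
}

// Hypothetical toy model of a GPU virtual address space.
class ToyAddressSpace {
public:
    // Reservations must happen before any allocation,
    // mirroring "reserve ranges before creating the device".
    void Reserve(Range r) {
        assert(m_used.empty());
        m_reserved.push_back(r);
    }

    // Ordinary allocation: picks the first address at or above the start
    // that collides with neither a reserved nor an already used range.
    std::uint64_t Allocate(std::uint64_t size) {
        assert(size > 0);
        std::uint64_t begin = kStart;
        for (bool moved = true; moved;) {
            moved = false;
            for (const std::vector<Range>* list : {&m_reserved, &m_used}) {
                for (const Range& busy : *list) {
                    if (Overlaps(Range{begin, begin + size}, busy)) {
                        begin = busy.end;  // jump past the conflicting range
                        moved = true;
                    }
                }
            }
        }
        m_used.push_back(Range{begin, begin + size});
        return begin;
    }

    // Replay-time allocation at a captured address.
    // Fails if another allocation already occupies that range.
    bool AllocateAt(std::uint64_t address, std::uint64_t size) {
        const Range r{address, address + size};
        for (const Range& u : m_used) {
            if (Overlaps(r, u)) return false;
        }
        m_used.push_back(r);
        return true;
    }

private:
    static constexpr std::uint64_t kStart = 0x10000;  // arbitrary first usable address
    std::vector<Range> m_reserved;
    std::vector<Range> m_used;
};
```

In this model, a replay tool would call Reserve with the ranges recorded in the capture, let unrelated allocations go through Allocate, and then call AllocateAt for each captured resource. Because ordinary allocations skip the reserved ranges, the captured addresses are still free when they are needed.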

The support for this feature can be queried using D3D12_FEATURE_DATA_D3D12_OPTIONS20::RecreateAtTier. Since this feature is intended for tools rather than typical graphics applications, I won’t delve into further details here.

This is yet another feature that Vulkan already provides, while Microsoft is only now adding it. In Vulkan, the ability to recreate resources at a specific address was introduced alongside the VK_KHR_buffer_device_address extension, which introduced free-form pointers. This functionality is provided through "capture replay" features, such as the VkBufferOpaqueCaptureAddressCreateInfo structure.

Runtime bypass

This feature works automatically and does not introduce any new API. It improves performance by passing some DirectX 12 function calls directly to the graphics driver, bypassing intermediate functions in Microsoft’s DirectX 12 runtime code.

If I understand it correctly, this appears to be yet another feature that Vulkan got right, and Microsoft is now catching up. For more details, see the article Architecture of the Vulkan Loader Interfaces, which describes how dynamically fetching pointers to Vulkan functions using vkGetInstanceProcAddr and vkGetDeviceProcAddr can point directly to the "Installable Client Driver (ICD)," bypassing "trampoline functions."
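The difference between calling through a trampoline and calling the driver directly can be shown with a toy C++ sketch. None of these names belong to Vulkan or DirectX 12; they are hypothetical stand-ins that only mimic the shape of the mechanism, with a counter added so the extra hop is observable.

```cpp
#include <cassert>

// Hypothetical signature of an API entry point.
using DrawFn = int (*)(int);

// Stand-in for the driver's implementation of the entry point.
inline int DriverDraw(int vertexCount) {
    return vertexCount * 2;
}

// Counts how many calls passed through the loader's trampoline.
inline int& TrampolineCalls() {
    static int calls = 0;
    return calls;
}

// Stand-in for a loader trampoline: an extra hop that forwards to the driver.
inline int LoaderTrampolineDraw(int vertexCount) {
    ++TrampolineCalls();
    return DriverDraw(vertexCount);
}

// Instance-level lookup hands out the trampoline,
// device-level lookup hands out the driver function itself.
inline DrawFn GetInstanceProc() { return &LoaderTrampolineDraw; }
inline DrawFn GetDeviceProc() { return &DriverDraw; }

// Calls the entry point through whichever pointer the application fetched.
inline int CallThrough(DrawFn fn, int vertexCount) {
    assert(fn != nullptr);
    return fn(vertexCount);
}
```

Both pointers produce the same result, but only the one obtained from GetInstanceProc increments the trampoline counter. A runtime bypass removes that kind of intermediate hop for selected calls without the application having to change anything.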

Additional considerations

There are also some additions to D3D12 Video. The article announcing them is: Agility SDK 1.716.0-preview: New D3D12 Video Encode Features. However, since I don’t have much expertise in D3D12 Video, I won’t describe them here.

Microsoft also released new versions of PIX that support all these new features from day 0! See the announcement article for PIX version 2501.30 and 2501.30-preview.

Queries for the new capabilities added in this update to the Agility SDK (both retail and preview versions) have already been integrated into the D3d12info command-line tool, the D3d12infoGUI tool, and the D3d12infoDB online database of DX12 GPU capabilities. You can contribute to this project by running the GUI tool and submitting your GPU’s capabilities to the database!
