First Look at New D3D12 Enhanced Barriers

Thu 09 Dec 2021

This will be a fairly advanced, or at least intermediate, article. It assumes you know the Direct3D 12 API, and some references to Vulkan may also appear. I am writing it because I just found out that yesterday Microsoft announced an upcoming big change in D3D12: Enhanced Barriers. It will be an addition to the API that provides a new way to issue barriers. Considering my professional interests, this looks very important to me and also quite revolutionary. This article summarizes my first look at this new addition to the API and my thoughts about it or, in modern internet terms, my "unboxing" or "reaction" ;)

Bill Kristiansen, the author of the article linked above, wrote that currently only the software-simulated WARP device supports the new enhanced barriers. Support in real GPU drivers will come at a later time. The new barriers can replace the old way of doing them, but both will remain available and can even be mixed in one application. This means the change will not turn our DirectX development upside down overnight: we can switch to the new barriers gradually. For now, we can prepare for the future by studying the interface (which is what I do in this article) and testing some code on the WARP device.

UPDATE 2021-12-10: I just learned that Microsoft actually did publish documentation of the new API: Enhanced Barriers @ DirectX-Specs, so I recommend reading it before this article.

A quick recap of what a barrier is: in the new graphics APIs (Direct3D 12 and Vulkan), it is a type of command recorded into a command list / command buffer that describes the previous and next usage of some resource (a buffer or a texture / image). For example, if one draw call writes to a texture as a render target (color attachment) and the next one reads from it in a pixel shader as a shader resource (sampled image), we need to issue a barrier between these two draw calls that expresses exactly what I just described. A barrier is quite a low-level concept, non-existent in the old, higher-level APIs (D3D9, D3D11, OpenGL), but it is still “virtual” and high-level enough to encapsulate 3 different things that may happen under the hood:

  1. Synchronization. The most intuitive meaning of a “barrier”. Potentially, all draw calls before it have to complete before any draw call after it can start executing, to avoid data hazards such as the read-after-write (RAW) hazard in the case described above. This may be optimized to wait only at a specific stage of the graphics pipeline, e.g. in this example, the vertex shader of the second draw call can already start executing; only its pixel shader needs to wait.
  2. Cache management. Unlike CPUs, GPUs can have different kinds of caches, e.g. a separate one for render target writes and another one for texture sampling, and they may not be coherent, so the first one must be flushed and the second one invalidated to make sure the correct data is read.
  3. Texture layout transition. Textures may use some internal, opaque, GPU-specific compression format (see my article: “Texture Compression: What Can It Mean?”). This is especially applicable to render targets and depth-stencil textures. The texture's data may need to be converted to another form before it can be used for sampling.

Microsoft didn’t seem to have published documentation of the new API anywhere yet. They only linked to the new Direct3D Agility SDK NuGet package (Microsoft.Direct3D.D3D12 1.700.10-preview). Fortunately, I didn't need to join the Windows Insider Program for Developers or install that package to see what’s inside. I could just download the file "microsoft.direct3d.d3d12.1.700.10-preview.nupkg", unpack it (it really is just a ZIP file), and look into the file “build\native\include\d3d12.h” to see what the future looks like :) Here is my brief analysis of what I found there. I hope I’m not committing a crime by quoting some API code here. After all, it is already published on the internet, just in a more convoluted form.

struct D3D12_FEATURE_DATA_D3D12_OPTIONS12 {
    ...
    BOOL EnhancedBarriersSupported;
};

First, there is another, 12th version of the D3D12_FEATURE_DATA_D3D12_OPTIONS structure that describes the capabilities of the current GPU. Among its members, we can see a boolean telling whether enhanced barriers are supported. Let’s hope that all three PC GPU vendors update their drivers with proper support, so that soon all newly released DX12 games can just use the new API instead of implementing two different code paths. More importantly, let's hope this whole thing will also work on Windows 10, not only on Windows 11 as the first linked article mentions!

struct ID3D12GraphicsCommandList7 {
    void Barrier(UINT32 NumBarrierGroups,
        const D3D12_BARRIER_GROUP *pBarrierGroups);
};

We also have a new, 7th version of the command list interface, which adds just this one new Barrier method. As with the old ResourceBarrier, we can issue multiple barriers at once, and batching them is most likely recommended for better performance, because drivers for the new explicit APIs are allowed to be dumb and may not optimize multiple subsequent calls.

struct D3D12_BARRIER_GROUP {
    D3D12_BARRIER_TYPE Type;
    UINT32 NumBarriers;
    union {
        const D3D12_GLOBAL_BARRIER *pGlobalBarriers;
        const D3D12_TEXTURE_BARRIER *pTextureBarriers;
        const D3D12_BUFFER_BARRIER *pBufferBarriers;
        const D3D12_RESOURCE_STATE_BARRIER *pStateBarriers;
    };
};

This is what the “root” structure of a new barrier looks like. It is actually called a “barrier group”, so there are two levels of indirection here: the Barrier function takes an array of barrier groups, each containing an array of actual barriers, all of one of 4 possible types.

enum D3D12_BARRIER_TYPE {
    D3D12_BARRIER_TYPE_GLOBAL,
    D3D12_BARRIER_TYPE_TEXTURE,
    D3D12_BARRIER_TYPE_BUFFER,
};

Not surprisingly, the enum used as the first member of the structure above just indicates which of the union members is valid. What’s surprising is the lack of a 4th value: there is no D3D12_BARRIER_TYPE_RESOURCE_STATE. I can’t see any explanation for this other than an accidental omission.
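To make the two levels of indirection more concrete, here is a minimal C++ sketch of how an application might collect barriers while recording and then submit them as one array of groups, in the spirit of batching mentioned above. It uses hypothetical mirror types (GlobalBarrier, BufferBarrier, TextureBarrier, BarrierGroup, BarrierBatcher), not the real d3d12.h declarations, with the Sync/Access/Layout fields reduced to plain integers. In real code, the resulting groups would be passed to ID3D12GraphicsCommandList7::Barrier.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mirror types, NOT the real d3d12.h declarations.
// Sync/Access/Layout values are reduced to plain integers here.
enum class BarrierType { Global, Texture, Buffer };

struct GlobalBarrier {
    uint32_t syncBefore, syncAfter, accessBefore, accessAfter;
};
struct BufferBarrier {
    uint32_t syncBefore, syncAfter, accessBefore, accessAfter;
    const void* resource;
    uint64_t offset, size;
};
struct TextureBarrier {
    uint32_t syncBefore, syncAfter, accessBefore, accessAfter;
    uint32_t layoutBefore, layoutAfter;
    const void* resource;
};

// "type" tells which union member is valid, like D3D12_BARRIER_GROUP::Type.
struct BarrierGroup {
    BarrierType type;
    uint32_t numBarriers;
    union {
        const GlobalBarrier* globals;
        const TextureBarrier* textures;
        const BufferBarrier* buffers;
    };
};

// Accumulates barriers during recording and emits at most one group per type,
// so that a single Barrier()-style call can submit the whole batch.
class BarrierBatcher {
public:
    void Add(const GlobalBarrier& b) { globals_.push_back(b); }
    void Add(const TextureBarrier& b) { textures_.push_back(b); }
    void Add(const BufferBarrier& b) {
        assert(b.size > 0 && "empty buffer range");
        buffers_.push_back(b);
    }

    // The returned groups point into this object's storage: they stay valid
    // only until the next Add() or Clear().
    std::vector<BarrierGroup> BuildGroups() const {
        std::vector<BarrierGroup> groups;
        BarrierGroup g{};
        if (!globals_.empty()) {
            g.type = BarrierType::Global;
            g.numBarriers = static_cast<uint32_t>(globals_.size());
            g.globals = globals_.data();
            groups.push_back(g);
        }
        if (!buffers_.empty()) {
            g.type = BarrierType::Buffer;
            g.numBarriers = static_cast<uint32_t>(buffers_.size());
            g.buffers = buffers_.data();
            groups.push_back(g);
        }
        if (!textures_.empty()) {
            g.type = BarrierType::Texture;
            g.numBarriers = static_cast<uint32_t>(textures_.size());
            g.textures = textures_.data();
            groups.push_back(g);
        }
        return groups;
    }

    void Clear() { globals_.clear(); buffers_.clear(); textures_.clear(); }

private:
    std::vector<GlobalBarrier> globals_;
    std::vector<BufferBarrier> buffers_;
    std::vector<TextureBarrier> textures_;
};
```

The order in which the groups are emitted (global, buffer, texture) is just an arbitrary choice of this sketch.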

Now, let’s see what the 4 types of new barriers look like.

struct D3D12_GLOBAL_BARRIER {
    D3D12_BARRIER_SYNC SyncBefore;
    D3D12_BARRIER_SYNC SyncAfter;
    D3D12_BARRIER_ACCESS AccessBefore;
    D3D12_BARRIER_ACCESS AccessAfter;
};

1. The first one is the simplest, as it doesn’t refer to any specific buffer or texture. All we have here are two pairs of flags that indicate the preceding and following “Sync” and “Access”. This is clearly an equivalent of the global memory barrier in Vulkan (VkMemoryBarrier2KHR). By the way, I like Khronos’s names “src” and “dst” more than Microsoft’s “Before” and “After”, because they are shorter and have the same length, but “Before” and “After” is the convention the D3D API has used so far.

enum D3D12_BARRIER_SYNC {
    D3D12_BARRIER_SYNC_NONE,
    D3D12_BARRIER_SYNC_ALL,
    D3D12_BARRIER_SYNC_DRAW,
    D3D12_BARRIER_SYNC_INPUT_ASSEMBLER,
    D3D12_BARRIER_SYNC_VERTEX_SHADING,
    D3D12_BARRIER_SYNC_PIXEL_SHADING,
    D3D12_BARRIER_SYNC_DEPTH_STENCIL,
    D3D12_BARRIER_SYNC_RENDER_TARGET,
    D3D12_BARRIER_SYNC_COMPUTE_SHADING,
    D3D12_BARRIER_SYNC_RAYTRACING,
    D3D12_BARRIER_SYNC_COPY,
    D3D12_BARRIER_SYNC_RESOLVE,
    D3D12_BARRIER_SYNC_EXECUTE_INDIRECT,
    D3D12_BARRIER_SYNC_PREDICATION,
    D3D12_BARRIER_SYNC_ALL_SHADING,
    D3D12_BARRIER_SYNC_NON_PIXEL_SHADING,
    D3D12_BARRIER_SYNC_EMIT_RAYTRACING_ACCELERATION_STRUCTURE_POSTBUILD_INFO,
    D3D12_BARRIER_SYNC_VIDEO_DECODE,
    D3D12_BARRIER_SYNC_VIDEO_PROCESS,
    D3D12_BARRIER_SYNC_VIDEO_ENCODE,
    D3D12_BARRIER_SYNC_BUILD_RAYTRACING_ACCELERATION_STRUCTURE,
    D3D12_BARRIER_SYNC_COPY_RAYTRACING_ACCELERATION_STRUCTURE,
    D3D12_BARRIER_SYNC_SPLIT,
};

The first pair, “Sync”, looks somewhat like a list of the stages of the graphics pipeline. In the header, each value has its own bit, so a combination of multiple bits will likely be allowed. As its name suggests, it deals with synchronization: it tells which stages of the pipeline have to wait to avoid a hazard. I guess D3D12_BARRIER_SYNC_ALL will be the always-valid, all-encompassing, most conservative option, causing a stall of the entire GPU (or at least the current queue).

This is clearly an equivalent of VkPipelineStageFlags2KHR. One big difference from Vulkan is that not all programmable stages are listed separately (there is no tessellation or geometry shader stage), just PIXEL_SHADING and NON_PIXEL_SHADING, plus VERTEX_SHADING and COMPUTE_SHADING. The old D3D12 barriers had the same coarse split, with separate PIXEL_SHADER_RESOURCE and NON_PIXEL_SHADER_RESOURCE states.
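Since each Sync value has its own bit, here is a small C++ sketch of how such a mask could be combined and queried. The bit values are hypothetical (the header excerpt above lists only names), and MustWait is a simplified model of my own, not documented driver behavior: it treats SyncAfter as the set of stages that must wait, and ignores umbrella values such as DRAW or ALL_SHADING, except ALL. It encodes the example from the beginning of this article: when only pixel shading has to wait, the vertex shading of the next draw call can start early.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit values for a subset of the Sync names listed above.
enum class Sync : uint32_t {
    None           = 0,
    All            = 1u << 0,
    VertexShading  = 1u << 3,
    PixelShading   = 1u << 4,
    RenderTarget   = 1u << 6,
    ComputeShading = 1u << 7,
    Copy           = 1u << 9,
};

constexpr Sync operator|(Sync a, Sync b) {
    return static_cast<Sync>(static_cast<uint32_t>(a) | static_cast<uint32_t>(b));
}

constexpr bool HasAny(Sync mask, Sync bits) {
    return (static_cast<uint32_t>(mask) & static_cast<uint32_t>(bits)) != 0;
}

// Simplified model: a stage of the work after the barrier must wait if it is
// listed in SyncAfter, or if SyncAfter is the all-encompassing All.
inline bool MustWait(Sync syncAfter, Sync stage) {
    assert(stage != Sync::None && "query a concrete stage");
    return HasAny(syncAfter, Sync::All) || HasAny(syncAfter, stage);
}
```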

enum D3D12_BARRIER_ACCESS {
    D3D12_BARRIER_ACCESS_COMMON,
    D3D12_BARRIER_ACCESS_VERTEX_BUFFER,
    D3D12_BARRIER_ACCESS_CONSTANT_BUFFER,
    D3D12_BARRIER_ACCESS_INDEX_BUFFER,
    D3D12_BARRIER_ACCESS_RENDER_TARGET,
    D3D12_BARRIER_ACCESS_UNORDERED_ACCESS,
    D3D12_BARRIER_ACCESS_DEPTH_STENCIL_WRITE,
    D3D12_BARRIER_ACCESS_DEPTH_STENCIL_READ,
    D3D12_BARRIER_ACCESS_SHADER_RESOURCE,
    D3D12_BARRIER_ACCESS_STREAM_OUTPUT,
    D3D12_BARRIER_ACCESS_INDIRECT_ARGUMENT,
    D3D12_BARRIER_ACCESS_PREDICATION,
    D3D12_BARRIER_ACCESS_COPY_DEST,
    D3D12_BARRIER_ACCESS_COPY_SOURCE,
    D3D12_BARRIER_ACCESS_RESOLVE_DEST,
    D3D12_BARRIER_ACCESS_RESOLVE_SOURCE,
    D3D12_BARRIER_ACCESS_RAYTRACING_ACCELERATION_STRUCTURE_READ,
    D3D12_BARRIER_ACCESS_RAYTRACING_ACCELERATION_STRUCTURE_WRITE,
    D3D12_BARRIER_ACCESS_SHADING_RATE_SOURCE,
    D3D12_BARRIER_ACCESS_VIDEO_DECODE_READ,
    D3D12_BARRIER_ACCESS_VIDEO_DECODE_WRITE,
    D3D12_BARRIER_ACCESS_VIDEO_PROCESS_READ,
    D3D12_BARRIER_ACCESS_VIDEO_PROCESS_WRITE,
    D3D12_BARRIER_ACCESS_VIDEO_ENCODE_READ,
    D3D12_BARRIER_ACCESS_VIDEO_ENCODE_WRITE,
    D3D12_BARRIER_ACCESS_NO_ACCESS
};

The second pair of flags is about “Access”. These values also have separate bits, so possibly we will be able to combine them as well. They list different ways in which a resource can be accessed, which is clearly different from “Sync”. For example, a buffer used during PIXEL_SHADING could be bound as a CONSTANT_BUFFER, SHADER_RESOURCE, or UNORDERED_ACCESS. We have to be more specific here because, most likely, this is about telling the driver which caches need to be flushed or invalidated, on a GPU that has many of them serving different purposes and not coherent with one another. This is the equivalent of Vulkan's VkAccessFlags2KHR.
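The cache-management role of Access can be modeled in a few lines of C++. This sketch assumes hypothetical bit values for a subset of the access names, plus a deliberately simplified rule that is my assumption, not documented driver behavior: if the accesses before the barrier include a write, the caches used for writing must be flushed, and if anything accesses the resource afterwards, stale lines in the caches it uses must be invalidated. Read-after-read needs no cache work at all.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit values for a subset of the Access names listed above.
enum AccessBits : uint32_t {
    kAccessVertexBuffer    = 1u << 0,
    kAccessConstantBuffer  = 1u << 1,
    kAccessIndexBuffer     = 1u << 2,
    kAccessRenderTarget    = 1u << 3,
    kAccessUnorderedAccess = 1u << 4,
    kAccessShaderResource  = 1u << 5,
    kAccessCopyDest        = 1u << 6,
    kAccessCopySource      = 1u << 7,
};

constexpr uint32_t kKnownAccessMask = (1u << 8) - 1;

// Accesses in this subset that write through some (possibly non-coherent) cache.
constexpr uint32_t kWriteAccessMask =
    kAccessRenderTarget | kAccessUnorderedAccess | kAccessCopyDest;

constexpr bool ContainsWrite(uint32_t access) {
    return (access & kWriteAccessMask) != 0;
}

// Simplified, assumed model of the cache work implied by a barrier.
struct CacheWork {
    bool flushWrites;     // make data written "before" visible in memory
    bool invalidateReads; // drop stale lines from caches used "after"
};

inline CacheWork DeriveCacheWork(uint32_t accessBefore, uint32_t accessAfter) {
    assert(((accessBefore | accessAfter) & ~kKnownAccessMask) == 0);
    const bool hadWrites = ContainsWrite(accessBefore);
    return CacheWork{hadWrites, hadWrites && accessAfter != 0};
}
```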

struct D3D12_BUFFER_BARRIER {
    D3D12_BARRIER_SYNC SyncBefore;
    D3D12_BARRIER_SYNC SyncAfter;
    D3D12_BARRIER_ACCESS AccessBefore;
    D3D12_BARRIER_ACCESS AccessAfter;
    ID3D12Resource *pResource;
    UINT64 Offset;
    UINT64 Size;
};

2. A Buffer Barrier is similar to a global barrier, except that it also refers to a specific resource (which, I assume, must be a buffer) and even to a specific range of bytes inside it. I want to believe drivers will make use of this information to perform more fine-grained synchronization rather than stalling the entire GPU and flushing the entire cache…

Note there is no “state” to transition the buffer to, just the parameters needed to perform a one-time synchronization event. This is a big step forward in simplifying resource management, as the Resource States from the current D3D12 API are an unnecessary burden. After all, a buffer is just a bunch of bytes, with no internal compression formats or other optimization tricks involved!
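Byte ranges mean an application can track hazards per range instead of per buffer. Here is a minimal C++ sketch of that application-side logic, assuming a hypothetical BufferAccess record: a barrier between two accesses is needed only if their ranges overlap and at least one of them writes. With the old resource states, this reasoning did not help, because the whole buffer had a single state.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical application-side record of one access to a byte range of a buffer.
struct BufferRange { uint64_t offset; uint64_t size; };
struct BufferAccess { BufferRange range; bool write; };

// Half-open ranges [offset, offset + size) overlap iff each starts before the other ends.
inline bool Overlaps(BufferRange a, BufferRange b) {
    assert(a.size > 0 && b.size > 0);
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

// A barrier between two accesses is needed only for overlapping ranges where
// at least one side writes (read-after-read needs nothing).
inline bool NeedsBarrier(const BufferAccess& prev, const BufferAccess& next) {
    return (prev.write || next.write) && Overlaps(prev.range, next.range);
}
```

When NeedsBarrier returns true, the buffer barrier would carry just the affected range in its Offset and Size members.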

struct D3D12_TEXTURE_BARRIER {
    D3D12_BARRIER_SYNC SyncBefore;
    D3D12_BARRIER_SYNC SyncAfter;
    D3D12_BARRIER_ACCESS AccessBefore;
    D3D12_BARRIER_ACCESS AccessAfter;
    D3D12_BARRIER_LAYOUT LayoutBefore;
    D3D12_BARRIER_LAYOUT LayoutAfter;
    ID3D12Resource *pResource;
    D3D12_BARRIER_SUBRESOURCE_RANGE Subresources;
    D3D12_TEXTURE_BARRIER_FLAGS Flags;
};
struct D3D12_BARRIER_SUBRESOURCE_RANGE {
    UINT IndexOrFirstMipLevel;
    UINT NumMipLevels;
    UINT FirstArraySlice;
    UINT NumArraySlices;
    UINT FirstPlane;
    UINT NumPlanes;
};

3. A Texture Barrier, an equivalent of VkImageMemoryBarrier2KHR, is also similar. Textures are more complex than buffers, so instead of a range of bytes we specify a range of mip levels, array slices, and planes (the separate planes of planar formats, e.g. depth and stencil) to affect. Other than that, there are two additional parameters: Flags and LayoutBefore + LayoutAfter.
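To see how a range of mips, array slices, and planes maps to the flat subresource indices used by the old API, here is a C++ sketch. CalcSubresource uses the same formula as the D3D12CalcSubresource helper from d3dx12.h; SubresourceRange is a hypothetical mirror of D3D12_BARRIER_SUBRESOURCE_RANGE with its first field renamed. The name IndexOrFirstMipLevel suggests that the field can alternatively hold a single flat subresource index; this sketch handles only the mip-range interpretation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same formula as D3D12CalcSubresource in d3dx12.h:
// planes are the outermost dimension, then array slices, then mips.
inline uint32_t CalcSubresource(uint32_t mip, uint32_t arraySlice, uint32_t plane,
                                uint32_t mipLevels, uint32_t arraySize) {
    return mip + arraySlice * mipLevels + plane * mipLevels * arraySize;
}

// Hypothetical mirror of D3D12_BARRIER_SUBRESOURCE_RANGE (first field renamed).
struct SubresourceRange {
    uint32_t firstMip, numMips;
    uint32_t firstArraySlice, numArraySlices;
    uint32_t firstPlane, numPlanes;
};

// Expands a range into the flat subresource indices it covers.
inline std::vector<uint32_t> ExpandRange(const SubresourceRange& r,
                                         uint32_t mipLevels, uint32_t arraySize) {
    assert(r.firstMip + r.numMips <= mipLevels);
    assert(r.firstArraySlice + r.numArraySlices <= arraySize);
    std::vector<uint32_t> out;
    for (uint32_t p = r.firstPlane; p < r.firstPlane + r.numPlanes; ++p)
        for (uint32_t a = r.firstArraySlice; a < r.firstArraySlice + r.numArraySlices; ++a)
            for (uint32_t m = r.firstMip; m < r.firstMip + r.numMips; ++m)
                out.push_back(CalcSubresource(m, a, p, mipLevels, arraySize));
    return out;
}
```

For example, the first two mips of both slices of a 4-mip, 2-slice texture are subresources 0, 1, 4, and 5.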

enum D3D12_TEXTURE_BARRIER_FLAGS {
    D3D12_TEXTURE_BARRIER_FLAG_NONE,
    D3D12_TEXTURE_BARRIER_FLAG_DISCARD,
};

Flags offers just one flag, but it is a big one. It allows a “discard”, that is, properly initializing the metadata of otherwise garbage texture memory after a fresh allocation or aliasing. With the old barriers, this was a tricky problem that I described extensively on my blog. If the memory can contain garbage data, you need to initialize it with either Clear, DiscardResource, or a copy (see my article “Initializing DX12 Textures After Allocation and Aliasing”). But in order to do that, the texture needs to be in a proper state, so you need to issue a barrier transitioning it to that state (see my other article “States and Barriers of Aliasing Render Targets”). Where should that barrier be placed? Is it valid to issue it right before the Discard operation, while it operates on random data? This creates a vicious circle. With this new little flag, the whole problem goes away: you can do a barrier and a Discard in one call, exactly like in Vulkan, by transitioning an image from VK_IMAGE_LAYOUT_UNDEFINED.
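Here is a sketch of what the first barrier of a freshly allocated or aliased render target might look like. All types, names, and numeric values are hypothetical mirrors of the header. Combining SyncBefore = none, AccessBefore = no access, and LayoutBefore = undefined with the discard flag is my assumption by analogy with Vulkan's VK_IMAGE_LAYOUT_UNDEFINED, and so is the validation rule that a discard requires an undefined previous layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirrors of a few values from the header above; numbers are made up.
enum class Layout : uint32_t { Undefined, RenderTarget, ShaderResource };
enum class TextureBarrierFlags : uint32_t { None = 0, Discard = 1 };
constexpr uint32_t kSyncNone = 0;
constexpr uint32_t kSyncRenderTarget = 1u << 6;
constexpr uint32_t kAccessRenderTarget = 1u << 3;
constexpr uint32_t kAccessNoAccess = 1u << 31;

struct TextureBarrier {
    uint32_t syncBefore, syncAfter;
    uint32_t accessBefore, accessAfter;
    Layout layoutBefore, layoutAfter;
    const void* resource;
    TextureBarrierFlags flags;
};

// First barrier for a freshly allocated or aliased render target: nothing to
// wait for, nothing to flush, and the old (garbage) contents are discarded,
// so the compression metadata can be initialized as part of the barrier.
inline TextureBarrier MakeInitialRenderTargetBarrier(const void* texture) {
    assert(texture != nullptr);
    return TextureBarrier{
        kSyncNone,         kSyncRenderTarget,
        kAccessNoAccess,   kAccessRenderTarget,
        Layout::Undefined, Layout::RenderTarget,
        texture,
        TextureBarrierFlags::Discard};
}

// Assumed rule: discarding only makes sense if the previous layout is undefined.
inline bool IsValidDiscard(const TextureBarrier& b) {
    return b.flags != TextureBarrierFlags::Discard || b.layoutBefore == Layout::Undefined;
}
```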

enum D3D12_BARRIER_LAYOUT {
    D3D12_BARRIER_LAYOUT_UNDEFINED,
    D3D12_BARRIER_LAYOUT_COMMON,
    D3D12_BARRIER_LAYOUT_PRESENT,
    D3D12_BARRIER_LAYOUT_GENERIC_READ,
    D3D12_BARRIER_LAYOUT_RENDER_TARGET,
    D3D12_BARRIER_LAYOUT_UNORDERED_ACCESS,
    D3D12_BARRIER_LAYOUT_DEPTH_STENCIL_WRITE,
    D3D12_BARRIER_LAYOUT_DEPTH_STENCIL_READ,
    D3D12_BARRIER_LAYOUT_SHADER_RESOURCE,
    D3D12_BARRIER_LAYOUT_COPY_SOURCE,
    D3D12_BARRIER_LAYOUT_COPY_DEST,
    D3D12_BARRIER_LAYOUT_RESOLVE_SOURCE,
    D3D12_BARRIER_LAYOUT_RESOLVE_DEST,
    D3D12_BARRIER_LAYOUT_SHADING_RATE_SOURCE,
    D3D12_BARRIER_LAYOUT_VIDEO_DECODE_READ,
    D3D12_BARRIER_LAYOUT_VIDEO_DECODE_WRITE,
    D3D12_BARRIER_LAYOUT_VIDEO_PROCESS_READ,
    D3D12_BARRIER_LAYOUT_VIDEO_PROCESS_WRITE,
    D3D12_BARRIER_LAYOUT_VIDEO_ENCODE_READ,
    D3D12_BARRIER_LAYOUT_VIDEO_ENCODE_WRITE,
    D3D12_BARRIER_LAYOUT_DIRECT_QUEUE_COMMON,
    D3D12_BARRIER_LAYOUT_DIRECT_QUEUE_GENERIC_READ,
    D3D12_BARRIER_LAYOUT_DIRECT_QUEUE_UNORDERED_ACCESS,
    D3D12_BARRIER_LAYOUT_DIRECT_QUEUE_SHADER_RESOURCE,
    D3D12_BARRIER_LAYOUT_DIRECT_QUEUE_COPY_SOURCE,
    D3D12_BARRIER_LAYOUT_DIRECT_QUEUE_COPY_DEST,
    D3D12_BARRIER_LAYOUT_COMPUTE_QUEUE_COMMON,
    D3D12_BARRIER_LAYOUT_COMPUTE_QUEUE_GENERIC_READ,
    D3D12_BARRIER_LAYOUT_COMPUTE_QUEUE_UNORDERED_ACCESS,
    D3D12_BARRIER_LAYOUT_COMPUTE_QUEUE_SHADER_RESOURCE,
    D3D12_BARRIER_LAYOUT_COMPUTE_QUEUE_COPY_SOURCE,
    D3D12_BARRIER_LAYOUT_COMPUTE_QUEUE_COPY_DEST,
    D3D12_BARRIER_LAYOUT_VIDEO_QUEUE_COMMON,
};

With textures, we have a 3rd pair of parameters, called Before/After "Layout", clearly an equivalent of VkImageLayout. Because textures can use some internal compression format, a barrier may need to convert their full content to make it compatible with the next usage. Once again, we have different ways of using a texture listed here, but this time the values of the enum are consecutive numbers, not bit flags, so only one can be specified at a time. This would be the only "stateful" part of the new API. I guess the LayoutAfter of barrier A must match the LayoutBefore of the next barrier B on the same texture (subresource), and this will be validated by the Debug Layer.

Note how we have separate layouts for the direct queue and the compute queue here. I guess this will be the (much simpler and nicer) equivalent of Vulkan's "queue family ownership transfer" barriers.
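My guess that the LayoutAfter of one barrier must match the LayoutBefore of the next one can be expressed as a small validator, similar to what I imagine the Debug Layer doing. This is a hypothetical C++ sketch (LayoutTracker and its methods are my own names): Register plays the role of the initial layout given at resource creation, and a LayoutBefore of UNDEFINED is always accepted because the previous contents are discarded anyway. It ignores the queue-specific layouts.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

enum class Layout : uint32_t { Undefined, Common, RenderTarget, ShaderResource, CopyDest };

// Hypothetical validator: remembers the last LayoutAfter per (texture, subresource)
// and checks that the next barrier's LayoutBefore matches it.
class LayoutTracker {
public:
    // Plays the role of the initial layout passed at resource creation.
    void Register(const void* texture, uint32_t subresource, Layout initial) {
        layouts_[Key(texture, subresource)] = initial;
    }

    // Records the transition; returns false if LayoutBefore contradicts the tracked layout.
    bool Transition(const void* texture, uint32_t subresource,
                    Layout layoutBefore, Layout layoutAfter) {
        const auto key = Key(texture, subresource);
        const auto it = layouts_.find(key);
        // UNDEFINED is always accepted: the previous contents are not preserved.
        const bool ok = layoutBefore == Layout::Undefined ||
                        (it != layouts_.end() && it->second == layoutBefore);
        layouts_[key] = layoutAfter;
        return ok;
    }

private:
    using KeyType = std::pair<std::uintptr_t, uint32_t>;
    static KeyType Key(const void* texture, uint32_t subresource) {
        assert(texture != nullptr);
        return KeyType(reinterpret_cast<std::uintptr_t>(texture), subresource);
    }
    std::map<KeyType, Layout> layouts_;
};
```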

struct D3D12_RESOURCE_STATE_BARRIER {
    D3D12_RESOURCE_STATES State;
    ID3D12Resource *pResource;
    UINT Subresource;
    D3D12_BARRIER_SYNC Sync;
    D3D12_BARRIER_ACCESS Access;
    D3D12_BARRIER_LAYOUT Layout;
};

4. This is not something I fully understand. It is clearly about a barrier affecting a specific resource or its subresource. On one hand, we have the old D3D12_RESOURCE_STATES flags. On the other hand, we have a single set of the new Sync, Access, and Layout flags. It could mean a transition of the resource from the old to the new “world” (or the other way around?), to allow mixing both APIs in one application.

If the old D3D12_RESOURCE_STATES is to be retired, we also need a new way to specify the initial state when creating resources. For this, there is a new interface ID3D12Device10 with methods CreateCommittedResource3, CreatePlacedResource2, and CreateReservedResource2 that take this parameter as:

D3D12_BARRIER_LAYOUT InitialLayout,

This is it, probably the first description of the new D3D12 Enhanced Barriers API. Now it is time for my opinion.

I must admit that while analyzing this whole thing, I changed my mind. Overall, I like DX12 slightly more than Vulkan because it is a somewhat simpler API, so I always thought that barriers in DX12 were better because they were simpler. Using 3 pairs of parameters to describe a barrier in Vulkan (“src” / “dst” × “access” / “stage” / “layout”) looked to me like an unnecessary complication, especially after seeing how the Simplified Vulkan Synchronization library manages to reduce it to a single enum. But now, after realizing that the new DX12 barriers are pretty much a copy of the Vulkan API, it is almost as if Microsoft said “Vulkan got it right”, and I agree with them. If we recall the 3 roles of barriers mentioned at the beginning of this article, we can see how this API expresses them more explicitly and more naturally. It also solves some problems that haunt D3D12 programmers using the current approach to barriers:

  1. Problem with texture initialization using Clear, DiscardResource, or Copy – described above.
  2. The impossibility of using different parts of one big buffer for different purposes at the same time, e.g. reading one part as a vertex buffer while streaming data into another part as a copy destination, without violating the rules, because the entire buffer has to be either in a single “write” state or in a combination of “read” states.
  3. Implicit state promotion and decay – a complex set of additional rules that will now go away completely.
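The idea behind the Simplified Vulkan Synchronization library can be applied to the new D3D12 barriers too: describe each usage with a single enum value and derive the Sync, Access, and Layout pairs from it. Here is a minimal C++ sketch with hypothetical names and bit values, covering only the render target to pixel shader read example from the beginning of this article, plus a copy destination. The real library accepts lists of previous and next accesses; this sketch takes one of each.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit values and a tiny subset of names, for illustration only.
enum SyncBits : uint32_t {
    kSyncNone = 0, kSyncPixelShading = 1u << 4,
    kSyncRenderTarget = 1u << 6, kSyncCopy = 1u << 9 };
enum AccessBits : uint32_t {
    kAccessRenderTarget = 1u << 3, kAccessShaderResource = 1u << 5,
    kAccessCopyDest = 1u << 6 };
enum class Layout : uint32_t { Undefined, RenderTarget, ShaderResource, CopyDest };

// One value per "how the texture is used", in the spirit of that library.
enum class Usage { RenderTargetWrite, PixelShaderRead, CopyDest };

struct Scope { uint32_t sync; uint32_t access; Layout layout; };

inline Scope ToScope(Usage u) {
    switch (u) {
        case Usage::RenderTargetWrite:
            return {kSyncRenderTarget, kAccessRenderTarget, Layout::RenderTarget};
        case Usage::PixelShaderRead:
            return {kSyncPixelShading, kAccessShaderResource, Layout::ShaderResource};
        case Usage::CopyDest:
            return {kSyncCopy, kAccessCopyDest, Layout::CopyDest};
    }
    assert(false && "unknown usage");
    return {kSyncNone, 0, Layout::Undefined};
}

struct TextureBarrierDesc {
    uint32_t syncBefore, syncAfter, accessBefore, accessAfter;
    Layout layoutBefore, layoutAfter;
};

// Expands a (previous usage, next usage) pair into the three Before/After pairs.
inline TextureBarrierDesc MakeBarrier(Usage before, Usage after) {
    const Scope b = ToScope(before), a = ToScope(after);
    return {b.sync, a.sync, b.access, a.access, b.layout, a.layout};
}
```

Each of the three pairs corresponds to one of the three roles of a barrier: synchronization, cache management, and layout transition.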

One thing I can’t find in the new API is split barriers. There is the D3D12_BARRIER_SYNC_SPLIT flag, but I can’t imagine how it is going to replace the pair of D3D12_RESOURCE_BARRIER_FLAG_BEGIN_ONLY + D3D12_RESOURCE_BARRIER_FLAG_END_ONLY flags from the old API. But do we really need the additional complication that split barriers bring? Do GPUs really run faster when we use them?

Copyright © 2004-2022