Entries for tag "directx", ordered from most recent. Entry count: 91.
# Secrets of Direct3D 12: Do RTV and DSV descriptors make any sense?
This article is intended for programmers who use Direct3D 12. We will explore the topic of descriptors, especially Render Target View (RTV) and Depth Stencil View (DSV) descriptors. To understand the article, you should already know what they are and how to use them. For learning the basics, I recommend my earlier article “Direct3D 12: Long Way to Access Data” where I described resource binding model in D3D12. Current article is somewhat a follow-up to that one. I also recommend checking the official “D3D12 Resource Binding Functional Spec”.
What is a “descriptor”? My personal definition would be that generally in computing, a descriptor is a small data structure that points to some larger data and describes its parameters. While a “pointer”, “identifier”, or “key” is typically just a single number that points or identifies the main object, a “descriptor” is typically a structure that also carries some parameters describing the object.
Descriptors in D3D12 are also called “views”. They mean the same thing. Functions like
CreateRenderTargetView setup a descriptor. Note this is different from Vulkan, where a “view” and a “descriptor” are different entities. The concept of “view” is also present in relational databases. Just like in databases, a “view” points to the target data, but also specifies a way to look at them. In D3D12 it means, for example, that an SRV descriptor pointing to a texture can reinterpret its pixel format (e.g. with or without
_SRGB), limit access to only selected range of mip levels or array slices.
Let’s talk about Constant Buffer View (CBV), Shader Resource View (SRV), or Unordered Access View (UAV) descriptors first. If created inside GPU-accessible descriptor heaps (class
D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE), they can be bound to the graphics pipeline, as I described in details in my previously mentioned article. Being part of GPU memory has some implications:
# Doing dynamic resolution scaling? Watch out for texture memory size!
This article is intended for graphics programmers, mostly those who use Direct3D 12 or Vulkan and implement dynamic resolution scaling. Before we go to the main topic, some introduction first…
Nowadays, more and more games offer some kind resolution scaling. It means rendering the 3D scene in a resolution lower than the display resolution and then upscaling it using some advanced shader, often combined with temporal antialiasing and sharpening. It may be one of the solutions provided by GPU vendors (FSR from AMD, XeSS from Intel, DLSS from NVIDIA) or a custom solution (like TSR in Unreal Engine). It is an attractive option for gamers to have a good FPS increase with only minor image quality degradation. It is becoming more important as monitor resolutions increase to 4K or even more, high-end graphics cards are still expensive, and advanced rendering techniques like ray tracing encourage to favor “better pixels” over “more pixels”. See also my old article: “Scaling is everywhere, pixel-perfect is the past”.
Dynamic resolution scaling is an extension to this idea that allows rendering each frame in a different resolution, lower or higher, as a trade-off between quality and performance, to maintain desired framerate even in more complex scenes with many objects, characters, and particle effects visible on the screen. If you are interested in this technique, I strongly recommend checking a recent article from Martin Fuller from Microsoft: “Dynamic Resolution Scaling (DRS) Implementation Best Practice”, which provides many practical implementation tips.
One of the topics we need to handle when implementing dynamic resolution scaling is the creation and usage of textures that need different resolution every frame, especially render target, depth-stencil, and UAV, used temporarily between render passes. One solution could be to create these textures in the maximum resolution and use only part of them when necessary using a limited viewport. However, Martin gives multiple reasons why this option may cause some problems. A simpler and safer solution is to create a separate texture for each possible resolution, with a certain step. In modern graphics APIs (Direct3D 12 and Vulkan) they can be placed in the same memory, which we call memory aliasing.
Here comes the main question I want to answer in this article: What size of the memory heap should we use when allocating memory for these textures? Can we just take maximum dimensions of a texture (e.g. 4K resolution: 3840 x 2160), call
device->GetResourceAllocationInfo(), inspect returned
D3D12_RESOURCE_ALLOCATION_INFO::SizeInBytes and use it as
D3D12_HEAP_DESC::SizeInBytes? A texture with less pixels should always require less memory, right?
WRONG! Direct3D 12 doesn’t define such a requirement and graphics drivers from some GPU vendors really return smaller size required for a texture with larger dimensions, for some specific dimensions and pixel formats. For example, on AMD Radeon RX 7900 XTX, a render target with format
Why does this happen? It is because textures are not necessarily stored in the GPU memory in a way we imagine them: pixel-after-pixel, row major order. They often use some optimization techniques like pixel swizzling or compression. By “compression”, I don’t mean texture formats like BC or ASTC, which we must use explicitly. I also don’t mean compression like in ZIP file format or zlib/deflate algorithm that decrease data size. Quite the opposite: this kind of compression increases texture size by adding extra metadata, which allow to speed things up by saving memory bandwidth in certain cases. This is done mostly on render target and depth-stencil textures. For more information about it, see my old article: “Texture Compression: What Can It Mean?”. I’m talking about the meaning of the word “compression” number 4 from that article – compression formats that are internal, specific to certain graphics cards, and opaque for us – programmers who just use the graphics API. Problem is that a specific compression format for a texture is selected by the driver based on various heuristics (like render target / depth-stencil / UAV / other flags, pixel format, and… dimensions). This is why a texture with larger dimensions may unexpectedly require less memory.
To research this problem in details, I’ve written a small testing program and I performed tests on graphics cards from various vendors. It was a modification of my small Windows console app D3d12info that goes through the list of all
DXGI_FORMAT enum values, calls
CheckFeatureSupport to check which ones are supported as a render target or depth-stencil. For those that do, I called
GetResourceAllocationInfo to get memory requirements for a texture with this pixel format, with increasing dimensions, where height goes from 32 to 2160 with a step of 8, and width is calculated using a formula for 16:9 aspect ratio: width = height * 16 / 9.
Here are the results. Please remember these are just 3 specific graphics cards. The results may be different on a different GPU and even with a different version of the graphics driver.
On NVIDIA GeForce RTX 3080 with driver 545.84, I found no cases where a texture with larger dimensions requires less memory, so NVIDIA (or at least this specific card) is not affected by the problem described in this article.
On AMD Radeon RX 7900 XTX with driver 23.9.3, I found following data points where memory requirements are non-monotonic – one for each of the following formats:
DXGI_FORMAT_R16G16B16A16_FLOAT/UNORM/UINT/SNORM/SINT: 256x144 = 458,752 B, 270x152 = 393,216 B
DXGI_FORMAT_R32G32_FLOAT/UINT/SINT: 256x144 = 458,752 B, 270x152 = 393,216 B
DXGI_FORMAT_R8G8_UNORM/UINT/SNORM/SINT: 512x288 = 458,752 B, 526x296 = 393,216 B
DXGI_FORMAT_R16_FLOAT/UNORM/UINT/SNORM/SINT: 512x288 = 458,752 B, 526x296 = 393,216 B
DXGI_FORMAT_R8_UNORM/UINT/SNORM/SINT: 256x144 = 131,072 B, 270x152 = 65,536 B
DXGI_FORMAT_A8_UNORM: 256x144 = 131,072 B, 270x152 = 65,536 B
DXGI_FORMAT_B5G6R5_UNORM: 512x288 = 458,752 B, 526x296 = 393,216 B
DXGI_FORMAT_B5G5R5A1_UNORM: 512x288 = 458,752 B, 526x296 = 393,216 B
DXGI_FORMAT_B4G4R4A4_UNORM: 512x288 = 458,752 B, 526x296 = 393,216 B
On Intel Arc A770, with driver 184.108.40.20687, almost every format used as a render target (but none of depth-stencil formats) has multiple steps where the size decreases, and it has them at larger dimensions than AMD. For example, the most “traditional” one –
What to do with this knowledge? The conclusion is that if we implement dynamic resolution scaling and we want to create textures with different dimensions aliasing in memory, required size of this memory is not necessarily the size of the largest texture in terms of dimensions. To be safe, we should query for memory requirements of all texture sizes we may want to use and calculate their maximum. In practice, it should be enough to query resolutions starting from e.g. 75% of the maximum. Because tested GPUs always have only a single step down, an even more efficient, but not fully future-proof solution could be to start from the full resolution, go down until we find a different memory size (no matter if higher or lower), and take maximum of these two.
So far, I focused only on DirectX 12. Is Vulkan also affected by this problem? In the past, it could be. Vulkan has similar concept of querying for memory requirements of a texture using function
vkGetImageMemoryRequirements. It used to have an even bigger problem. To understand it, we must recall that in D3D12, we query for memory requirements (size and alignment) given structure
D3D12_RESOURCE_DESC which describes parameters of a texture to be created. In (the initial) Vulkan API, on the other hand, we need to first create the actual
VkImage object, and then query for its memory requirements. Question is: Given two textures created with exactly same parameters (width, height, pixel format, number of mip levels, flags, etc.), do they always return the same memory requirements?
In the past, it wasn’t required by the Vulkan specification and I saw some drivers for some GPUs that really returned different sizes for two identical textures! It could cause problems, e.g. when defragmenting video memory in Vulkan Memory Allocator library. Was it a bug, or another internal optimization done by the driver, e.g. to avoid some memory bank conflicts? I don’t know. Good news is that since then, Vulkan specification was clarified to require that functions like
vkGetImageMemoryRequirements always return the same size and alignment for images created with the same parameters, and new drivers comply with that, so the problem is gone now. Vulkan 1.3 also got a new function
vkGetDeviceImageMemoryRequirements that takes
VkImageCreateInfo with image creation parameters instead of an already created image object, just like D3D12 does from the beginning.
Going back to the main question of this article: When VK_KHR_maintenance4 extension is enabled (which has been promoted to core Vulkan 1.3), the problem does not occur, as Vulkan specification says: "For a VkImage, the size memory requirement is never greater than that of another VkImage created with a greater or equal value in each of extent.width, extent.height, and extent.depth; all other creation parameters being identical.", and the same for buffers.
Big thanks to my friends: Bartek Boczula for discussions about this topic and inspiration to write this article, as well as Szymon Nowacki for testing on the Intel card! Also thanks to Constantine Shablia from Collabora for pointing me to the answer on Vulkan.
# ShaderCrashingAssert - a New Small Library
Last Thursday (August 17th) AMD released a new tool for post-mortem analysis of GPU crashes: Radeon GPU Detective. I participated in this project, but because this is my personal blog and because it is weekend now, I am wearing my hobby developer hat and I want to present a small library that I developed yesterday:
ShaderCrashingAssert provides an assert-like macro for HLSL shaders that triggers a GPU memory page fault. Together with RGD, it can help with shader debugging.
# D3d12info - Printing D3D12 GPU Information to Console
My next little hobby project is D3d12info. It is a Windows console program that prints all the information it can get about the current GPU installed in the system, as seen through Direct3D 12 API. It also fetches additional information through AMD GPU Services (on AMD cards), NVAPI (on NVIDIA cards), Vulkan, and WinAPI, mostly to identify the current version of the graphics driver and Windows system. I will try to keep it updated to the latest Agility SDK, to query it for support for the latest hardware features of the graphics card.
The tool can be compared to DirectX Caps Viewer you can find in your Windows SDK installation under path "c:\Program Files (x86)\Windows Kits\10\bin\*\x64\dxcapsviewer.exe" in terms of the information extracted from DX12. However, instead of GUI, it provides a command-line interface, which makes it similar to the "vulkaninfo" tool. Information is printed in a human-readable text format by default, but JSON format can be selected by providing
-j parameter, making it suitable for automated processing. Additional command-line parameters are supported, including a choice of the GPU if there are many installed in the system. Launch it with parameter
-h to see the command-line syntax.
In the future, I would like to extend it with a web back-end that would gather a database of various GPUs and driver versions, like Vulkan Hardware Database does for Vulkan, and to make it browsable online. As far as I know, there is no such database for D3D12 at the moment. Best we have right now are the tables about Direct3D Feature Levels on Wikipedia. But that will require a lot of learning from me, as I am not a good web developer, so I will think about it after my vacation :)
# Vulkan Memory Allocator 3.0.0 and D3D12 Memory Allocator 2.0.0
Yesterday we released new major version of Vulkan Memory Allocator 3.0.0 and D3D12 Memory Allocator 2.0.0, so if you are coding with Vulkan or Direct3D 12, I recommend to take a look at these libraries. Because coding them is part of my job, I won't describe them in detail here, but just refer to my article published on GPUOpen.com: "Announcing Vulkan Memory Allocator 3.0.0 and Direct3D 12 Memory Allocator 2.0.0". Direct links:
Vulkan Memory Allocator
D3D12 Memory Allocator
# Untangling Direct3D 12 Memory Heap Types and Pools
Those of you who follow my blog can say that I am boring, but I can't help it - somehow GPU memory allocation became my thing, rather than shaders and effects, like most graphics programmers do. Some time ago I've written an article "Vulkan Memory Types on PC and How to Use Them" explaining what memory heaps and types are available on various types of PC GPUs, as visible through Vulkan API. This article is a Direct3D 12 equivalent, in a way.
With expressing memory types as they exist in hardware, D3D12 differs greatly from Vulkan. Vulkan defines a 2-level hierarchy of memory "heaps" and "types". A heap represents a physical piece of memory of a certain size, while a type is a "view" of a specific heap with certain properties, like cached versus uncached. This gives a great flexibility in how different GPUs can express their memory, which makes it hard for the developer to ensure he selects the optimal one on any kind of GPU. Direct3D 12 offers a fixed set of memory types. When creating a buffer or a texture, it usually means selecting one of the 3 standard "heap types":
D3D12_HEAP_TYPE_DEFAULTis intended for resources that are directly and frequently accessed by the GPU, so all render-target, depth-stencil textures, other textures, vertex and index buffers used for rendering etc. go there. It typically ends up in the memory of the graphics card.
D3D12_HEAP_TYPE_UPLOADrepresents a memory that we can
Mapand fill in from the CPU and then copy or directly read on the GPU. It must be created and stay in
D3D12_RESOURCE_STATE_GENERIC_READ. It ends up in system RAM and is uncached but write-combined, which means it should only be written, never read.
D3D12_HEAP_TYPE_READBACKrepresents the least frequently used type of memory that the GPU can write or copy to, while the CPU can
Mapand read. It must be created in
D3D12_RESOURCE_STATE_COPY_DEST. It ends up in system RAM and is cached, which means random access from our CPU code is fine for it.
So far, so good... D3D12 seems to simplify things compared to Vulkan. You can stop here and still develop a decent graphics program, but if you make a game with an open world and want to stream your content in runtime, so you need to check what memory budget is available to your app, or you want to take advantage of integrated graphics where memory is unified, you will find out that things are not that simple in this API. There are 4 different ways that D3D12 calls various memory types and they are not so obvious when we compare systems with discrete versus integrated graphics. The goal of this article is to explain and untangle all this complexity.
# Direct3D 12: Long Way to Access Data
One reason the new graphics APIs – Direct3D 12 and Vulkan – are so complicated is that there are so many levels of indirection in accessing data. Having some data prepared in a normal C++ CPU variable, we have to go a long way before we can access it in a shader. Some time ago I’ve written an article: “Vulkan: Long way to access data”. Because I have a privilege of working with both APIs in my everyday job, I thought I could also write a D3D12 equivalent, so here it is. If you are a graphics or game programmer who already know the basics of D3D12, but may feel some confusion about all the concepts defined there, this article is for you.
Here is a diagram, followed by an explanation of all its elements:
# First Look at New D3D12 Enhanced Barriers
This will be pretty advanced or at least intermediate article. It assumes you know Direct3D 12 API. Some references to Vulkan may also appear. I am writing it because I just found out that yesterday Microsoft announced an upcoming big change in D3D12: Enhanced Barriers. It will be an addition to the API that provides a new way to do barriers. Considering my professional interests, this looks very important to me and also quite revolutionary. This article summarizes my first look and my thoughts about this new addition to the API or, speaking in terms of modern internet, my "unboxing" or "reaction" ;)
Bill Kristiansen, the author of the article linked above, written that currently only the software-simulated WARP device supports the new enhanced barriers. Support in real GPU drivers will come at later time. The new barriers can replace the old way of doing them, but both will still be available and can also be mixed in one application. Which means this is not as big revolution to turn our DirectX development upside down - we can switch to them gradually. For now we can just prepare ourselves for the future by studying the interface (which I do in this article) and testing some code using WARP device.
UPDATE 2021-12-10: I just learned that Microsoft actually did publish a documentation of the new API: Enhanced Barriers @ DirectX-Specs, so I recommend to go see it before reading this article.