Those of you who follow my blog may say that I am boring, but I can't help it - somehow GPU memory allocation became my thing, rather than shaders and visual effects, as for most graphics programmers. Some time ago I wrote an article "Vulkan Memory Types on PC and How to Use Them" explaining what memory heaps and types are available on various kinds of PC GPUs, as visible through the Vulkan API. This article is its Direct3D 12 equivalent, in a way.
In expressing memory types as they exist in hardware, D3D12 differs greatly from Vulkan. Vulkan defines a 2-level hierarchy of memory "heaps" and "types". A heap represents a physical piece of memory of a certain size, while a type is a "view" of a specific heap with certain properties, like cached versus uncached. This gives great flexibility in how different GPUs can express their memory, but it makes it hard for developers to ensure they select the optimal type on every kind of GPU. Direct3D 12, by contrast, offers a fixed set of memory types. Creating a buffer or a texture usually means selecting one of the 3 standard "heap types":
D3D12_HEAP_TYPE_DEFAULT is intended for resources that are directly and frequently accessed by the GPU, so all render-target and depth-stencil textures, other textures, vertex and index buffers used for rendering, etc. go there. It typically ends up in the memory of the graphics card.
D3D12_HEAP_TYPE_UPLOAD represents memory that we can Map and fill from the CPU, then copy or directly read on the GPU. Resources in it must be created in and stay in the D3D12_RESOURCE_STATE_GENERIC_READ state. It ends up in system RAM and is uncached but write-combined, which means it should only be written, never read.
D3D12_HEAP_TYPE_READBACK represents the least frequently used type of memory, which the GPU can write or copy to, while the CPU can Map and read it. Resources in it must be created in the D3D12_RESOURCE_STATE_COPY_DEST state. It ends up in system RAM and is cached, which means random access from our CPU code is fine for it.
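To make this concrete, here is a minimal sketch of creating a buffer in the UPLOAD heap and filling it from the CPU. It assumes a valid ID3D12Device* named device; myData and myDataSize are placeholders for your own source data:

```cpp
// Creating a 64 KB buffer in D3D12_HEAP_TYPE_UPLOAD and filling it from the CPU.
D3D12_HEAP_PROPERTIES heapProps = {};
heapProps.Type = D3D12_HEAP_TYPE_UPLOAD;

D3D12_RESOURCE_DESC resDesc = {};
resDesc.Dimension = D3D12_RESOURCE_DIMENSION_BUFFER;
resDesc.Width = 64 * 1024;
resDesc.Height = 1;
resDesc.DepthOrArraySize = 1;
resDesc.MipLevels = 1;
resDesc.Format = DXGI_FORMAT_UNKNOWN;
resDesc.SampleDesc.Count = 1;
resDesc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;

ID3D12Resource* uploadBuffer = nullptr;
HRESULT hr = device->CreateCommittedResource(
    &heapProps, D3D12_HEAP_FLAG_NONE, &resDesc,
    D3D12_RESOURCE_STATE_GENERIC_READ, // required initial state on UPLOAD heaps
    nullptr, IID_PPV_ARGS(&uploadBuffer));

if (SUCCEEDED(hr))
{
    // Write-combined memory: write sequentially, never read it back.
    void* mappedPtr = nullptr;
    D3D12_RANGE emptyRange = {0, 0}; // we won't read anything on the CPU
    uploadBuffer->Map(0, &emptyRange, &mappedPtr);
    memcpy(mappedPtr, myData, myDataSize);
    uploadBuffer->Unmap(0, nullptr);
}
```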
So far, so good... D3D12 seems to simplify things compared to Vulkan. You could stop here and still develop a decent graphics program. But if you make a game with an open world and want to stream your content at runtime, so you need to check what memory budget is available to your app, or if you want to take advantage of integrated graphics, where memory is unified, you will find out that things are not that simple in this API. There are 4 different ways in which D3D12 names its memory types, and they are not so obvious when we compare systems with discrete versus integrated graphics. The goal of this article is to explain and untangle all this complexity.
Discrete graphics card
According to the D3D12 API, there are two main types of platforms: NUMA and UMA, as indicated by D3D12_FEATURE_DATA_ARCHITECTURE::UMA. When this boolean member is FALSE, we talk about Non-Uniform Memory Access (NUMA): a discrete graphics card with separate video memory (VRAM) that is fast for the GPU to access, and separate system RAM that the GPU can reach only via the PCI Express bus.
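Querying this flag is a single call to CheckFeatureSupport. A minimal sketch, assuming a valid ID3D12Device* named device:

```cpp
// Checking whether the adapter reports UMA.
D3D12_FEATURE_DATA_ARCHITECTURE arch = {};
arch.NodeIndex = 0; // first (and usually only) GPU node
if (SUCCEEDED(device->CheckFeatureSupport(
        D3D12_FEATURE_ARCHITECTURE, &arch, sizeof(arch))))
{
    if (arch.UMA)
    {
        // Integrated graphics: unified memory shared with the CPU.
    }
    else
    {
        // Discrete graphics card: separate VRAM (NUMA).
    }
}
```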
We already talked about the D3D12_HEAP_TYPE values. It turns out they are only a shortcut to a more complex structure called D3D12_HEAP_PROPERTIES. When allocating D3D12 memory, you can specify D3D12_HEAP_TYPE_CUSTOM and fill in the other members of this structure to explicitly define the kind of memory you want to use. The member CPUPageProperty defines what kind of CPU access (mapping) we want to have, while MemoryPoolPreference, which is more interesting to us in this article, defines the type of memory we want to allocate, called a "memory pool". On NUMA architectures:
D3D12_MEMORY_POOL_L0 means system memory
D3D12_MEMORY_POOL_L1 means video memory
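For example, here is a hedged sketch of what the explicit custom-heap equivalent of D3D12_HEAP_TYPE_UPLOAD could look like on a discrete (NUMA) adapter:

```cpp
// Heap properties spelled out explicitly via D3D12_HEAP_TYPE_CUSTOM,
// matching what UPLOAD means on a discrete card.
D3D12_HEAP_PROPERTIES customProps = {};
customProps.Type = D3D12_HEAP_TYPE_CUSTOM;
customProps.CPUPageProperty = D3D12_CPU_PAGE_PROPERTY_WRITE_COMBINE;
customProps.MemoryPoolPreference = D3D12_MEMORY_POOL_L0; // system RAM
// customProps can now be passed to CreateCommittedResource or CreateHeap.
```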
There is a fixed mapping between the "shortcut" D3D12_HEAP_TYPE_ values and their full D3D12_HEAP_PROPERTIES structures, as documented on the page "ID3D12Device::GetCustomHeapProperties method". When UMA == FALSE:
D3D12_HEAP_TYPE_DEFAULT: CPUPageProperty = NOT_AVAILABLE, MemoryPoolPreference = L1
D3D12_HEAP_TYPE_UPLOAD: CPUPageProperty = WRITE_COMBINE, MemoryPoolPreference = L0
D3D12_HEAP_TYPE_READBACK: CPUPageProperty = WRITE_BACK, MemoryPoolPreference = L0
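You can also ask the device for this mapping at runtime. A minimal sketch, assuming a valid ID3D12Device* named device:

```cpp
// Asking the device for the full heap properties hidden behind a
// "shortcut" heap type.
D3D12_HEAP_PROPERTIES props =
    device->GetCustomHeapProperties(0 /*nodeMask*/, D3D12_HEAP_TYPE_UPLOAD);
// On a discrete GPU, expect props.CPUPageProperty == WRITE_COMBINE
// and props.MemoryPoolPreference == L0.
```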
There is more to it. If you want to query the capacity of each kind of memory, D3D12 offers another 2 ways of naming them. First, there is the function IDXGIAdapter3::QueryVideoMemoryInfo. It allows querying for the CurrentUsage and the available Budget of a selected memory type, returned together in the structure DXGI_QUERY_VIDEO_MEMORY_INFO. This time, the type of memory is denoted by a different enum: DXGI_MEMORY_SEGMENT_GROUP_LOCAL for the memory local to the GPU (video RAM) and DXGI_MEMORY_SEGMENT_GROUP_NON_LOCAL for the memory further from the GPU (system RAM). This is the API currently recommended for determining how much memory an application should use.
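A minimal sketch of such a query, assuming a valid IDXGIAdapter3* named adapter:

```cpp
// Querying current usage and budget of the "local" segment group.
DXGI_QUERY_VIDEO_MEMORY_INFO info = {};
if (SUCCEEDED(adapter->QueryVideoMemoryInfo(
        0,                               // NodeIndex
        DXGI_MEMORY_SEGMENT_GROUP_LOCAL, // VRAM on a discrete GPU
        &info)))
{
    printf("Usage: %llu MB, Budget: %llu MB\n",
        (unsigned long long)(info.CurrentUsage >> 20),
        (unsigned long long)(info.Budget >> 20));
}
```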
We can draw a clear line here, as shown on the picture: D3D12_HEAP_TYPE_DEFAULT or a custom heap with D3D12_MEMORY_POOL_L1 ends up in the memory of the graphics card, so it increases the usage of DXGI_MEMORY_SEGMENT_GROUP_LOCAL. D3D12_HEAP_TYPE_UPLOAD, D3D12_HEAP_TYPE_READBACK, or a custom heap with D3D12_MEMORY_POOL_L0 ends up in the system memory, so it increases the usage of DXGI_MEMORY_SEGMENT_GROUP_NON_LOCAL.
There is a fourth way of addressing the memory. It is not recommended, as it matches poorly what happens under the hood, but because IDXGIAdapter3 is a newer interface that may not be available on all versions of Windows, and because you may want to know how much memory the GPU has rather than how much budget Windows recommends for your application, you may query for DXGI_ADAPTER_DESC. The structure offers 3 members that need to be interpreted correctly:
DedicatedVideoMemory seems to be the size of the video RAM, so a good estimate of how much you can allocate from D3D12_MEMORY_POOL_L1.
DedicatedSystemMemory was 0 on all systems I tested (AMD and NVIDIA graphics cards).
SharedSystemMemory seems to be half the size of the system RAM - possibly the amount of system memory that can be used for allocating D3D12 resources from D3D12_MEMORY_POOL_L0.
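A minimal sketch of reading these three members, assuming a valid IDXGIAdapter* named adapter:

```cpp
// Reading the three memory sizes from the adapter description.
DXGI_ADAPTER_DESC desc = {};
if (SUCCEEDED(adapter->GetDesc(&desc)))
{
    printf("DedicatedVideoMemory:  %llu MB\n",
        (unsigned long long)(desc.DedicatedVideoMemory >> 20));
    printf("DedicatedSystemMemory: %llu MB\n",
        (unsigned long long)(desc.DedicatedSystemMemory >> 20));
    printf("SharedSystemMemory:    %llu MB\n",
        (unsigned long long)(desc.SharedSystemMemory >> 20));
}
```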
Integrated graphics
The second option is a graphics chip integrated with the CPU, also called UMA. Some sources call it Uniform Memory Access, others Unified Memory Access, and Microsoft calls it Universal Memory Access. When D3D12_FEATURE_DATA_ARCHITECTURE::UMA is TRUE, a lot of things change, as can be seen in the bottom part of the picture. Now, the page "ID3D12Device::GetCustomHeapProperties method" defines the "shortcut" heap types as:
D3D12_HEAP_TYPE_DEFAULT: CPUPageProperty = NOT_AVAILABLE, MemoryPoolPreference = L0
D3D12_HEAP_TYPE_UPLOAD: CPUPageProperty = WRITE_COMBINE or WRITE_BACK (depending on CacheCoherentUMA, which we don't discuss here), MemoryPoolPreference = L0
D3D12_HEAP_TYPE_READBACK: CPUPageProperty = WRITE_BACK, MemoryPoolPreference = L0
As you can see, on platforms with integrated graphics we use only D3D12_MEMORY_POOL_L0, which represents the unified memory shared between the CPU and the GPU. L1 is never used here. When querying for the budget, there is also only one type of memory in use: DXGI_MEMORY_SEGMENT_GROUP_LOCAL. Resource allocations of any type increase the CurrentUsage of this one, while NON_LOCAL always stays 0. Note, however, that on a discrete graphics card L1 = Local and L0 = Non-Local, while on integrated graphics L0 = Local. What a mess! To make it clear:
UMA == FALSE: Allocations in D3D12_MEMORY_POOL_L1 increase the usage of DXGI_MEMORY_SEGMENT_GROUP_LOCAL (GPU memory), while allocations in D3D12_MEMORY_POOL_L0 increase the usage of DXGI_MEMORY_SEGMENT_GROUP_NON_LOCAL (system memory).
UMA == TRUE: All allocations are made in D3D12_MEMORY_POOL_L0 and they increase the usage of DXGI_MEMORY_SEGMENT_GROUP_LOCAL, as there is only one unified memory and it's all "local" to the GPU.
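These two rules are simple enough to capture in a tiny helper. Below is a hedged sketch; SegmentGroupForPool is my own name for illustration, not part of any API:

```cpp
// Which DXGI segment group an allocation from a given memory pool
// counts against, per the rules above.
DXGI_MEMORY_SEGMENT_GROUP SegmentGroupForPool(
    D3D12_MEMORY_POOL pool, bool isUMA)
{
    if (isUMA)
    {
        // Integrated graphics: only L0 exists and it counts as "local".
        return DXGI_MEMORY_SEGMENT_GROUP_LOCAL;
    }
    // Discrete graphics: L1 = video RAM = local, L0 = system RAM = non-local.
    return pool == D3D12_MEMORY_POOL_L1
        ? DXGI_MEMORY_SEGMENT_GROUP_LOCAL
        : DXGI_MEMORY_SEGMENT_GROUP_NON_LOCAL;
}
```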
If you want to inspect the DXGI_ADAPTER_DESC structure instead of DXGI_QUERY_VIDEO_MEMORY_INFO, you need to know that the 3 members of this structure also look different here:
DedicatedVideoMemory is set to some small value, like a few hundred MB, so you should definitely not rely on it as the maximum amount of memory you can allocate!
DedicatedSystemMemory seems to be 0 again.
SharedSystemMemory is again half of the system RAM, which is shared by the CPU and the GPU, so for integrated graphics you should probably take DedicatedVideoMemory + SharedSystemMemory as the maximum capacity for all your D3D12 resources.
This is all I wanted to describe in this article. To be honest, I had planned to investigate this topic for years. Before I finally did it in the last few days, I used to believe that Direct3D 12 is simpler and more convenient to use than Vulkan when it comes to memory allocation. The D3D12_HEAP_TYPE_ flags seem like a simplification, but after digging deeper, I conclude that this API is just rigid and overly complicated for no good reason, convenient neither for game developers nor for GPU manufacturers, especially with its 4 different ways of naming the types of memory. I consider myself a fan of Microsoft and DirectX, but once again I must admit that this is something Vulkan got right, just like when I "unboxed" the new D3D12 Enhanced Barriers API or described the secrets of Direct3D 12 resource alignment.
Update 2023-06-26: In DirectX 12 Agility SDK 1.710.0-preview, released in March 2023, Microsoft added D3D12_HEAP_TYPE_GPU_UPLOAD, which finally gives access to ReBAR - GPU memory that is CPU-visible - something that has always been available in Vulkan. See the announcement on the DirectX Developer Blog and the specification of the new feature.