Entries for tag "rendering", ordered from most recent. Entry count: 173.
# VkExtensionsFeaturesHelp - My New Library
I had this idea for quite some time and finally spent last weekend coding it, so here it is: 611 lines of code (and many times more of documentation), shared for free under the MIT license:
Vulkan Extensions & Features Help, or VkExtensionsFeaturesHelp, is a small, header-only C++ library for developers who use the Vulkan API. It helps to avoid boilerplate code while creating a VkDevice object by providing a convenient way to query and then enable extensions, features, and layers.
The library provides a domain-specific language to describe the list of required or supported extensions, features, and layers. The language is fully defined in terms of preprocessor macros, so no custom build step is needed.
Any feedback is welcome :)
# Vulkan Memory Types on PC and How to Use Them
Allocation of memory for buffers and textures is one of the fundamental things we do when using graphics APIs, like DirectX or Vulkan. It is of particular interest to me as I develop the Vulkan Memory Allocator and D3D12 Memory Allocator libraries (as part of my job – these are not personal projects). Although the underlying hardware (RAM dice and GPU) stays the same, different APIs expose it differently. I’ve described these differences in detail in my article “Differences in memory management between Direct3D 12 and Vulkan”. I also gave a talk “Memory management in Vulkan and DX12” at GDC 2018 and my colleague Ste Tovey presented many more details in his talk “Memory Management in Vulkan” at Vulkanised 2018.
In this article, I would like to present common patterns seen on the list of memory types available in Vulkan on Windows PCs. First, let me recap what the API offers: Unlike in DX12, where you have just 3 default “memory heap types” (D3D12_HEAP_TYPE_DEFAULT, D3D12_HEAP_TYPE_UPLOAD, D3D12_HEAP_TYPE_READBACK), in Vulkan there is a 2-level hierarchy: a list of “memory heaps” and “memory types” inside them that you need to query and that can look completely different on various GPUs, operating systems, and driver versions. Some constraints and guarantees apply, as described in the Vulkan specification, e.g. there is always some DEVICE_LOCAL and some HOST_VISIBLE memory type.
A memory heap, as queried from vkGetPhysicalDeviceMemoryProperties and returned in VkMemoryHeap, represents some (more or less) physical memory, e.g. video RAM on the graphics card or system RAM on the motherboard. It has a fixed size in bytes and a current available budget that can be queried using the extension VK_EXT_memory_budget. A memory type, as returned in VkMemoryType, belongs to a certain heap and offers a “view” to that heap with certain properties, represented by VkMemoryPropertyFlags. Most notable are:
VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, which always matches the flag VK_MEMORY_HEAP_DEVICE_LOCAL_BIT in the heap it belongs to, informs that the memory is local to the “device” (the GPU in Vulkan terminology). It doesn’t change what you can or cannot do with this memory type. If creating certain buffers or textures was possible only in GPU and not CPU memory, it would be expressed by appropriate bits not set in VkMemoryRequirements::memoryTypeBits. The DEVICE_LOCAL flag set in a memory type is just a hint for us that resources created in that memory will probably work faster when accessed on the GPU.
VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT. Unlike the previous one, this flag changes a lot. It means that you can call vkMapMemory on VkDeviceMemory objects allocated from this type and get a raw, CPU-side pointer to its data. In short: you can access this memory directly from the CPU, without a need to launch a Vulkan command for explicit transfer, like vkCmdCopyBuffer.
VK_MEMORY_PROPERTY_HOST_CACHED_BIT, which can occur only on memory types that are also HOST_VISIBLE. This one again is just a hint for us. It changes nothing we can or cannot do with that memory. It just informs us that access to this memory will go through cache (from the CPU perspective). As a result, a memory type with this flag should be fast to write, read, and access randomly via a mapped pointer. What the lack of this flag means is not clearly defined. Such memory may represent system RAM or even video RAM, but a common meaning (at least on PC) is that accesses are then uncached but write-combined from the CPU perspective, which means we should only write to it sequentially (best to do memcpy), never read from it or jump over random places, as it may be slow.
Theoretically, a good algorithm for finding the first memory type that meets your requirements, as recommended by the spec, should be robust enough to work on any GPU. But if you want to make sure your application works correctly and efficiently on a variety of graphics hardware available on the market today, you may need to adjust your resource management policy to the specific set of memory heaps/types found on a user’s machine. To simplify this task, below I present common patterns that can be observed on the list of Vulkan memory heaps and types on various GPUs, on Windows PCs. I also describe their meaning and consequences.
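The "first matching memory type" search mentioned above can be sketched roughly as follows. This is only an illustration: to keep it compilable without the Vulkan headers, `MemoryType` and the flag constants are stand-ins that mirror `VkMemoryType` and `VkMemoryPropertyFlagBits`, and the names `findMemoryType` and `kNotFound` are hypothetical. In a real program `memoryTypeBits` would come from `VkMemoryRequirements` and the type list from `vkGetPhysicalDeviceMemoryProperties`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-ins mirroring VkMemoryPropertyFlagBits (same numeric values).
constexpr std::uint32_t DEVICE_LOCAL  = 0x1;
constexpr std::uint32_t HOST_VISIBLE  = 0x2;
constexpr std::uint32_t HOST_COHERENT = 0x4;
constexpr std::uint32_t HOST_CACHED   = 0x8;

// Stand-in mirroring VkMemoryType.
struct MemoryType {
    std::uint32_t propertyFlags;
    std::uint32_t heapIndex;
};

// Hypothetical sentinel returned when nothing matches.
constexpr std::uint32_t kNotFound = UINT32_MAX;

// Returns the index of the first memory type that is allowed by
// memoryTypeBits (bit i set = type i allowed) and that has all of the
// required property flags, or kNotFound.
std::uint32_t findMemoryType(const std::vector<MemoryType>& types,
                             std::uint32_t memoryTypeBits,
                             std::uint32_t requiredFlags)
{
    assert(types.size() <= 32); // Vulkan exposes at most 32 memory types
    for (std::uint32_t i = 0; i < types.size(); ++i) {
        const bool allowed  = (memoryTypeBits & (1u << i)) != 0;
        const bool hasFlags = (types[i].propertyFlags & requiredFlags) == requiredFlags;
        if (allowed && hasFlags)
            return i;
    }
    return kNotFound;
}
```

A caller would typically try its preferred flags first (for example `HOST_VISIBLE | HOST_CACHED` for readback) and retry with a weaker set of flags if `kNotFound` comes back.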
Before I start, I must show you the website vulkan.gpuinfo.org, if you don’t already know it. It is a great database of all Vulkan capabilities, features, limits, and extensions, including memory heaps/types, cataloged from all kinds of GPUs and operating systems.
1. The Intel way
Intel manufactures integrated graphics (although they also released a discrete card recently). As the GPU is integrated into the CPU, it shares the same memory. It then makes sense to expose the following memory types in Vulkan (example: Intel(R) UHD Graphics 600):
Heap 0: DEVICE_LOCAL
Size = 1,849,059,532 B
Type 0: DEVICE_LOCAL, HOST_VISIBLE, HOST_COHERENT
Type 1: DEVICE_LOCAL, HOST_VISIBLE, HOST_COHERENT, HOST_CACHED
What it means: The simplest and the most intuitive set of memory types. There is just one memory heap that represents system RAM, or a part of it that can be used for graphics resources. All memory types are DEVICE_LOCAL, which means the GPU has fast access to them. They are also all HOST_VISIBLE – accessible to the CPU. Type 0 without the HOST_CACHED flag is good for writing through a mapped pointer and reading by the GPU, while type 1 with the HOST_CACHED flag is good for writing by GPU commands and reading via a mapped pointer.
How to use it: You can just load your resources directly from disk. There is no need to create a separate staging copy, a separate GPU copy, and issue a transfer command, like we do with discrete graphics cards. With images you need to use VK_IMAGE_TILING_OPTIMAL for best performance and so you still need to call vkCmdCopyBufferToImage, but at least for buffers you can just map them, fill the content via a CPU pointer, and then tell the GPU to use that memory – an approach which can save both time and precious bytes of memory.
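One possible way to recognize this pattern at runtime is to check whether every memory type is both DEVICE_LOCAL and HOST_VISIBLE. The sketch below is just a heuristic under that assumption, not something prescribed by the spec. The function name `looksLikeUnifiedMemory` is hypothetical, and the input is simply the list of `propertyFlags` values taken from each `VkMemoryType`, with flag constants that mirror the Vulkan values.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-ins mirroring VkMemoryPropertyFlagBits (same numeric values).
constexpr std::uint32_t DEVICE_LOCAL = 0x1;
constexpr std::uint32_t HOST_VISIBLE = 0x2;

// Returns true when every memory type is both DEVICE_LOCAL and HOST_VISIBLE,
// as in the Intel example above. In that case buffers can be mapped and
// filled directly, with no separate staging copy.
bool looksLikeUnifiedMemory(const std::vector<std::uint32_t>& typeFlags)
{
    assert(!typeFlags.empty()); // a device always reports some memory type
    constexpr std::uint32_t both = DEVICE_LOCAL | HOST_VISIBLE;
    for (const std::uint32_t flags : typeFlags) {
        if ((flags & both) != both)
            return false;
    }
    return true;
}
```

An engine could use the result to pick between a "map and fill in place" path and a "staging buffer plus copy command" path for buffers.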
# States and Barriers of Aliasing Render Targets
In my recent post “Initializing DX12 Textures After Allocation and Aliasing” I said that when you use Direct3D 12, have a render-target or depth-stencil texture “B” aliasing with some other resources “A” in memory, and want to use it, you need to initialize it properly every time: 1. issue a barrier of type D3D12_RESOURCE_BARRIER_TYPE_ALIASING, 2. fully initialize its content using Clear, Discard, or Copy, so that garbage data left from the previous usage of the memory is overwritten.
That is actually not the full story. There is a third thing that you need to take care of: a transition barrier for this resource. This is because functions such as ClearRenderTargetView and ClearDepthStencilView require a texture to be in the correct state – D3D12_RESOURCE_STATE_RENDER_TARGET or D3D12_RESOURCE_STATE_DEPTH_WRITE, respectively. A question arises here: How does resource state tracking interact with memory aliasing?
If you look at error and warning messages from the D3D Debug Layer or PIX, you notice that states are tracked per resource, no matter if they alias in memory. This gives us some clue of how we should handle it correctly to tame the Debug Layer and make things formally correct, but it is still not that obvious where to put the barrier. Let's assume the last usage of our texture B is D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE – the texture is used for sampling. Where should we put the barrier to transition it to D3D12_RESOURCE_STATE_RENDER_TARGET, required to do the initializing Discard or Clear after aliasing? I can see 3 solutions here:
1. None at all – skip this barrier, just do the Discard/Clear as if the texture was in the correct RT/DS state. It obviously generates a Debug Layer error, but it seems to work fine. My thinking about why this may be fine is: If the Discard/Clear requires the texture to be in the correct RT/DS state and it completely overwrites its content, then it shouldn’t care what initial state the texture is in. Its data are garbage anyway. It will most probably leave the texture initialized properly as for the RT/DS state, no matter what.
2. Between the aliasing barrier and Discard/Clear. This would please the Debug Layer, but I have some doubts whether it is correct or necessary. It may not be correct because if the barrier does some transformations of the data, it would transform garbage data left after aliasing, which could be undefined behavior leading to some persisting corruption or even to a GPU crash. It may not be necessary because we are going to re-initialize the whole content of the texture in a moment anyway with Discard/Clear.
3. After finishing work with the texture in the previous frame. The idea is to leave each RT/DS texture that can alias with some other resource in a state that makes it ready for the initializing Discard/Clear next time we want to grab it from the common memory. This will work and will be formally correct, but it also sounds like the well-known and sub-optimal idea of reverting textures to some “base state” after each use. An additional problem appears when your last usage of the texture occurs on the compute queue, because there you cannot transition it to the RT/DS state.
Conclusion: While this is a tricky question that has no one simple answer, I would recommend to use  if you want to be 100% formally correct or  if you want maximum efficiency.
# CasCmdLine - Few Technical Details
As part of my job, I've written a small console program CasCmdLine with a purpose of testing AMD's FidelityFX Contrast Adaptive Sharpening (CAS) shader on an image from disk, e.g. a screenshot from your game. You can find binary and source code on github.com/GPUOpen-Effects/FidelityFX-CAS/, CasCmdLine subdirectory. See also the blog post and tutorial about it to learn about its features and the syntax of supported command line parameters.
Here I would like to point to three aspects of its implementation that allowed me to make it small and simple. They might interest you if you are a C++/Windows/graphics programmer.
1. To execute a compute shader like CAS, I needed to use a graphics API - Direct3D 11, 12, or Vulkan, as all of them are supported by the effect. I chose D3D11 as the easiest one. What’s interesting is that the API is used without creating a window or swap chain. There are no render frames, no calls to Present, no depth-stencil texture, no message loop. D3D11CreateDevice is used to initialize DirectX rather than D3D11CreateDeviceAndSwapChain. The program just initializes all necessary machinery, does its job, and exits. It is perfectly possible to write a program this way, which may be a good idea for any application that needs to do some GPU-accelerated computations rather than interactive graphics like games do. I suspect this mode of operation would work even on server systems that have no monitor attached, as long as there is a GPU and graphics driver installed. See file “CasCmdLine.cpp” to find out how this is implemented.
2. There is always a question in every graphics app about how to load shaders. Surely, compiling them from HLSL/GLSL source code is the worst option, as it requires the user to have a shader compiler installed or requires you to ship the compiler with your program. It also takes more time than loading shaders precompiled to the intermediate binary format. But even in this format they need to be loaded from somewhere, whether individual files or some custom compressed archive, like games tend to do. In CasCmdLine I did it differently. I attached precompiled shaders directly to the program binary. To do that, I used the command line parameter /Fh of the "fxc.exe" shader compiler, like this:
fxc.exe /T cs_5_0 /E mainCS /O3 /Fh CompiledShader.h ShaderSource.hlsl
Instead of a binary file, the compiler called with this parameter generates a text file in a format compatible with C/C++ that contains the data of the compiled shader in the form of an array, like this:
// Shader metadata and assembly are put here, as commented-out code...
const BYTE g_mainCS[] =
{
68, 88, 66, 67, 8, 233,
11, 94, 141, 165, 83, 251,
50, 166, 219, 219, 84, 109,
128, 23, 1, 0, 0, 0,
// ... (remaining bytes omitted)
};
Such a file can be #include-d in C++ code and used to create a D3D shader directly from this data. See files "Shaders/CompiledShader_*.h" to find out what they really look like.
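To show how such an embedded array is consumed, here is a small portable sketch. It assumes only what is quoted above: the array name `g_mainCS` and its first bytes, which happen to spell the "DXBC" signature of a compiled shader container (68, 88, 66, 67 are the ASCII codes of 'D', 'X', 'B', 'C'). The `BYTE` typedef is a stand-in for the Windows SDK one, the truncated array is not a valid shader, and `looksLikeDxbc` is a hypothetical helper, not part of CasCmdLine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

using BYTE = unsigned char; // stand-in for the Windows SDK typedef

// In a real program this array would come from: #include "CompiledShader.h"
// Only the first bytes quoted above are reproduced here.
const BYTE g_mainCS[] = {
    68, 88, 66, 67, 8, 233,
    11, 94, 141, 165, 83, 251,
    50, 166, 219, 219, 84, 109,
    128, 23, 1, 0, 0, 0,
};

// Cheap sanity check: does the blob start with the "DXBC" signature?
bool looksLikeDxbc(const void* data, std::size_t size)
{
    assert(data != nullptr);
    return size >= 4 && std::memcmp(data, "DXBC", 4) == 0;
}

// The pair (g_mainCS, sizeof(g_mainCS)) is what a D3D11 program would pass
// as the bytecode pointer and length to ID3D11Device::CreateComputeShader.
```

Because the data lives in the executable image, there is no file I/O and no failure path for a missing shader file at runtime.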
3. The program needs to load and save image files in JPEG, PNG, and preferably other formats. Of course, these formats are very complex, support various pixel formats, involve some compression algorithms etc., so handling them manually would require an enormous amount of work. There are libraries for this, like the official libpng and libjpeg for handling PNG/JPEG formats, respectively, or a multi-format, multi-platform library DevIL.
If the developed program is intended only for Windows, it turns out that no third-party libraries are needed. The native Windows API contains a part called Windows Imaging Component (WIC) that can load and save image files in many formats, including BMP, PNG, JPEG, TIFF, GIF, ICO, WMP, DDS. It can also do some image operations, like rescaling. It is a COM API that involves interfaces like IWICBitmapFrameDecode and many more. This is what I used in the program described here. I might write a tutorial about WIC someday... For now, I would just say that once you figure out its API, it looks quite powerful. It might be useful for any graphics Windows app that needs to load textures. It is also what Microsoft's DirectXTex library uses under the hood.
# Bezier Curve as Easing Function
Bézier curves are named after Pierre Bézier and are primarily used in geometry modeling. They are good at describing various shapes in 2D and 3D. A Bézier curve is a function x(t), y(t) - it gives points in space (x, y) for some parameter t = 0..1. But nowadays they are also used in computer graphics for animation, as easing functions. There, we need to evaluate y(x), because x is the time parameter and y is the evaluated variable.
What does the formula of a Bézier curve look like as y(x)? What constraints do the 4 control points need to meet for this function to be correct - to have only one value of y for each x, with no loops or arcs? Finally, how can this function be approximated to store it in computer memory and evaluate it efficiently in modern game engines? These sound like fundamental questions, but apparently no one researched this topic thoroughly before, so it became the subject of the Ph.D. thesis of my friend Łukasz Izdebski.
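To make the y(x) problem concrete, here is a minimal numeric sketch. It is not the method from the thesis or the paper, just a plain bisection. It assumes a cubic curve with fixed endpoints (0, 0) and (1, 1) and requires the x coordinates of both inner control points to lie in [0, 1], a sufficient condition (also used by CSS `cubic-bezier`) for x(t) to be monotonic, so that each x maps to exactly one y. The names `bezier1D` and `bezierEase` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// One coordinate of a cubic Bezier curve whose end values are 0 and 1 and
// whose inner control values are a and b, evaluated at parameter t in [0, 1].
double bezier1D(double a, double b, double t)
{
    const double u = 1.0 - t;
    return 3.0 * u * u * t * a + 3.0 * u * t * t * b + t * t * t;
}

// Evaluates the easing function y(x) for control points
// (0,0), (x1,y1), (x2,y2), (1,1): first solves x(t) = x for t by bisection,
// then returns y(t).
double bezierEase(double x1, double y1, double x2, double y2, double x)
{
    // Keeps x(t) monotonic, so the curve is a function of x.
    assert(x1 >= 0.0 && x1 <= 1.0 && x2 >= 0.0 && x2 <= 1.0);
    if (x <= 0.0) return 0.0;
    if (x >= 1.0) return 1.0;

    double lo = 0.0, hi = 1.0;
    for (int i = 0; i < 50; ++i) { // 50 halvings: far below double precision needs
        const double mid = 0.5 * (lo + hi);
        if (bezier1D(x1, x2, mid) < x)
            lo = mid;
        else
            hi = mid;
    }
    return bezier1D(y1, y2, 0.5 * (lo + hi));
}
```

Running an iterative solve per evaluation is exactly the cost that motivates looking for a closed form or a cheap approximation of y(x).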
A part of his research has just been published as paper "Bézier Curve as a Generalization of the Easing Function in Computer Animation" in Advances in Computer Graphics, 37th Computer Graphics International Conference, CGI 2020, Geneva, Switzerland. We want to share an excerpt of his findings online as an article: Bezier Curve as Easing Function.
# Why Not Use Heterogeneous Multi-GPU?
There was an interesting discussion recently on one Slack channel about using an integrated GPU (iGPU) together with a discrete GPU (dGPU). Many sound ideas were shared there, so I think it's worth writing them down. But because I probably never blogged about multi-GPU before, a few words of introduction first:
The idea to use multiple GPUs in one program is not new, but not very widespread either. In old graphics APIs like Direct3D 11 it wasn't easy to implement. Doing it right in a complex game often involved engaging driver engineers from the GPU manufacturer (like AMD, NVIDIA) or using custom vendor extensions (like AMD GPU Services - see for example Explicit Crossfire API).
The new generation of graphics APIs – Direct3D 12 and Vulkan – is lower level and gives more direct access to the hardware. This includes the possibility to implement multi-GPU support on your own. There are two modes of operation. If the GPUs are identical (e.g. two graphics cards of the same model plugged into the motherboard), you can use them as one device object. In D3D12 you then index them as Node 0, Node 1, ... and specify a NodeMask bit mask when allocating GPU memory, submitting commands, and doing all sorts of GPU things. Similarly, in Vulkan you have the VK_KHR_device_group extension available that allows you to create one logical device object that will use multiple physical devices.
But this post is about heterogeneous/asymmetric multi-GPU, where there are two different GPUs installed in the system, e.g. one integrated with the CPU and one discrete. A common example is a laptop with "switchable graphics", which may have an Intel CPU with their integrated “HD” graphics plus an NVIDIA GPU. There may even be two different GPUs from the same manufacturer! My new laptop (ASUS TUF Gaming FX505DY) has AMD Radeon Vega 8 + Radeon RX 560X. Another example is a desktop PC with CPU-integrated graphics and a discrete graphics card installed. Such a combination may still be used by a single app, but to do that, you must create and use two separate Device objects. But just because you could doesn't mean you should…
First question is: Are there games that support this technique? Probably few… There is just one example I heard of: Ashes of the Singularity by Oxide Games, and it was many years ago, when DX12 was still fresh. Other than that, there are mostly tech demos, e.g. "WITCH CHAPTER 0 [cry]" by Square Enix as described on DirectX Developer Blog (also 5 years old).
iGPU typically has lower computational power than dGPU. It could accelerate some pieces of computations needed each frame. One idea is to hand over the already rendered 3D scene to the iGPU so it can finish it with screen-space postprocessing effects and present it, which sounds even better if the display is connected to iGPU. Another option is to accelerate some computations, like occlusion culling, particles, or water simulation. There are some excellent learning materials about this technique. The best one I can think of is: Multi-Adapter with Integrated and Discrete GPUs by Allen Hux (Intel), GDC 2020.
However, there are many drawbacks of this technique, which were discussed in the Slack chat I mentioned:
Conclusion: Supporting heterogeneous multi-GPU in a game engine sounds like an interesting technical challenge, but better think twice before doing it in production code.
BTW, if you want to use just one GPU and worry about the selection of the right one, see my old post: Switchable graphics versus D3D11 adapters.
# Improving the quality of the alpha test (cutout) materials
This is a guest post from my friend Łukasz Izdebski Ph.D.
Today I want to share with you a trick which my colleague from a previous job mentioned to me a long time ago. It's about alpha tested (also known as cutout) materials. The technique which I want to share with you consists of two neat tricks that can improve the quality of alpha tested (cutout) materials.
Alpha test is an old technique used in computer graphics. The idea behind it is very simple. In a very basic form, a material (shader) of a rendered object can discard processed pixels based on the alpha channel of an RGBA texture. When the shaded pixel’s final alpha value is less than a threshold value (the threshold is constant for the instance of the material and a typical value is 50%), the pixel is clipped (discarded) and will not land in the shader's output framebuffer. These types of materials are commonly used to render vegetation, fences, impostors/billboards, etc.
Alpha tested materials have, I would say, a little issue. It can be noticed when a rendered object (with this material) is far away from the camera. Let the video below be an example of this issue.
# Secrets of Direct3D 12: Resource Alignment
In the new graphics APIs - Direct3D 12 and Vulkan - creation of resources (textures and buffers) is a multi-step process. You need to allocate some memory and place your resource in it. In D3D12 there is a convenient function ID3D12Device::CreateCommittedResource that lets you do it in one go, allocating the resource with its own, implicit memory heap, but it's recommended to allocate bigger memory blocks and place multiple resources in them using ID3D12Device::CreatePlacedResource.
When placing a resource in the memory, you need to know and respect its required size and alignment. Size is basically the number of bytes that the resource needs. Alignment is a power-of-two number which the offset of the beginning of the resource must be a multiple of (offset % alignment == 0). I'm thinking about writing a separate article for beginners explaining the concept of memory alignment, but that's a separate topic...
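As a quick illustration of the `offset % alignment == 0` rule, the usual way to find the next valid offset for a power-of-two alignment is a round-up with a bit mask. The helper names `isPowerOfTwo` and `alignUp` below are hypothetical and not taken from any of the libraries mentioned in this article.

```cpp
#include <cassert>
#include <cstdint>

// True for 1, 2, 4, 8, ... (exactly one bit set).
bool isPowerOfTwo(std::uint64_t value)
{
    return value != 0 && (value & (value - 1)) == 0;
}

// Rounds offset up to the nearest multiple of alignment.
// Works only for power-of-two alignments, which is what graphics APIs use.
std::uint64_t alignUp(std::uint64_t offset, std::uint64_t alignment)
{
    assert(isPowerOfTwo(alignment));
    return (offset + alignment - 1) & ~(alignment - 1);
}
```

For example, a resource that must be aligned to 64 KB and would otherwise start at byte 70,000 gets placed at 131,072, leaving 61,072 bytes of padding before it. That padding is the waste that smaller alignments help to reduce.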
Back to graphics, in Vulkan you first need to create your resource (e.g. vkCreateBuffer) and then pass it to the function (e.g. vkGetBufferMemoryRequirements) that will return the required size and alignment of this resource (VkMemoryRequirements::size, alignment). In DirectX 12 it looks similar at first glance or even simpler, as it's enough to have a structure D3D12_RESOURCE_DESC describing the resource you will create to call ID3D12Device::GetResourceAllocationInfo and get D3D12_RESOURCE_ALLOCATION_INFO - a similar structure with SizeInBytes and Alignment. I've described it briefly in my article Differences in memory management between Direct3D 12 and Vulkan.
But if you dig deeper, there is more to it. While using the mentioned function is enough to make your program work correctly, applying some additional knowledge may let you save some memory, so read on if you want to make your GPU memory allocator better. The first interesting piece of information is that alignments in D3D12, unlike in Vulkan, are really fixed constants, independent of a particular GPU or graphics driver that the user may have installed.
So, we have these constants and we also have a function to query for the actual alignment. To make things even more complicated, the structure D3D12_RESOURCE_DESC also contains an Alignment member, so you have one alignment on the input, another one on the output! Fortunately, the GetResourceAllocationInfo function allows you to set D3D12_RESOURCE_DESC::Alignment to 0, which causes the default alignment for the resource to be returned.
Now, let me introduce the concept of "small textures". It turns out that some textures can be aligned to 4 KB and some MSAA textures can be aligned to 64 KB. They call this "small" alignment (as opposed to "default" alignment) and there are also constants for it:
|Resource|Default alignment|Small alignment|
|---|---|---|
|Texture|64 KB|4 KB|
|MSAA texture|4 MB|64 KB|
Using this smaller alignment allows to save some GPU memory that would otherwise be unused as padding between resources. Unfortunately, it's unavailable for buffers and available only for small textures, with a very convoluted definition of "small". The rules are hidden in the description of Alignment member of D3D12_RESOURCE_DESC structure:
Could GetResourceAllocationInfo calculate all this automatically and just return the optimal alignment for a resource, like the Vulkan function does? Possibly, but this is not what happens. You have to ask for it explicitly. When you pass D3D12_RESOURCE_DESC::Alignment = 0 on the input, you always get the default (larger) alignment on the output. Only when you set D3D12_RESOURCE_DESC::Alignment to the small alignment value does this function return the same value, if the small alignment has been "granted".
There are two ways to use it in practice. The first one is to calculate the eligibility of a texture to use small alignment on your own and pass it to the function only when you know the texture fulfills the conditions. The second is to always try the small alignment. When it's not granted, GetResourceAllocationInfo returns some values other than expected (in my test it's Alignment = 64 KB and SizeInBytes = 0xFFFFFFFFFFFFFFFF). Then you should call it again with the default alignment. That's the method that Microsoft shows in their "Small Resources Sample". It looks good, but a problem with it is that calling this function with an alignment that is not accepted generates D3D12 Debug Layer error #721 CREATERESOURCE_INVALIDALIGNMENT. Or at least it used to, because on one of my machines the error no longer occurs. Maybe Microsoft fixed it in some recent update of Windows or Visual Studio / Windows SDK?
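The second approach ("try small, then fall back") can be sketched in a platform-independent way as below. Everything here is a stand-in so that the logic can be compiled and tested without the Windows SDK: `AllocationInfo` mirrors `D3D12_RESOURCE_ALLOCATION_INFO`, the two constants carry the values from the table above for non-MSAA textures, and the `query` callback is a hypothetical wrapper that would set `D3D12_RESOURCE_DESC::Alignment` to the requested value and call `GetResourceAllocationInfo`. The "not granted" check follows the behavior described in this article, so treat it as an assumption rather than a documented contract.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Stand-in mirroring D3D12_RESOURCE_ALLOCATION_INFO.
struct AllocationInfo {
    std::uint64_t SizeInBytes;
    std::uint64_t Alignment;
};

// Values from the table above, for non-MSAA textures.
constexpr std::uint64_t kSmallAlignment   = 4096;   // 4 KB
constexpr std::uint64_t kDefaultAlignment = 65536;  // 64 KB

// Hypothetical wrapper: takes the alignment to request and returns what
// the API reported for the resource.
using QueryFn = std::function<AllocationInfo(std::uint64_t requestedAlignment)>;

// Asks for the small alignment first and falls back to the default one
// when the small alignment was not granted.
AllocationInfo queryWithSmallAlignmentFirst(const QueryFn& query)
{
    assert(query);
    const AllocationInfo small = query(kSmallAlignment);
    const bool granted = small.Alignment == kSmallAlignment &&
                         small.SizeInBytes != UINT64_MAX;
    return granted ? small : query(kDefaultAlignment);
}
```

The first approach from the text would instead decide up front which alignment to request, which avoids the failed call and the Debug Layer error it may trigger.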
Here comes the last quirk of this whole D3D12 resource alignment topic: Alignment is applied to the offset used in CreatePlacedResource, which we understand as relative to the beginning of an ID3D12Heap, but the heap itself has an alignment too! The D3D12_HEAP_DESC structure has an Alignment member. There is no equivalent of this in Vulkan. Documentation of the D3D12_HEAP_DESC structure says it must be 64 KB or 4 MB. Whenever you predict you might create MSAA textures in a heap, you must choose 4 MB. Otherwise, you can use 64 KB.
Thank you, Microsoft, for making this so complicated! ;) This article wouldn't be complete without an advertisement of the open source library D3D12 Memory Allocator (and the similar Vulkan Memory Allocator), which automatically handles all this complexity. It also implements both ways of using small alignment, selectable using a preprocessor macro.