Entries for tag "vulkan", ordered from most recent. Entry count: 37.
# Thoughts on graphics APIs and libraries
Thu
07
Feb 2019
Warning: This is a long rant. I’d like to share my personal thoughts and opinions on graphics APIs like Vulkan, Direct3D 12.
Some time ago I came up with a diagram showing how the graphics software technologies evolved over last decades – see my blog post “Lower-Level Graphics API - What Does It Mean?”. The new graphics APIs (Direct3D 12, Vulkan, Metal) are not only a clean start, so they abandon all the legacy garbage going back to ‘90s (like glVertex
), but they also take graphics programming to a new level. It is a lower level – they are more explicit, closer to the hardware, and better match how modern GPUs work. At least that’s the idea. It means simpler, more efficient, and less error-prone drivers. But they don’t make the game or engine programming simpler. Quite the opposite – more responsibilities are now moved to engine developers (e.g. memory management/allocation). Overall, it is commonly considered a good thing though, because the engine has higher-level knowledge of its use cases (e.g. which textures are critically important and which can be unloaded when GPU memory is full), so it can get better performance by doing it properly. All this is hidden in the engines anyway, so developers making their games don’t notice the difference.
Those of you, who – just like me – deal with those low-level graphics APIs in their everyday work, may wonder if these APIs provide the right level of abstraction. I know it will sound controversial, but sometimes I get a feeling they are at the exactly worst possible level – so low they are difficult to learn and use properly, while so high they still hide some implementation details important for getting a good performance. Let’s take image/texture barriers as an example. They were non-existent in previous APIs. Now we have to do them, which is a major pain point when porting old code to a new API. Do too few of them and you get graphical corruptions on some GPUs and not on the others. Do too many and your performance can be worse than it has been on DX11 or OGL. At the same time, they are an abstract concept that still hides multiple things happening under the hood. You can never be sure which barrier will flush some caches, stall the whole graphics pipeline, or convert your texture between internal compression formats on a specific GPU, unless you use some specialized, vendor-specific profiling tool, like Radeon GPU Profiler (RGP).
It’s the same with memory. In DX11 you could just specify intended resource usage (D3D11_USAGE_IMMUTABLE
, D3D11_USAGE_DYNAMIC
) and the driver chose preferred place for it. In Vulkan you have to query for memory heaps available on the current GPU and explicitly choose the one you decide best for your resource, based on low-level flags like VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
etc. AMD exposes 4 memory types and 3 memory heaps. Nvidia has 11 types and 2 heaps. Intel integrated graphics exposes just 1 heap and 2 types, showing the memory is really unified, while AMD APU, also integrated, has same memory model as the discrete card. If you try to match these to what you know about physically existing video RAM and system RAM, it doesn’t make any sense. You could just pick the first DEVICE_LOCAL
memory for the fastest GPU access, but even then, you cannot be sure your resource will stay in video RAM. It may be silently migrated to system RAM without your knowledge and consent (e.g. if you go out of memory), which will degrade performance. What is more, there is no way to query for the amount of free GPU memory in Vulkan, unless you do hacks like using DXGI.
Hardware queues are no better. Vulkan claims to give explicit access to the pieces of GPU hardware, so you need to query for queues that are available. For example, Intel exposes only a single graphics queue. AMD lets you create up to 3 additional compute-only queues and 2 transfer queues. Nvidia has 8 compute queues and 1 transfer queue. Do they all really map to silicon that can work in parallel? I doubt it. So how many of them to use to get the best performance? There is no way to tell by just using Vulkan API. AMD promotes doing compute work in parallel with 3D rendering while Nvidia diplomatically advises to be “conscious” with it.
It's the same with presentation modes. You have to enumerate VkPresentModeKHR
-s available on the machine and choose the right one, along with number of images in the swapchain. These don't map intuitively to a typical user-facing setting of V-sync = on/off, as they are intended to be low level. Still you have no control and no way to check whether the driver does "blit" or "flip".
One could say the new APIs don’t deliver to their promise of being low level, explicit, and having predictable performance. It is impossible to deliver, unless the API is specific to one GPU, like there is on consoles. A common API over different GPUs is always high level, things happen under the hood, and there are still fast and slow paths. Isn’t all this complexity just for nothing? It may be true that comparing to previous generation APIs, drivers for the new ones need not launch additional threads in the background or perform shader compilation on first draw call, which greatly reduces chances of major hitching. (We will see how long this state will persist as the APIs and drivers evolve.) * Still there is no way to predict or ensure minimum FPS/maximum frame time. We are talking about systems where multiple processes compete for resources. On modern PCs there is even no way to know how many cycles will a single instruction take! Cache memory, branch prediction, out-of-order execution – all of these mechanisms are there in the CPU to speed up average cases, but there can always be cases when it works slowly (e.g. cache miss). It’s the same with graphics. I think we should abandon the false hope of predictable performance as a thing of the past, just like rendering graphics pixel-perfect. We can optimize for the average, but we cannot ensure the minimum. After all, games are “soft real-time systems”.
Based on that, I am thinking if there is a room for a new graphics API or top of DX12 or Vulkan. I don’t mean whole game engine with physical simulation, handling sound, input controllers and all, like Unity or UE4. I mean an API just like DX11 or OGL, on a similar or higher abstraction level (if higher level, maybe the concept of persistent “frame graph” with explicit pass and resource dependencies is the way to go?). I also don’t think it’s enough to just reimplement any of those old APIs. The new one should take advantage of features of the explicit APIs (like parallel command buffer recording), while hiding the difficult parts (e.g. queues, memory types, descriptors, barriers), so it’s easier to use and harder to misuse. (An existing library similar to this concept is V-EZ from AMD.) I think it may still have good performance. The key thing needed for creation of such library is abandoning the assumption that developer must define everything up-front, with nothing allocated, created, or transferred on first use.
See also next post: "How to design API of a library for Vulkan?"
Update 2019-02-12: I want to thank all of you for the amazing feedback I received after publishing this post, especially on Twitter. Many projects have been mentioned that try to provide an API better than Vulkan or DX12 - e.g. Apple Metal, WebGPU, The Forge by Confetti.
* Update 2019-04-16: Microsoft just announced they are adding background shader optimizations to D3D12, so driver can recompile and optimize shaders in the background on its own threads. Congratulations! We are back at D3D11 :P
Update 2021-04-01: Same with pipeline states. In the old days, settings used to be independent, enabled using glEnable
or ID3D9Device::SetRenderState
. New APIs promised to avoid "non-orthogonal states" - having to recompile shaders on a new draw call (which caused a major hitch) by enclosing most of the states in a Pipeline (State Object). But they went too far and require a new PSO every time we want to change something simple which almost certainly doesn't go to shader code, like stencil write mask. That created new class of problems - having to create thousands of PSOs during loading (which can take minutes), necessity for shader caches, pipeline caches etc. Vulkan loosened these restrictions by offering "dynamic state" and later extended that with VK_EXT_extended_dynamic_state extension. So we are back, with just more complex API to handle :P
Comments | #gpu #optimization #graphics #directx #libraries #vulkan Share
# Vulkan sparse binding - a quick overview
Thu
13
Dec 2018
On 13 December 2018 AMD released new version of graphics driver: 18.12.2. One could say there is nothing special about it – new drivers are released as often as every 2 weeks. But this version brings a change very important for Vulkan developers. AMD now catches up with competition – Nvidia and Intel – in supporting Vulkan sparse binding and sparse residency, which makes it usable in Windows PC games. I’d like to make it an opportunity to briefly describe what is it and how can you start using it, assuming you are a programmer who already knows a bit about C or C++ and Vulkan.
Comments | #graphics #vulkan Share
# Vulkan with DXGI - experiment results
Mon
19
Nov 2018
In my previous post, I’ve described a way to get GPU memory usage in Windows Vulkan app by using DXGI. This API, designed for Direct3D, seems to work with Vulkan as well. In this post I would like to share detailed results of my experiment on two different platforms with two graphics cards from different vendors. But before that, a disclaimer:
Update 2023-12-14: This is an old article published before Vulkan received VK_EXT_memory_budget extension. With this extension, you can now query for current memory usage and budget natively in Vulkan, with no need to resort to DXGI. This article may still be valuable as long as you observe similar numbers returned in Vulkan.
Comments | #vulkan #directx #graphics #windows Share
# There is a way to query GPU memory usage in Vulkan - use DXGI
Thu
15
Nov 2018
In my GDC 2018 talk “Memory management in Vulkan and DX12” (slides freely available, video behind GDC Vault paywall) I said that in Direct3D 12 you can query for the exact amount of GPU memory used and available, while in Vulkan there is no way to do that, so I recommend to just query for memory capacity (VkMemoryHeap::size
) and limit your usage to around 80% of it. It turns out that I wasn’t quite right. If you code for Windows, there is a way to do this. I assumed that the mentioned function IDXGIAdapter3::QueryVideoMemoryInfo
is part of Direct3D 12 interface, while it is actually part of DirectX Graphics Infrastructure (DXGI). This is a more generic, higher level Windows API that allows you to enumerate adapters (graphics cards) installed in the system, query for their parameters and outputs (monitors) connected to them. Direct3D refers to this API, but it’s not the same.
Key question is: Can you use DXGI to query for GPU memory usage while doing graphics using Vulkan, not D3D11 or D3D12? Would it return some reasonable data and not all zeros? Short answer is: YES! I’ve made an experiment - I wrote a simple app that creates various Vulkan objects and queries DXGI for memory usage. Results look very promising. But before I move on to the details, here is a short primer of how to use this DXGI interface, for all non-DirectX developers:
1. Use C++ in Visual Studio. You may also use some other compiler for Windows or other programming language, but it will be probably harder to setup.
2. Install relatively new Windows SDK.
3. #include <dxgi1_4.h>
and <atlbase.h>
4. Link with “dxgi.lib”.
5. Create Factory object:
IDXGIFactory4* dxgiFactory = nullptr; CreateDXGIFactory1(IID_PPV_ARGS(&dxgiFactory));
Don’t forget to release it at the end:
dxgiFactory->Release();
6. Write a loop to enumerate available adapters. Choose and remember suitable one.
IDXGIAdapter3* dxgiAdapter = nullptr; IDXGIAdapter1* tmpDxgiAdapter = nullptr; UINT adapterIndex = 0; while(m_DxgiFactory->EnumAdapters1(adapterIndex, &tmpDxgiAdapter) != DXGI_ERROR_NOT_FOUND) { DXGI_ADAPTER_DESC1 desc; tmpDxgiAdapter>GetDesc1(&desc); if(!dxgiAdapter && desc.Flags == 0) { tmpDxgiAdapter->QueryInterface(IID_PPV_ARGS(&dxgiAdapter)); } tmpDxgiAdapter->Release(); ++adapterIndex; }
At the end, don’t forget to release it:
dxgiAdapter->Release();
Please note that using new version of DXGI interfaces like DXGIFactory4
and DXGIAdapter3
requires some relatively new version (I’m not sure which one) of both Windows SDK on developer’s side (otherwise it won’t compile) and updated Windows system on user’s side (otherwise function calls with fail with appropriate returned HRESULT
).
7. To query for GPU memory usage at the moment, use this code:
DXGI_QUERY_VIDEO_MEMORY_INFO info = {}; dxgiAdapter->QueryVideoMemoryInfo(0, DXGI_MEMORY_SEGMENT_GROUP_LOCAL, &info);
There are two possible options:
DXGI_MEMORY_SEGMENT_GROUP_LOCAL
is the memory local to the GPU, so basically video RAM.DXGI_MEMORY_SEGMENT_GROUP_NON_LOCAL
is the system RAM.Among members of the returned structure, the most interesting is CurrentUsage
. It seems to precisely reflect the use of GPU memory - it increases when I allocate a new VkDeviceMemory
object, as well as when I use some implicit memory by creating other Vulkan resources, like a swap chain, descriptor pools and descriptor sets, command pools and command buffers, query pools etc.
Other DXGI features for video memory - callback for budget change notification (IDXGIAdapter3::RegisterVideoMemoryBudgetChangeNotificationEvent
) and reservation (IDXGIAdapter3::SetVideoMemoryReservation
) may also work with Vulkan, but I didn’t check them.
As an example, on my system with GPU = AMD Radeon RX 580 8 GB and 16 GB of system RAM, on program startup and before any Vulkan or D3D initialization, DXGI reports following data:
Local:
Budget=7252479180 CurrentUsage=0
AvailableForReservation=3839547801 CurrentReservation=0
Nonlocal:
Budget=7699177267 CurrentUsage=0
AvailableForReservation=4063454668 CurrentReservation=0
8. You may want to choose correct DXGI adapter to match the physical device used in Vulkan. Even on the system with just one discrete GPU there are two adapters reported, one of them being software renderer. I exclude it by comparing desc.Flags == 0
, which means this is a real, hardware-accelerated GPU, not DXGI_ADAPTER_FLAG_REMOTE
or DXGI_ADAPTER_FLAG_SOFTWARE
.
Good news is that even when there are more such adapters in the system, there is a way to match them between DXGI and Vulkan. Both APIs return something called Locally Unique Identifier (LUID). In DXGI it’s in DXGI_ADAPTER_DESC1::AdapterLuid
. In Vulkan it’s in VkPhysicalDeviceIDProperties::deviceLUID
. They are of different types - two 32-bit numbers versus array of bytes, but it seems that simple, raw memory compare works correctly. So the way to find DXGI adapter matching Vulkan physical device is:
// After obtaining VkPhysicalDevice of your choice: VkPhysicalDeviceIDProperties physDeviceIDProps = { VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_ID_PROPERTIES }; VkPhysicalDeviceProperties2 physDeviceProps = { VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2 }; physDeviceProps.pNext = &physDeviceIDProps; vkGetPhysicalDeviceProperties2(physicalDevice, &physDeviceProps); // While enumerating DXGI adapters, replace condition: // if(!dxgiAdapter && desc.Flags == 0) // With this: if(memcmp(&desc.AdapterLuid, physDeviceIDProps.deviceLUID, VK_LUID_SIZE) == 0)
Please note that function vkGetPhysicalDeviceProperties2
requires Vulkan 1.1, so set VkApplicationInfo::apiVersion = VK_API_VERSION_1_1
. Otherwise the call results in “Access Violation” error.
In my next blog post, I present detailed results of my experiment with DXGI used with Vulkan application, tested on 2 different GPUs.
Update 2019-03-15: Khronos released Vulkan extension equivalent to this DXGI functionality: VK_EXT_memory_budget.
Comments | #directx #vulkan #graphics #windows Share
# Vulkan Memory Allocator 2.1.0
Tue
28
Aug 2018
Yesterday I merged changes in the code of Vulkan Memory Allocator that I've been working on for past few months to "master" branch, which I consider a major milestone, so I marked it as version 2.1.0-beta.1. There are many new features, including:
The release also includes many smaller bug fixes, improvements and additions. Everything is tested and documented. Yet I call it "beta" version, to encourage you to test it in your project and send me your feedback.
Comments | #vulkan #libraries #productions #graphics Share
# Porting your engine to Vulkan or DX12 - video from my talk
Fri
06
Jul 2018
Organizers of Digital Dragons conference published video recording of my talk "Porting your engine to Vulkan or DX12":
PowerPoint slides are also available for download here: Porting your engine to Vulkan or DX12 - GPUOpen.
Comments | #vulkan #directx #teaching #events #video Share
# Vulkan layers don't work? Look at registry.
Fri
06
Jul 2018
If you program using Vulkan on Windows and you see that Vulkan layers stopped working (especially after updating graphics driver or Vulkan SDK), first thing to try is to uninstall Vulkan SDK and install it again. Also restart your computer, to be sure that environmental variables are updated. But sometimes it doesn't help or it even makes things worse.
If you are still not able to successfully find and enable VK_LAYER_LUNARG_standard_validation
in your code, or you try to enable VK_LAYER_LUNARG_api_dump
using environmental variable but can't see it working, or have problem with any other layer, first thing you can try is to issue vulkaninfo
console command. If you see some errors about "JSON", it clearly indicates that there is a problem with configuration of Vulkan in your system, regarding paths to layer DLLs and their JSON descriptions.
Either way, the thing I recommend is to launch regedit
(Registry Editor) and check values in following keys:
HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\Vulkan\ExplicitLayers HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\Vulkan\ImplicitLayers
They should contain paths to valid JSON files that describe installed Vulkan layer DLLs. "ExplicitLayers" is the list of layers that can be manually enabled either programatically via VkInstanceCreateInfo::ppEnabledLayerNames
, or via environmental variable VK_INSTANCE_LAYERS
. "ImplicitLayers" is the list of layers loaded automatically by each Vulkan app (e.g. one added by Steam or RenderDoc). For further details see article: "Vulkan Validation and Debugging Layers".
It looks that the installer of Vulkan SDK can mess up these registry entries. Yesterday, after uninstalling old SDK and installing new one, I found there entries from both versions. Today, my colleague found these registry keys completely empty. So whenever you have problems with Vulkan layers on Windows, take a look at the registry.
Update 2018-08-27: There is Issue #38 on GitHub - Vulkan-Ecosystem about this.
# Human-friendly classification of Vulkan resources
Wed
06
Jun 2018
In graphics programming we deal with different kinds of resources. Their specific types and names depend on graphics API. For example, in Direct3D 9 we have vertex buffers, index buffers, constant buffers, textures etc. OpenGL equivalent of constant buffer is uniform buffer object (UBO).
Vulkan has only two types of resources: buffers and images. This may be the only thing that is simpler in Vulkan than in other APIs :) When creating such resource, we specify usage flags that define how do we intend to use it. For example, VK_BUFFER_USAGE_VERTEX_BUFFER_BIT
means that a buffer may be used as vertex buffer. VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT
means that an image may be used as color attachment (which is Vulkan name for “render target”).
Such flags may be combined together, so a single buffer can contain data to be used as vertex buffer, index buffer, and uniform buffer. I’m not 100% sure if this is guaranteed by the specification (theoretically some drivers could return disjoint sets of VkMemoryRequirements::memoryTypeBits
for different usage flags), but I think that every real implementation allows that. It means we cannot clearly classify buffers and images into categories. Despite that, I decided to develop a human-friendly classification of Vulkan resources into several categories, starting from most “special”, and ending with most “common/generic” ones. I propose following algorithm:
For buffers: // Buffer is used as source of data for fixed-function stage of graphics pipeline. // It’s indirect, vertex, or index buffer. if ((usage & (VK_BUFFER_USAGE_INDIRECT_BUFFER_BIT | VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_INDEX_BUFFER_BIT)) != 0) return class0; // Buffer is accessed by shaders for load/store/atomic. // Aka “UAV” else if ((usage & (VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_STORAGE_TEXEL_BUFFER_BIT)) != 0) return class1; // Buffer is accessed by shaders for reading uniform data. // Aka “constant buffer” else if ((usage & (VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT | VK_BUFFER_USAGE_UNIFORM_TEXEL_BUFFER_BIT)) != 0) return class2; // Any other type of buffer. // Notice that VK_BUFFER_USAGE_TRANSFER_SRC_BIT and VK_BUFFER_USAGE_TRANSFER_DST_BIT // flags are intentionally ignored. else return class3; For images: // Image is used as depth/stencil “texture/surface”. if ((usage & VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT) != 0) return class0; // Image is used as other type of attachment. // Aka “render target” or “UAV” else if ((usage & (VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT | VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT | VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT | VK_IMAGE_USAGE_STORAGE_BIT)) != 0) return class1; // Image is accessed by shaders for sampling. // Aka “texture” else if ((usage & VK_IMAGE_USAGE_SAMPLED_BIT) != 0) return class2; // Any other type of image. // Notice that VK_IMAGE_USAGE_TRANSFER_SRC_BIT and VK_IMAGE_USAGE_TRANSFER_DST_BIT // flags are intentionally ignored. else return class3;
I needed this because I wanted to introduce better coloring to VMA Dump Vis. Vulkan Memory Allocator (VMA) is a C++ library that simplifies GPU memory management in Vulkan applications. VMA Dump Vis is a Python script that can visualize JSON dump from this library on a picture. As I updated the library to remember usage flags of created resources, I wanted to use them to show more information on the picture. To do this, I defined following color scheme:
Example visualization of Vulkan memory in some game:
This color scheme is carefully designed. I based it on following principles:
VK_IMAGE_TILING_OPTIMAL
should be used wherever possible, so those with VK_IMAGE_TILING_LINEAR
are just marked with green. Images of unknown tiling have color between cyan and green.You can already visualize your Vulkan memory with all these colors if you grab Vulkan Memory Allocator from development branch. I think that this classification of GPU resources and accompanying color scheme could also be useful for other graphics APIs.