All blog entries, ordered from most recent. Entry count: 1173.
# Secrets of Direct3D 12: Do RTV and DSV descriptors make any sense?
Sun 12 Nov 2023
This article is intended for programmers who use Direct3D 12. We will explore the topic of descriptors, especially Render Target View (RTV) and Depth Stencil View (DSV) descriptors. To understand the article, you should already know what they are and how to use them. For learning the basics, I recommend my earlier article “Direct3D 12: Long Way to Access Data” where I described resource binding model in D3D12. Current article is somewhat a follow-up to that one. I also recommend checking the official “D3D12 Resource Binding Functional Spec”.
What is a “descriptor”? My personal definition would be that generally in computing, a descriptor is a small data structure that points to some larger data and describes its parameters. While a “pointer”, “identifier”, or “key” is typically just a single number that points or identifies the main object, a “descriptor” is typically a structure that also carries some parameters describing the object.
Descriptors in D3D12 are also called “views”. They mean the same thing. Functions like ID3D12Device::CreateShaderResourceView or CreateRenderTargetView set up a descriptor. Note this is different from Vulkan, where a “view” and a “descriptor” are different entities. The concept of a “view” is also present in relational databases. Just like in databases, a “view” points to the target data, but also specifies a way to look at it. In D3D12 it means, for example, that an SRV descriptor pointing to a texture can reinterpret its pixel format (e.g. with or without _SRGB) or limit access to only a selected range of mip levels or array slices.
Let’s talk about Constant Buffer View (CBV), Shader Resource View (SRV), and Unordered Access View (UAV) descriptors first. If created inside GPU-accessible descriptor heaps (class ID3D12DescriptorHeap, flag D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE), they can be bound to the graphics pipeline, as I described in detail in my previously mentioned article. Being part of GPU memory has some implications:
#directx #rendering
# Doing dynamic resolution scaling? Watch out for texture memory size!
Sun 22 Oct 2023
This article is intended for graphics programmers, mostly those who use Direct3D 12 or Vulkan and implement dynamic resolution scaling. Before we go to the main topic, some introduction first…
Nowadays, more and more games offer some kind of resolution scaling. It means rendering the 3D scene in a resolution lower than the display resolution and then upscaling it using some advanced shader, often combined with temporal antialiasing and sharpening. It may be one of the solutions provided by GPU vendors (FSR from AMD, XeSS from Intel, DLSS from NVIDIA) or a custom solution (like TSR in Unreal Engine). It is an attractive option for gamers to have a good FPS increase with only minor image quality degradation. It is becoming more important as monitor resolutions increase to 4K or even more, high-end graphics cards are still expensive, and advanced rendering techniques like ray tracing encourage favoring “better pixels” over “more pixels”. See also my old article: “Scaling is everywhere, pixel-perfect is the past”.
Dynamic resolution scaling is an extension to this idea that allows rendering each frame in a different resolution, lower or higher, as a trade-off between quality and performance, to maintain desired framerate even in more complex scenes with many objects, characters, and particle effects visible on the screen. If you are interested in this technique, I strongly recommend checking a recent article from Martin Fuller from Microsoft: “Dynamic Resolution Scaling (DRS) Implementation Best Practice”, which provides many practical implementation tips.
One of the topics we need to handle when implementing dynamic resolution scaling is the creation and usage of textures that need different resolution every frame, especially render target, depth-stencil, and UAV, used temporarily between render passes. One solution could be to create these textures in the maximum resolution and use only part of them when necessary using a limited viewport. However, Martin gives multiple reasons why this option may cause some problems. A simpler and safer solution is to create a separate texture for each possible resolution, with a certain step. In modern graphics APIs (Direct3D 12 and Vulkan) they can be placed in the same memory, which we call memory aliasing.
Here comes the main question I want to answer in this article: What size of the memory heap should we use when allocating memory for these textures? Can we just take the maximum dimensions of a texture (e.g. 4K resolution: 3840 x 2160), call device->GetResourceAllocationInfo(), inspect the returned D3D12_RESOURCE_ALLOCATION_INFO::SizeInBytes and use it as D3D12_HEAP_DESC::SizeInBytes? A texture with fewer pixels should always require less memory, right?
WRONG! Direct3D 12 doesn’t define such a requirement, and graphics drivers from some GPU vendors really do return a smaller size for a texture with larger dimensions, for some specific dimensions and pixel formats. For example, on AMD Radeon RX 7900 XTX, a render target with format DXGI_FORMAT_R16G16B16A16_FLOAT requires 458,752 B at 256x144, but only 393,216 B at the larger 270x152.
Why does this happen? It is because textures are not necessarily stored in GPU memory the way we imagine them: pixel after pixel, in row-major order. They often use some optimization techniques like pixel swizzling or compression. By “compression”, I don’t mean texture formats like BC or ASTC, which we must use explicitly. I also don’t mean compression like in the ZIP file format or the zlib/deflate algorithm, which decreases data size. Quite the opposite: this kind of compression increases texture size by adding extra metadata, which allows the GPU to speed things up by saving memory bandwidth in certain cases. This is done mostly on render target and depth-stencil textures. For more information about it, see my old article: “Texture Compression: What Can It Mean?”. I’m talking about meaning number 4 of the word “compression” from that article: compression formats that are internal, specific to certain graphics cards, and opaque to us, the programmers who just use the graphics API. The problem is that a specific compression format for a texture is selected by the driver based on various heuristics (like render target / depth-stencil / UAV / other flags, pixel format, and… dimensions). This is why a texture with larger dimensions may unexpectedly require less memory.
To research this problem in detail, I wrote a small testing program and performed tests on graphics cards from various vendors. It was a modification of my small Windows console app D3d12info that goes through the list of all DXGI_FORMAT enum values and calls CheckFeatureSupport to check which ones are supported as a render target or depth-stencil. For those that are, I called GetResourceAllocationInfo to get memory requirements for a texture with this pixel format, with increasing dimensions, where height goes from 32 to 2160 with a step of 8, and width is calculated using a formula for 16:9 aspect ratio: width = height * 16 / 9.
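The scan described above can be sketched in portable C++ roughly as follows. This is only an illustration, not the actual code of the test program: the names `Extent`, `SizeQuery`, `EnumerateResolutions`, `FindSizeDrops`, and `MockQuerySize` are hypothetical, and the size query is a callback so that a real application could wrap GetResourceAllocationInfo in it. The mock assumes 8 bytes per pixel rounded up to 64 KiB and hard-codes the single AMD data point quoted in this article.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

struct Extent { uint32_t width; uint32_t height; };

// Hypothetical callback: returns the number of bytes required by a texture of the given size.
// In a real D3D12 program it would wrap ID3D12Device::GetResourceAllocationInfo.
using SizeQuery = std::function<uint64_t(Extent)>;

// Enumerates 16:9 resolutions: height from minHeight to maxHeight with the given step,
// width = height * 16 / 9 (integer division, as in the formula above).
std::vector<Extent> EnumerateResolutions(uint32_t minHeight = 32, uint32_t maxHeight = 2160, uint32_t step = 8)
{
    std::vector<Extent> result;
    for(uint32_t height = minHeight; height <= maxHeight; height += step)
        result.push_back({height * 16 / 9, height});
    return result;
}

struct SizeDrop { Extent smaller; uint64_t smallerSize; Extent larger; uint64_t largerSize; };

// Finds neighboring resolutions where the larger texture reports a smaller size.
std::vector<SizeDrop> FindSizeDrops(const std::vector<Extent>& resolutions, const SizeQuery& query)
{
    std::vector<SizeDrop> drops;
    for(size_t i = 1; i < resolutions.size(); ++i)
    {
        const uint64_t prevSize = query(resolutions[i - 1]);
        const uint64_t currSize = query(resolutions[i]);
        if(currSize < prevSize)
            drops.push_back({resolutions[i - 1], prevSize, resolutions[i], currSize});
    }
    return drops;
}

// Mock query (an assumption, not real driver behavior): 8 bytes per pixel rounded up to 64 KiB,
// plus one hard-coded outlier taken from the AMD result for 256x144 quoted in the article.
uint64_t MockQuerySize(Extent e)
{
    if(e.width == 256 && e.height == 144)
        return 458752;
    const uint64_t raw = uint64_t(e.width) * e.height * 8;
    return (raw + 65535) / 65536 * 65536;
}
```

Calling `FindSizeDrops(EnumerateResolutions(), MockQuerySize)` reports the single step from 256x144 down to 270x152; with a real device behind the callback, the result depends on the GPU and driver.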
Here are the results. Please remember these are just 3 specific graphics cards. The results may be different on a different GPU and even with a different version of the graphics driver.
On NVIDIA GeForce RTX 3080 with driver 545.84, I found no cases where a texture with larger dimensions requires less memory, so NVIDIA (or at least this specific card) is not affected by the problem described in this article.
On AMD Radeon RX 7900 XTX with driver 23.9.3, I found the following data points where memory requirements are non-monotonic, one for each of the following formats:

- DXGI_FORMAT_R16G16B16A16_FLOAT/UNORM/UINT/SNORM/SINT: 256x144 = 458,752 B, 270x152 = 393,216 B
- DXGI_FORMAT_R32G32_FLOAT/UINT/SINT: 256x144 = 458,752 B, 270x152 = 393,216 B
- DXGI_FORMAT_R8G8_UNORM/UINT/SNORM/SINT: 512x288 = 458,752 B, 526x296 = 393,216 B
- DXGI_FORMAT_R16_FLOAT/UNORM/UINT/SNORM/SINT: 512x288 = 458,752 B, 526x296 = 393,216 B
- DXGI_FORMAT_R8_UNORM/UINT/SNORM/SINT: 256x144 = 131,072 B, 270x152 = 65,536 B
- DXGI_FORMAT_A8_UNORM: 256x144 = 131,072 B, 270x152 = 65,536 B
- DXGI_FORMAT_B5G6R5_UNORM: 512x288 = 458,752 B, 526x296 = 393,216 B
- DXGI_FORMAT_B5G5R5A1_UNORM: 512x288 = 458,752 B, 526x296 = 393,216 B
- DXGI_FORMAT_B4G4R4A4_UNORM: 512x288 = 458,752 B, 526x296 = 393,216 B

On Intel Arc A770, with driver 31.0.101.4887, almost every format used as a render target (but none of the depth-stencil formats) has multiple steps where the size decreases, and it has them at larger dimensions than AMD. For example, the most “traditional” one, DXGI_FORMAT_R8G8B8A8_UNORM, returns:
What to do with this knowledge? The conclusion is that if we implement dynamic resolution scaling and we want to create textures with different dimensions aliasing in memory, required size of this memory is not necessarily the size of the largest texture in terms of dimensions. To be safe, we should query for memory requirements of all texture sizes we may want to use and calculate their maximum. In practice, it should be enough to query resolutions starting from e.g. 75% of the maximum. Because tested GPUs always have only a single step down, an even more efficient, but not fully future-proof solution could be to start from the full resolution, go down until we find a different memory size (no matter if higher or lower), and take maximum of these two.
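The safe approach (query every resolution we may use and take the maximum) could look roughly like the sketch below. It is an illustration under assumptions, not a definitive implementation: `SizeQuery` and `CalcAliasedHeapSize` are hypothetical names, the callback stands in for a wrapper around GetResourceAllocationInfo, and the 16:9 width formula is the one used in the tests above.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>

// Hypothetical callback: returns the number of bytes required by a texture of the given size.
// A real D3D12 application would fill D3D12_RESOURCE_DESC here and call GetResourceAllocationInfo.
using SizeQuery = std::function<uint64_t(uint32_t width, uint32_t height)>;

// Walks from maxHeight down to minHeight with the given step and returns the largest
// reported size, which is a safe size for a heap shared by all these aliased textures.
uint64_t CalcAliasedHeapSize(uint32_t maxHeight, uint32_t minHeight, uint32_t step, const SizeQuery& query)
{
    if(step == 0)
        step = 1; // Guard against an infinite loop
    uint64_t maxSize = 0;
    uint32_t height = maxHeight; // Start from the top so the full resolution is always included
    for(;;)
    {
        const uint32_t width = height * 16 / 9;
        maxSize = std::max(maxSize, query(width, height));
        if(height < minHeight + step)
            break; // The next step would go below minHeight
        height -= step;
    }
    return maxSize;
}
```

Passing e.g. `minHeight` equal to 75% of `maxHeight` corresponds to the practical shortcut mentioned above, at the cost of relying on the driver not having steps at lower resolutions that are still in use.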
So far, I focused only on DirectX 12. Is Vulkan also affected by this problem? In the past, it could be. Vulkan has a similar concept of querying for memory requirements of a texture using function vkGetImageMemoryRequirements. It used to have an even bigger problem. To understand it, we must recall that in D3D12, we query for memory requirements (size and alignment) given structure D3D12_RESOURCE_DESC, which describes parameters of a texture to be created. In (the initial) Vulkan API, on the other hand, we need to first create the actual VkImage object, and then query for its memory requirements. The question is: Given two textures created with exactly the same parameters (width, height, pixel format, number of mip levels, flags, etc.), do they always return the same memory requirements?
In the past, it wasn’t required by the Vulkan specification and I saw some drivers for some GPUs that really returned different sizes for two identical textures! It could cause problems, e.g. when defragmenting video memory in Vulkan Memory Allocator library. Was it a bug, or another internal optimization done by the driver, e.g. to avoid some memory bank conflicts? I don’t know. The good news is that since then, the Vulkan specification was clarified to require that functions like vkGetImageMemoryRequirements always return the same size and alignment for images created with the same parameters, and new drivers comply with that, so the problem is gone now. Vulkan 1.3 also got a new function vkGetDeviceImageMemoryRequirements that takes VkImageCreateInfo with image creation parameters instead of an already created image object, just like D3D12 has done from the beginning.
Going back to the main question of this article: When VK_KHR_maintenance4 extension is enabled (which has been promoted to core Vulkan 1.3), the problem does not occur, as Vulkan specification says: "For a VkImage, the size memory requirement is never greater than that of another VkImage created with a greater or equal value in each of extent.width, extent.height, and extent.depth; all other creation parameters being identical.", and the same for buffers.
Big thanks to my friends: Bartek Boczula for discussions about this topic and inspiration to write this article, as well as Szymon Nowacki for testing on the Intel card! Also thanks to Constantine Shablia from Collabora for pointing me to the answer on Vulkan.
#rendering #gpu #vulkan #directx
# 3 Ways to Iterate Over std::vector
Sat 30 Sep 2023
This will be a short article about basics of C++. std::vector is a container that dynamically allocates a contiguous array of elements. There are multiple ways to write a for loop to iterate over its elements. In 2018 I wrote an article "Efficient way of using std::vector" where I compared their performance. I concluded that using iterators can be orders of magnitude slower than using a raw pointer to its data in Debug configuration. This time, I would like to focus on how using "modern" C++ also limits our freedom.
Language purists would probably say that the recommended way to traverse a vector is now a range-based for loop, available since C++11. This is indeed the shortest and the most convenient form, but inside the loop it gives access only to the current element, not its index and not any other elements.
struct Item
{
int number;
int otherData[10];
};
std::vector<Item> items = ...
int numberSum = 0;
for(const Item& item : items)
numberSum += item.number;
Imagine that while traversing the vector, for some elements that are not the first and that meet certain criteria, we want to compare them with their previous element. This is not possible in a range-based for loop above, unless we memorize the previous element in a separate variable and update it on every iteration. Using iterators gives us the possibility to move forward or backward and thus to access the previous element when needed.
for(std::vector<Item>::const_iterator currIt = items.begin(); currIt != items.end(); ++currIt)
{
if(currIt != items.begin() && // Not the first
MeetsCriteria(*currIt))
{
std::vector<Item>::const_iterator prevIt = currIt;
--prevIt; // Step back to the previous element
CompareWithPrevious(*prevIt, *currIt);
}
}
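For comparison, the alternative mentioned above (staying with a range-based for loop and memorizing the previous element in a separate variable) could be sketched like this. `MeetsCriteria` and `CountIncreases` are hypothetical stand-ins invented for this illustration; the comparison with the previous element is just "is the number greater".

```cpp
#include <vector>

struct Item
{
    int number;
    int otherData[10];
};

// Hypothetical criterion, only for illustration: the number is even.
bool MeetsCriteria(const Item& item) { return item.number % 2 == 0; }

// Counts elements that meet the criteria and have a number greater than their predecessor.
int CountIncreases(const std::vector<Item>& items)
{
    int count = 0;
    const Item* prev = nullptr; // There is no previous element before the first iteration
    for(const Item& item : items)
    {
        if(prev != nullptr && // Not the first
            MeetsCriteria(item) &&
            item.number > prev->number)
        {
            ++count;
        }
        prev = &item; // Must be updated on every iteration
    }
    return count;
}
```

It works, but the extra variable and the need to update it at the end of every iteration is exactly the bookkeeping that the iterator version avoids.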
This is more flexible, but what if we want to insert some elements into the vector while traversing it? There is a trap awaiting here because the insert method may invalidate all iterators when the underlying array gets reallocated. This is why only iterating using an index is safe here:
for(size_t index = 0; index < items.size(); ++index)
{
Item newItem;
if(NeedInsertItemBefore(items[index], &newItem))
{
items.insert(items.begin() + index, newItem);
++index;
}
}
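As a side note, an iterator-based loop can also survive insertions, but only if the iterator is refreshed from the return value of insert, which is a valid iterator pointing to the newly inserted element. The sketch below shows this on a plain std::vector<int>; the function name and the "insert a zero before every negative number" rule are made up for the example.

```cpp
#include <vector>

// Inserts a 0 before every negative number, using iterators throughout.
void InsertZeroBeforeNegatives(std::vector<int>& values)
{
    // values.end() is re-evaluated on every iteration, so it stays valid after reallocation.
    for(auto it = values.begin(); it != values.end(); ++it)
    {
        if(*it < 0)
        {
            // The old 'it' may be invalidated by insert(); the returned iterator is valid
            // and points to the inserted element.
            it = values.insert(it, 0);
            ++it; // Step to the original negative element; the loop's ++it then moves past it
        }
    }
}
```

Forgetting to reassign the iterator leads to undefined behavior, so the index-based loop arguably remains the less error-prone choice.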
Note that pretty much any modern programming language allows inserting and removing elements of a dynamic array using an index passed as a parameter, e.g.:

- C#: List methods Insert, RemoveAt
- Java: ArrayList methods add, remove
- JavaScript: Array method splice
- Python: list methods insert, pop
- Rust: Vec methods insert, remove

Only C++ requires clumsy syntax with iterators like items.begin() + index.
I know that the code fragments shown above can be written in many other ways, e.g. using the auto keyword. If you have an idea for writing any of these loops in a better way, please leave a comment below and let's discuss.
# ShaderCrashingAssert - a New Small Library
Sun 20 Aug 2023
Last Thursday (August 17th) AMD released a new tool for post-mortem analysis of GPU crashes: Radeon GPU Detective. I participated in this project, but because this is my personal blog and because it is weekend now, I am wearing my hobby developer hat and I want to present a small library that I developed yesterday:
ShaderCrashingAssert provides an assert-like macro for HLSL shaders that triggers a GPU memory page fault. Together with RGD, it can help with shader debugging.
#rendering #directx #productions #libraries #gpu #tools
# Ways to Print and Capture Text Output of a Process
Sun 02 Jul 2023
In my previous blog post “Launching process programmatically: system vs CreateProcess vs ShellExecute”, I investigated various ways of launching a new process when programming in C++ using Windows, with the focus on different ways to specify a path to the executable file. Today, I want to describe a related topic: we will investigate ways that a process can print some text messages (standard output, standard error, WriteConsole function, OutputDebugString function), how we can observe this output and, finally, how we can capture it programmatically when launching a subprocess using the CreateProcess function.
Visual Studio / C++ project accompanying this article: github.com/sawickiap/TextOutputTest
#windows #winapi
# Launching process programmatically: system vs CreateProcess vs ShellExecute
Sat 15 Apr 2023
Today I went on a quest to investigate various ways in which we can launch a process (an EXE file) programmatically, while programming in C++ using Windows. I tested 3 different functions: system, CreateProcess, ShellExecute. I focused on ways to specify a path to the executable file – not on passing parameters and not on capturing standard input/output of the subprocess (which I planned to investigate next, and later did). All examples below launch a subprocess and wait until it completes before continuing. They all make the subprocess inherit the console, so if both the main process and the subprocess are console programs, their output will go to the single console of the main process.
But first things first: To understand this article, please recall that in operating systems we commonly use, no matter if Windows or Linux, every executable file launched as a process has several parameters:
Paths in the file system can be absolute (in case of Windows it usually means they start with drive letter, like “C:\Dir1\Text.exe”) or relative.
Startup directory is often the same as the directory where the executable file is located, but it doesn’t need to be. Many methods of process launching offer an explicit parameter for it. We won’t use it in the code samples below, but you can also achieve this manually from system console. For example, following console command uses a relative path to launch an executable located in “C:\Dir2\Test.exe”, while current directory of the process will be the same as current directory of the console: “C:\Dir1”:
C:\Dir1>..\Dir2\Test.exe
Method 1: Function system from standard C library (required header: <stdlib.h> or <cstdlib> in C++) is the simplest, most primitive one. It just takes a single string as parameter. An advantage of it is that you can launch any console command with it, also built-in commands (like “echo”), not only EXE files. It is also portable between different operating systems.
#include <Windows.h> // MAX_PATH
#include <cstdlib>   // system
#include <cstring>   // strcpy_s
int main()
{
    char path[MAX_PATH];
    strcpy_s(path, "Test.exe");
    system(path);
}
Waiting for the command to finish is the default behavior of this function and so is inheriting the console, so that messages printed to the standard output by “Test.exe” will go to the same console as our host application.
path can always be absolute or relative. For each of the 4 methods described in this article, I found answers to the following questions:
- Does it work with a path containing spaces, unescaped, like strcpy_s(path, "C:\\My Program\\Test.exe");? No. (Note the double backslash \\ is for escaping in C++, so that string will actually contain single backslashes. You can also use forward slashes / in Windows – they work with all methods described in this article and they don’t need to be escaped in C++ code.)
- Does it work with a path containing spaces, enclosed with quotes, like strcpy_s(path, "\"C:\\My Program\\Test.exe\"");? Yes.
- Does it work with spaces escaped with ^, like strcpy_s(path, "C:\\My^ Program\\Test.exe");? Yes! (However strange it looks, this is the character used as an escape sequence in Windows shell!)

Method 2: Function CreateProcess from WinAPI (required header: <Windows.h>) is likely the most native and most feature-rich option. Numerous parameters passed to the function and accompanying structures allow controlling the new subprocess in various ways, including getting and using its process handle or capturing its standard input/output. Here, for simplicity, I replicate the behavior of the system function from method 1 – I make it inherit the console by passing parameter bInheritHandles = TRUE and wait until it completes by calling WaitForSingleObject on the process handle. The process handle and main thread handle also need to be closed to avoid a resource leak.
STARTUPINFO startupInfo = { sizeof(STARTUPINFO) };
PROCESS_INFORMATION processInfo = {};
BOOL success = CreateProcess(
path, // lpApplicationName
NULL, // lpCommandLine
NULL, // lpProcessAttributes
NULL, // lpThreadAttributes
TRUE, // bInheritHandles
0, // dwCreationFlags
NULL, // lpEnvironment
NULL, // lpCurrentDirectory
&startupInfo,
&processInfo);
assert(success);
WaitForSingleObject(processInfo.hProcess, INFINITE);
CloseHandle(processInfo.hThread);
CloseHandle(processInfo.hProcess);
There are actually 2 ways to pass the executable file path to CreateProcess. The code above shows the first way – using the lpApplicationName parameter, which is intended for just the application name, while command line parameters are passed via the next argument. Note this is different from the system function, which accepts one string with everything. Using the method shown above:
- Does it work with a path containing spaces, unescaped, like "C:\\My Program\\Test.exe"? Yes – likely because this parameter is intended exclusively for the executable file path.
- Does it work with a path containing spaces, enclosed with quotes, like "\"C:\\My Program\\Test.exe\""? No.
- Does it work with spaces escaped with ^, like "C:\\My^ Program\\Test.exe"? No.

Method 3: Function CreateProcess, but this time passing the executable file path as the lpCommandLine parameter, while leaving lpApplicationName set to NULL. This is also a valid use case and it behaves differently – more like launching a console command than starting a specific EXE file.
STARTUPINFO startupInfo = { sizeof(STARTUPINFO) };
PROCESS_INFORMATION processInfo = {};
BOOL success = CreateProcess(
NULL, // lpApplicationName <- !!!
path, // lpCommandLine <- !!!
NULL, // lpProcessAttributes
NULL, // lpThreadAttributes
TRUE, // bInheritHandles
0, // dwCreationFlags
NULL, // lpEnvironment
NULL, // lpCurrentDirectory
&startupInfo,
&processInfo);
assert(success);
WaitForSingleObject(processInfo.hProcess, INFINITE);
CloseHandle(processInfo.hThread);
CloseHandle(processInfo.hProcess);
- Does it work with a path containing spaces, unescaped, like "C:\\My Program\\Test.exe"? No!
- Does it work with a path containing spaces, enclosed with quotes, like "\"C:\\My Program\\Test.exe\""? Yes.
- Does it work with spaces escaped with ^, like "C:\\My^ Program\\Test.exe"? No!

Method 4: Function ShellExecuteEx (or legacy ShellExecute), which is also part of WinAPI, but comes from header <shellapi.h>. It requires COM to be initialized with CoInitializeEx. It can be used not only to start processes from EXE files, but also to open any types of files (TXT or DOCX documents, JPEG images etc.) with their associated programs, as if the user double-clicked on such a file or right-clicked and selected one of the available “verbs”, like “Edit” or “Print”. But for this article, let’s focus on launching executable files. To replicate the same behavior as in previous methods, I pass SEE_MASK_NO_CONSOLE to inherit the console and SEE_MASK_NOCLOSEPROCESS to retrieve the process handle to be able to wait for it.
CoInitializeEx(NULL, COINIT_APARTMENTTHREADED | COINIT_DISABLE_OLE1DDE);
SHELLEXECUTEINFO shellExecuteInfo = {
.cbSize = sizeof(SHELLEXECUTEINFO),
.fMask = SEE_MASK_NOCLOSEPROCESS | SEE_MASK_NO_CONSOLE,
.lpFile = path,
.nShow = SW_SHOWNORMAL
};
BOOL success = ShellExecuteEx(&shellExecuteInfo);
assert(success);
WaitForSingleObject(shellExecuteInfo.hProcess, INFINITE);
CloseHandle(shellExecuteInfo.hProcess);
This method behaves in the following way:
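- Does it work with a path containing spaces, unescaped, like "C:\\My Program\\Test.exe"? Yes.
- Does it work with a path containing spaces, enclosed with quotes, like "\"C:\\My Program\\Test.exe\""? Yes.
- Does it work with spaces escaped with ^, like "C:\\My^ Program\\Test.exe"? No.

To summarize, let’s see all the results in a table: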
| | system() | CreateProcess() lpApplicationName | CreateProcess() lpCommandLine | ShellExecuteEx() |
|---|---|---|---|---|
| Works without extension? "Test" | Yes | No | Yes | Yes |
| Searching dir of the host app? | No | No | Yes | No |
| Searching current dir? | Yes | Yes | Yes | Yes |
| Searching PATH env var? | Yes | No | Yes | Yes |
| Path with spaces unescaped: My Program\Test.exe | No | Yes | No | Yes |
| Path with spaces enclosed with quotes: "My Program\Test.exe" | Yes | No | Yes | Yes |
| Spaces escaped with ^ : My^ Program\Test.exe | Yes | No | No | No |
I did my tests using Windows 10, Version 22H2 (OS Build 19045.2846) and Visual Studio 2022 17.5.3. Although unlikely, it is not impossible that these results may change on another version of the operating system or C++ compiler and standard library implementation.
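Based on the table above, enclosing the path in quotes is the one form that works with system, CreateProcess with lpCommandLine, and ShellExecuteEx, while CreateProcess with lpApplicationName needs the path unquoted. A small helper for the first group could look like the sketch below. `QuoteIfNeeded` is a hypothetical name, and the function is deliberately simplistic: it does not handle paths that already contain quote characters in the middle.

```cpp
#include <string>

// Wraps the path in double quotes if it contains a space and is not quoted already.
// Intended for the methods that accept a quoted path according to the table above;
// do not use it for the lpApplicationName parameter of CreateProcess.
std::string QuoteIfNeeded(const std::string& path)
{
    const bool hasSpace = path.find(' ') != std::string::npos;
    const bool alreadyQuoted = path.size() >= 2 && path.front() == '"' && path.back() == '"';
    if(hasSpace && !alreadyQuoted)
        return "\"" + path + "\"";
    return path;
}
```

For example, `QuoteIfNeeded("C:\\My Program\\Test.exe")` returns the path enclosed in quotes, while `QuoteIfNeeded("Test.exe")` returns it unchanged.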
#windows #c++ #winapi
# Book review: C++ Initialization Story
Mon 27 Mar 2023
Courtesy of its author Bartłomiej Filipek (author of the cppstories.com website), I was given an opportunity to read the book “C++ Initialization Story”. Below you will find my review.
How many ways are there to initialize a variable in C++? I can think of at least the following:
int i1;
int i2; i2 = 123;
int i3 = 123;
int i4(); // function declaration not a variable
int i5(123);
int i6 = int(123);
int i7{};
int i8 = {};
int i9 = int{};
int iA{123};
int iB = {123};
int iC = int{123};
Do you know the difference between them? Which variable stays uninitialized, which is initialized with a value 0 or 123? What if I used a custom type instead of the basic int? How many copies of the object would be created? What if that type was a class having some custom constructors? Which constructor would get called? What if it was a std::vector or some other container?
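For the simple int case, part of the answer can be checked with a few lines of code. This is my own minimal sketch, not an example from the book, and the function names are made up: forms with empty braces value-initialize the variable to 0, forms with 123 obviously give 123, while a plain `int i1;` as a local variable stays uninitialized (reading it is undefined behavior, so the sketch does not touch it).

```cpp
// Forms with empty braces value-initialize the variable, which for int means 0.
bool EmptyBraceFormsAreZero()
{
    int i7{};
    int i8 = {};
    int i9 = int{};
    return i7 == 0 && i8 == 0 && i9 == 0;
}

// All of these forms initialize the variable with 123.
bool ExplicitValueFormsAre123()
{
    int i3 = 123;
    int i5(123);
    int i6 = int(123);
    int iA{123};
    int iB = {123};
    int iC = int{123};
    return i3 == 123 && i5 == 123 && i6 == 123 &&
        iA == 123 && iB == 123 && iC == 123;
}
```

Both functions return true. The interesting differences between these forms only start to show with class types, which is where the book picks up.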
Questions like these are the foundation of this book, but the topics covered by it are much wider. This book is a relatively big one. On 279 pages, the author treats the topic of "initialization" as an opportunity to describe various concepts of the C++ language. Modern versions of the language standard are covered, up to C++23, but features that require new versions are explicitly marked as such. The book is not about some exotic quirks and tricks that can be done by stretching the language to its limits, but it is about concepts that are fundamental in any C++ program.
Initialization of local variables, as shown in the code above, is just the subject of the first chapter. Then initialization of "non-static data members" is described, which basically means variables inside structures and classes. Constructors obviously play the major role here, so their syntax and behavior are also described in detail. When talking about constructors, description of assignment operators and destructors follows naturally. Of course, these language constructs are described also in light of move semantics introduced by C++11. For example, did you know that std::vector<T> on resize will be able to use the move constructor of your type T instead of performing a copy only when the move constructor is marked as noexcept?
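The noexcept effect mentioned above can be observed with a small experiment. The sketch below is my own illustration (requires C++17 for static inline members; all names are made up): two copyable types that differ only in whether the move constructor is noexcept, and a helper that forces a vector to reallocate and reports how the existing elements were transferred. The exact mechanism is up to the standard library, but the mainstream implementations behave this way in order to keep the strong exception guarantee.

```cpp
#include <vector>

struct Counters { int copies = 0; int moves = 0; };

// Move constructor is noexcept: the vector can safely move elements on reallocation.
struct NoexceptMove
{
    static inline Counters counters;
    NoexceptMove() = default;
    NoexceptMove(const NoexceptMove&) { ++counters.copies; }
    NoexceptMove(NoexceptMove&&) noexcept { ++counters.moves; }
};

// Move constructor may throw: the vector falls back to copying on reallocation.
struct ThrowingMove
{
    static inline Counters counters;
    ThrowingMove() = default;
    ThrowingMove(const ThrowingMove&) { ++counters.copies; }
    ThrowingMove(ThrowingMove&&) { ++counters.moves; } // Not marked noexcept
};

// Creates a vector of 4 elements, forces a reallocation, and returns how many
// copies and moves were needed to transfer the existing elements.
template<typename T>
Counters CountOnReallocation()
{
    std::vector<T> v(4);          // 4 default-constructed elements, no copies or moves
    T::counters = {};             // Reset counters before the interesting part
    v.reserve(v.capacity() + 1);  // Requesting more than the current capacity reallocates
    return T::counters;
}
```

`CountOnReallocation<NoexceptMove>()` reports 4 moves and no copies, while `CountOnReallocation<ThrowingMove>()` reports 4 copies and no moves.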
Another topic related to initialization is the automatic deduction of types: the auto keyword and template arguments. Special kinds of variables, static and thread_local, are also described. The book also teaches new language constructs added for convenient variable initialization, like structured binding, designated initializers, or static inline. If you have only used old versions of C++ so far, do you know that the following syntax is now possible? Do you know what it means?
auto[iter, inserted] = mySet.insert(10);
Point p {
.x = 10.0,
.y = 20.0
};
class C {
    static inline int classCounter = 0;
};
When it comes to the difficulty level of the book, I would call it intermediate. Only some knowledge of C++ is required. Author explains every topic covered from very basics and shows simple code samples. The book additionally features a quiz in the middle and at the end, as well as a chapter with "techniques and use cases". For example, did you know that the most robust and efficient way to initialize a class with a string is to pass it by... value?
struct StringWrapper {
    std::string str_;
    StringWrapper(std::string str) : str_{std::move(str)} { }
};
For a long time I've been skeptical about new language standards like C++11, C++14, C++17, C++20. C++ is a tough language already, so every fresh addition only adds more complexity to it. It used to remind me of some elaborate, tricky Boost-style templates. But now, the more I use new features of the language (at least in my personal code), the more I like it. I always liked RAII and unique_ptr, but now with move semantics, return value optimization, std::optional, std::variant, and many other additions to the language small and big, it all starts to fit together. Code is clean, concise, readable, safe (no explicit new or delete!), and efficient at the same time. I now think that it is not an inherent feature of C++ to be verbose (with tons of boilerplate code required) and unsafe (with memory access violation errors easy to make); it is the old-fashioned approach of treating it as "C with classes". I hope that over time more and more developers, especially those who make key decisions in software projects, will also notice that and will allow using modern C++.
The book can be bought as an ebook on leanpub.com, as well as in a printed version on Amazon. I can strongly recommend it - it is really good! See also my reviews of previous books by this author: "C++17 in Detail" and "C++ Lambda Story".
# Impressions from Vulkanised 2023 Conference
Thu 16 Feb 2023
Last week I attended Vulkanised conference. It is an official conference of Vulkan API. It took place 7-9 February 2023 in Munich, Germany. It was my first time at this conference. My attendance was part of my job at AMD and I co-presented with Valve about using Radeon Developer Tools on RADV (Linux AMD driver) and Steam Deck. Here, on my blog, I would like to share my personal impressions from the event.
Overall, it was well organized. There were over 200 attendees, 3 days full of talks, most of them short (20-30 minutes, some of them even 10 minutes!), happening on just one stage (apart from a full-day Vulkan tutorial for beginners, happening on the first day in parallel with normal talks), with a lunch break and coffee breaks in between, so everyone could see everything without a need to choose from the timetable which talks to attend. It was intense. Every evening we went for some good food and beer, which I enjoy a lot every time I visit Munich/Bavaria/Germany.
In terms of people attending, a conference like this differs completely from game developer conferences that I usually attend. On one hand, everyone there was a programmer who knows and uses Vulkan, so everyone was on the same page. On gamedev conferences, there are people from different fields, as game development is multidisciplinary - graphics and music artists, designers, programmers, business people etc. On the other hand, there were not so many people from game industry there, and if anyone, they were mostly from the world of mobile GPUs, not PC or console. It was interesting to talk with developers from various industries, using GPUs and Vulkan for different applications, like scientific computations and visualizations or even… software for cloth design for fashion business.
There were many interesting talks. I think the most valuable ones were about components of the Vulkan ecosystem that are useful to every developer, like Vulkan validation layers, VkConfigurator, Vulkan loader, or GFXReconstruct (which also added support for Direct3D 12 recently, by the way!). There were long and extensive talks teaching two recent big additions to the API: mesh shaders and Vulkan Video. Vulkan Video seems to be especially complicated, partially because it requires some knowledge of video encoding/decoding, which is something different from 3D rendering. I used to work for television, so it was not that obscure for me. But this new part of the API is also very low level. The decision to make encoding/decoding of every frame stateless, with all the state of the video stream managed by the user, makes the API surface very extensive.
Talk about Diligent Engine was interesting. I didn’t look at the project itself, but the presentation looked convincing that this is a good multi-platform 3D graphics library implemented on top of various graphics APIs. Another interesting project presentation was about VkFFT - a C library that calculates FFT on the GPU using one of many supported APIs (not only Vulkan) with state-of-the-art performance. It is implemented by assembling a string with the source code of a kernel optimized for a specific case.
Presentations about game optimization for mobile GPUs were very interesting to me. Optimizing games is what I do in my everyday job, although I work with “large” PC GPUs. I consider such talks with a collection of tips and recommendations exceptionally valuable. From these presentations, I could learn what things work fast on smartphone and tablet chips, which are different from PC and console chips. They said that on these platforms, energy consumption and bandwidth to and from memory are the most important. Because mobile GPUs are tile-based, a large number of vertices or a fat vertex format is very slow, which is not the case on PC. Also because of that, they recommend grouping as many passes as possible as sub-passes of a single Vulkan render pass, even to a degree that rendering of 3D objects could be grouped together with screen-space postprocessing effects. Again, it isn’t a thing that we normally do on PCs. It was also interesting to see how they measure performance. While I always disable V-sync and just measure FPS in games, they seem to give multiple columns with results, including FPS, but also GPU utilization %, which is likely used when reaching 60 FPS with V-sync always enabled.
But more than any specific presentation, it was interesting for me to hear some general ideas about Vulkan, often repeated by multiple people. There were people from Khronos and LunarG (the company that develops Vulkan SDK) there, so we could hear from and ask questions to people who really make this API. There was a discussion panel with many prominent participants who shared their views on these topics. No one said “what happens at Vulkanised stays at Vulkanised”, so here are some things I remember. Disclaimer: These are my personal, subjective impressions. I might remember something wrong. Please feel free to leave a comment with your own thoughts below this article.
Some profound things have been said about Vulkan. Someone said it’s not a graphics API, more like a Hardware Abstraction Layer (HAL) or an API for programming accelerators. They said it is a “design by compromise” rather than “design by committee”. They said we should think of Vulkan as not only the specification, but the entire ecosystem, including libraries, tools, code samples, learning materials, etc. I was pleased to hear that Vulkan Memory Allocator that I maintain was often mentioned as one of the examples. An open question is how many of these 3rd party components should be considered “canonical”. Many are already included in Vulkan SDK, but should official samples use them as well? Currently, they don’t, as they teach raw Vulkan. Someone also said that these ecosystem components should be properly funded. Another question was about the direction Vulkan should go. One person said it should probably become even more low-level, with app-space libraries on top of it more widely used.
It was surprising to see that there are solutions to run Vulkan above and below every other graphics API, which makes Vulkan a common ground across systems and APIs:
Among problems that developers have with using Vulkan and potential areas of development for the future, I noticed several common themes:
Overall, participation in the Vulkanised conference was a great experience for me. I hope I will come back there. But Vulkan, even with its unprecedented openness, portability, and universality, is just part of the entire world of 3D graphics programming. At a conference dedicated to Vulkan I wouldn’t say out loud that Direct3D 12 is more popular among PC game developers and it is not without a reason, or that maybe both these “explicit” APIs are at the worst possible level of abstraction - low-level enough to be difficult to learn and use and easy to create bugs with, while high-level enough to still hide hardware details crucial to squeezing maximum performance. But this is a separate topic…
When attending any event, I always pay attention to the quality of the audio-video system. At Vulkanised, it was very good. I especially liked the acoustics of the room, which clearly someone paid attention to when designing the interior. But there were some issues with presentation video that I don’t see too often. I blogged before about 3 Rules to Make You Image Looking Good on a Projector, where I mentioned potential problems with contrast, reproduction of colors or thin lines. Another time I described a possibility that edges of the screen may be cropped. But this conference had a different problem. Instead of connecting their laptops to an HDMI cable, speakers were asked to join an online meeting via Google Meet and share their screen there, while another participant of that virtual call showed the streamed content on the big screen. We were in a Google office, after all :) This surely helped them record the presentations easily, but it also degraded any video or animation to what looked like 2 FPS.
For more photos, see the official gallery 2023 Vulkanised by Khronos.