Tag: directx

Entries for tag "directx", ordered from most recent. Entry count: 81.


# Secrets of Direct3D 12: Copies to the Same Buffer

Wed 04 Mar 2020

Modern graphics APIs (D3D12, Vulkan) are complicated. They are designed to squeeze maximum performance out of graphics cards. GPUs are so fast at rendering not because they work at high clock frequencies (actually they don't - 1.5 GHz is high for a GPU, as opposed to many GHz on a CPU), but because they execute their workloads in a highly parallel and pipelined way. In other words: many tasks may be executed at the same time. To make this work correctly, we must manually synchronize them using barriers. At least sometimes...

Let's consider a few scenarios. Scenario 1: A draw call rendering to a texture as a Render Target View (RTV), followed by a draw call sampling from this texture as a Shader Resource View (SRV). We know we must put a D3D12_RESOURCE_BARRIER_TYPE_TRANSITION barrier in between them to transition the texture from D3D12_RESOURCE_STATE_RENDER_TARGET to D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE.
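
For reference, a minimal sketch of such a transition barrier (texture stands for the resource in question; g_CommandList is the command list used in the test code below):

D3D12_RESOURCE_BARRIER barrier = {};
barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
barrier.Transition.pResource = texture; // the texture used as RTV, then SRV
barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
barrier.Transition.StateAfter = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
g_CommandList->ResourceBarrier(1, &barrier);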

Scenario 2: Two subsequent compute shader dispatches, executed in one command list, access the same texture as an Unordered Access View (UAV). The texture stays in D3D12_RESOURCE_STATE_UNORDERED_ACCESS, but still, if the second dispatch needs to wait for the first one to finish, we must issue a barrier of the special type D3D12_RESOURCE_BARRIER_TYPE_UAV. That's what this type of barrier was created for.

Scenario 3: Two subsequent draw calls rendering to the same texture as a Render Target View (RTV). The texture stays in the same state, D3D12_RESOURCE_STATE_RENDER_TARGET, and we need no barrier between them. The draw calls are free to overlap in time, but the GPU has its own ways to guarantee that multiple writes to the same pixel always happen in the order of draw calls, and even more - in the order of primitives as given in the index and vertex buffers!

Now to scenario 4, the most interesting one: two subsequent copies to the same resource. Let's say we work with buffers here, just for simplicity, but I suspect textures work the same way. What if the copies affect the same or overlapping regions of the destination buffer? Do they always execute in order, or can they overlap in time? Do we need to synchronize them to get a correct result? What if some copies are fast, made from another buffer in GPU memory (D3D12_HEAP_TYPE_DEFAULT), and some are slow, accessing system memory (D3D12_HEAP_TYPE_UPLOAD) through the PCI Express bus? What if the card uses a compute shader to perform the copy? Isn't this the same as scenario 2?

That's a puzzle a colleague of mine posed recently. I didn't know the immediate answer, so I wrote a simple program to test this case. I prepared two buffers: gpuBuffer placed in a DEFAULT heap and cpuBuffer placed in an UPLOAD heap, 120 MB each, both filled with some distinct data and both transitioned to D3D12_RESOURCE_STATE_COPY_SOURCE. I then created another buffer, destBuffer, to be the destination of my copies. During the test I executed a few CopyBufferRegion calls, from one source buffer or the other, copying small or large numbers of bytes. I then read back destBuffer and checked whether the result was correct.

g_CommandList->CopyBufferRegion(destBuffer, 5 * (10 * 1024 * 1024),
    gpuBuffer, 5 * (10 * 1024 * 1024), 4 * (10 * 1024 * 1024)); // 40 MB from VRAM to dest range [50..90) MB
g_CommandList->CopyBufferRegion(destBuffer, 3 * (10 * 1024 * 1024),
    cpuBuffer, 3 * (10 * 1024 * 1024), 4 * (10 * 1024 * 1024)); // 40 MB from system RAM to [30..70) MB - overlaps the previous copy
g_CommandList->CopyBufferRegion(destBuffer, SPECIAL_OFFSET,
    gpuBuffer, 102714720, 4); // 4 bytes from VRAM
g_CommandList->CopyBufferRegion(destBuffer, SPECIAL_OFFSET,
    cpuBuffer, 102714720, 4); // 4 bytes from system RAM to the very same destination
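
By "reading back" I mean roughly the following sketch (it assumes destBuffer was first copied into a buffer readbackBuffer placed in a D3D12_HEAP_TYPE_READBACK heap, the GPU has finished all work, and bufferSize is the size of the buffer - all names here are from my test program, not from the API):

void* mapped = nullptr;
D3D12_RANGE readRange = { 0, bufferSize }; // the CPU will read the whole buffer
HRESULT hr = readbackBuffer->Map(0, &readRange, &mapped);
if(SUCCEEDED(hr))
{
    // Compare mapped contents against the expected data here...
    D3D12_RANGE emptyRange = { 0, 0 }; // the CPU wrote nothing
    readbackBuffer->Unmap(0, &emptyRange);
}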

It turned out the result is always correct! I checked it on both an AMD card (Radeon RX 5700 XT) and an NVIDIA card (GeForce GTX 1070). The driver serializes such copies, making sure they execute in order and the destination data is as expected, even when the memory regions written by the copy operations overlap.

I also made a capture using Radeon GPU Profiler (RGP) and looked at the graph. The copies are executed as compute shader dispatches, and large ones are split into multiple events, but after each copy there is an implicit barrier inserted by the driver, described as:

CmdBarrierBlitSync()
The AMD driver issued a barrier in between back-to-back blit operations to the same destination resource.

I think this explains everything. If the driver had to insert such a barrier, we can suspect it is required. I just can't find anything in the Direct3D documentation that would explicitly specify this behavior. If you find it, please let me know - e-mail me or leave a comment under this post.

Maybe we could insert a barrier manually in between these copies, just to be sure? Nope, there is no way to do it. I tried two different approaches:

1. A UAV barrier like this:

D3D12_RESOURCE_BARRIER uavBarrier = {};
uavBarrier.Type = D3D12_RESOURCE_BARRIER_TYPE_UAV;
uavBarrier.UAV.pResource = destBuffer;
g_CommandList->ResourceBarrier(1, &uavBarrier);

It triggers a D3D Debug Layer error complaining that the buffer doesn't have the UAV flag among its resource flags:

D3D12 ERROR: ID3D12GraphicsCommandList::ResourceBarrier: Missing resource bind flags. [ RESOURCE_MANIPULATION ERROR #523: RESOURCE_BARRIER_MISSING_BIND_FLAGS]

2. A transition barrier from COPY_DEST to COPY_DEST:

D3D12_RESOURCE_BARRIER transitionBarrier = {};
transitionBarrier.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
transitionBarrier.Transition.pResource = destBuffer;
transitionBarrier.Transition.StateBefore = D3D12_RESOURCE_STATE_COPY_DEST;
transitionBarrier.Transition.StateAfter = D3D12_RESOURCE_STATE_COPY_DEST;
transitionBarrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
g_CommandList->ResourceBarrier(1, &transitionBarrier);

Bad luck again. This time the Debug Layer complains about the "before" and "after" states having to be different.

D3D12 ERROR: ID3D12CommandList::ResourceBarrier: Before and after states must be different. [ RESOURCE_MANIPULATION ERROR #525: RESOURCE_BARRIER_MATCHING_STATES]

Bonus scenario 5: ClearRenderTargetView, followed by a draw call that renders to the same texture as a Render Target View. The texture needs to be in D3D12_RESOURCE_STATE_RENDER_TARGET for both operations. We don't put a barrier in between them and don't even have a way to do it, just like in scenario 4. So Clear operations must also guarantee the order of their execution, although I can't find anything about it in the DX12 spec either.

What a mess! It seems that Direct3D 12 sometimes requires explicit barriers between our commands, automatically synchronizes some others, and doesn't describe it all clearly in the documentation. The only general rule I can think of is that the driver cannot track resources bound through descriptors (like SRV, UAV), but it does track those bound in a more direct way (as a render target, depth-stencil, clear target, or copy destination) and synchronizes them automatically. I hope this post helped to clarify some situations that may happen in your rendering code.

Comments | #directx #rendering Share

# Two Shader Compilers of Direct3D 12

Mon 23 Dec 2019

If we write a game or other graphics application using DX12, we also need to write some shaders. We author them in a high-level language called HLSL and compile them before passing them to the DirectX API while creating pipeline state objects (ID3D12Device::CreateGraphicsPipelineState). There are currently two shader compilers available, both from Microsoft, each outputting a different binary format:

  1. old compiler “FXC”
  2. new compiler “DXC”

Which one to choose? The new compiler, called DirectX Shader Compiler, is more modern, based on LLVM/Clang, and open source. We must use it if we want to use Shader Model 6 or above. On the other hand, shaders compiled with it require a relatively recent version of Windows and graphics drivers, so they won't work on systems that haven't been updated for years.

Shaders can be compiled offline using a command-line program (a standalone executable compiler) and then bundled with your program in compiled binary form. That's probably the best way to go for the release version, but for development and debugging purposes it's easier if we can change shader source just as we change the source of CPU code and easily rebuild and run, or even reload a changed shader while the app is running. For this, it's convenient to integrate the shader compiler into your program, which is possible through a compiler API.

This gives us 4 different ways of compiling shaders. This article is a quick tutorial for all of them.

1. Old Compiler - Offline

The standalone executable of the old compiler is called "fxc.exe". You can find it bundled with the Windows SDK, which is installed together with Visual Studio. For example, on my system I located it at this path: "c:\Program Files (x86)\Windows Kits\10\bin\10.0.17763.0\x64\fxc.exe".

To compile a shader from HLSL source to the old binary format, issue a command like this:

fxc.exe /T ps_5_0 /E main PS.hlsl /Fo PS.bin

- /T is the target profile
- ps_5_0 means pixel shader, Shader Model 5.0
- /E is the entry point - the name of the main shader function ("main" in my case)
- PS.hlsl is the text file with the shader source
- /Fo is the binary output file to be written

There are many more command-line parameters supported by this tool. You can display help about them by passing the /? parameter. Using appropriate parameters you can change the optimization level and other compilation settings, provide additional #include directories and #define macros, preview intermediate data (preprocessed source, compiled assembly), or even disassemble an existing binary file.
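
For example, a debug compilation with an extra macro and include directory could look like this (USE_FOG and the ShaderInc directory are made up for illustration; /Zi attaches debug information, /Od disables optimizations):

fxc.exe /T ps_5_0 /E main /Zi /Od /D USE_FOG=1 /I ShaderInc PS.hlsl /Fo PS.bin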

2. Old Compiler - API

To use the old compiler as a library in your C++ program, #include <d3dcompiler.h> and link with "d3dcompiler.lib". The compilation then comes down to a single function call.

Example:

CComPtr<ID3DBlob> code, errorMsgs;
HRESULT hr = D3DCompileFromFile(
    L"PS.hlsl", // pFileName
    nullptr, // pDefines
    nullptr, // pInclude
    "main", // pEntrypoint
    "PS_5_0", // pTarget
    0, // Flags1, can be e.g. D3DCOMPILE_DEBUG, D3DCOMPILE_SKIP_OPTIMIZATION
    0, // Flags2
    &code, // ppCode
    &errorMsgs); // ppErrorMsgs
if(FAILED(hr))
{
    if(errorMsgs)
    {
        wprintf(L"Compilation failed with errors:\n%hs\n",
            (const char*)errorMsgs->GetBufferPointer());
    }
    // Handle compilation error...
}

D3D12_GRAPHICS_PIPELINE_STATE_DESC psoDesc = {};
// (...)
psoDesc.PS.BytecodeLength = code->GetBufferSize();
psoDesc.PS.pShaderBytecode = code->GetBufferPointer();
CComPtr<ID3D12PipelineState> pso;
hr = device->CreateGraphicsPipelineState(&psoDesc, IID_PPV_ARGS(&pso));

The first parameter is the path to the file that contains the HLSL source. If you want to load the source some other way, there is also a function that takes a buffer in memory: D3DCompile. The second parameter (optional) can specify preprocessor macros to be #define-d during compilation. The third parameter (optional) can point to your own implementation of the ID3DInclude interface that would provide additional files requested via #include. The entry point and target profile are strings, just like in the command-line compiler. Other options that have their command-line equivalents (e.g. /Zi, /Od) can be specified as bit flags.
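
For example, a sketch of a debug compilation with a preprocessor macro defined (USE_FOG is a made-up macro; the other symbols come from d3dcompiler.h, including D3D_COMPILE_STANDARD_FILE_INCLUDE, the default handler that loads #include-d files from disk):

const D3D_SHADER_MACRO defines[] = {
    { "USE_FOG", "1" },  // equivalent of /D USE_FOG=1
    { nullptr, nullptr } // array terminator
};
UINT flags1 = D3DCOMPILE_DEBUG | D3DCOMPILE_SKIP_OPTIMIZATION; // like /Zi /Od
HRESULT hr = D3DCompileFromFile(L"PS.hlsl", defines,
    D3D_COMPILE_STANDARD_FILE_INCLUDE,
    "main", "ps_5_0", flags1, 0, &code, &errorMsgs);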

The two objects returned from this function are just buffers of binary data. ID3DBlob is a simple interface that you can query for its size and a pointer to its data. In case of successful compilation, the ppCode output parameter returns a buffer with the compiled shader binary. You should pass its data to ID3D12PipelineState creation. After successful creation, the blob can be Release-d. The second buffer, ppErrorMsgs, contains a null-terminated string with error messages generated during compilation. It can be useful even if the compilation succeeded, as it then contains warnings.

Update: "d3dcompiler_47.dll" file is needed. Typically some version of it is available on the machine, but generally you still want to redistribute the exact version you're using from the Win10 SDK. Otherwise you could end up compiling with an older or newer version on an end-user's machine.

3. New Compiler - Offline

Using the new compiler in its standalone form is very similar to the old one. The executable is called "dxc.exe" and it's also bundled with the Windows SDK, in the same directory. The documentation of the command-line syntax mentions parameters starting with "-", but the old "/" also seems to work. To compile the same shader using Shader Model 6.0, issue the following command, which looks almost the same as for "fxc.exe":

dxc.exe -T ps_6_0 -E main PS.hlsl -Fo PS.bin

Despite using a new binary format (called "DXIL", based on LLVM IR), you can load it and pass it to D3D12 PSO creation the same way as before. There is a tricky issue though: you need to ship the file "dxil.dll" with your program. Otherwise, the PSO creation will fail! You can find this file in a Windows SDK path like "c:\Program Files (x86)\Windows Kits\10\Redist\D3D\x64\dxil.dll". Just copy it to the directory with the target EXE of your project, or the one you use as the working directory.
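
One way to automate the copy (the SDK path below is just the one from my system - adjust it to yours) is a post-build step in Visual Studio:

xcopy /y /d "c:\Program Files (x86)\Windows Kits\10\Redist\D3D\x64\dxil.dll" "$(OutDir)"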

4. New Compiler - API

The new compiler can also be used programmatically as a library, but its usage is a bit more difficult. Just as with any C++ library, start with #include <dxcapi.h> and link with "dxcompiler.lib".

This time though you need to bundle an additional DLL with your program (next to "dxil.dll" mentioned above): "dxcompiler.dll", to be found in the same "Redist\D3D\x64" directory. There is more code needed to perform the compilation. First create IDxcLibrary and IDxcCompiler objects. They can stay alive for the whole lifetime of your application, or as long as you need to compile more shaders. Then, for each shader, load it from a file (or any source of your choice) to a blob, call the Compile method, and inspect its result: whether it's an error plus a blob with error messages, or a success plus a blob with the compiled shader binary.

CComPtr<IDxcLibrary> library;
HRESULT hr = DxcCreateInstance(CLSID_DxcLibrary, IID_PPV_ARGS(&library));
//if(FAILED(hr)) Handle error...

CComPtr<IDxcCompiler> compiler;
hr = DxcCreateInstance(CLSID_DxcCompiler, IID_PPV_ARGS(&compiler));
//if(FAILED(hr)) Handle error...

uint32_t codePage = CP_UTF8;
CComPtr<IDxcBlobEncoding> sourceBlob;
hr = library->CreateBlobFromFile(L"PS.hlsl", &codePage, &sourceBlob);
//if(FAILED(hr)) Handle file loading error...

CComPtr<IDxcOperationResult> result;
hr = compiler->Compile(
    sourceBlob, // pSource
    L"PS.hlsl", // pSourceName
    L"main", // pEntryPoint
    L"PS_6_0", // pTargetProfile
    NULL, 0, // pArguments, argCount
    NULL, 0, // pDefines, defineCount
    NULL, // pIncludeHandler
    &result); // ppResult
if(SUCCEEDED(hr))
    result->GetStatus(&hr);
if(FAILED(hr))
{
    if(result)
    {
        CComPtr<IDxcBlobEncoding> errorsBlob;
        hr = result->GetErrorBuffer(&errorsBlob);
        if(SUCCEEDED(hr) && errorsBlob)
        {
            wprintf(L"Compilation failed with errors:\n%hs\n",
                (const char*)errorsBlob->GetBufferPointer());
        }
    }
    // Handle compilation error...
}
CComPtr<IDxcBlob> code;
result->GetResult(&code);

D3D12_GRAPHICS_PIPELINE_STATE_DESC psoDesc = {};
// (...)
psoDesc.PS.BytecodeLength = code->GetBufferSize();
psoDesc.PS.pShaderBytecode = code->GetBufferPointer();
CComPtr<ID3D12PipelineState> pso;
hr = device->CreateGraphicsPipelineState(&psoDesc, IID_PPV_ARGS(&pso));

The Compile function also takes strings with the entry point and target profile, but in Unicode this time. The way to pass additional flags has also changed. Instead of bit flags, the parameters pArguments and argCount take an array of strings that carry the same parameters you would pass to the command-line compiler, e.g. L"-Zi" to attach debug information or L"-Od" to disable optimizations.
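
For example, a sketch of such a call with those two arguments, reusing the compiler and sourceBlob objects from the listing above:

const wchar_t* arguments[] = { L"-Zi", L"-Od" };
hr = compiler->Compile(
    sourceBlob, L"PS.hlsl", L"main", L"ps_6_0",
    arguments, _countof(arguments), // pArguments, argCount
    NULL, 0, // pDefines, defineCount
    NULL, // pIncludeHandler
    &result); // ppResult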

Update 2020-01-05: Thanks @MyNameIsMJP for your feedback!

Comments | #rendering #directx Share

# D3D12 Memory Allocator 1.0.0

Mon 02 Sep 2019

Since 2017 I have been developing Vulkan Memory Allocator - a free, MIT-licensed C++ library that helps with GPU memory management for those who develop games or other graphics applications using Vulkan. Today we released a similar library for DirectX 12: D3D12 Memory Allocator, which I had been preparing for some time. Because that's a project I do as part of my work at AMD rather than a personal project, I won't describe it in more detail here, but just point to the official resources.

If you are interested in the technical details and problems I had to consider during development, or you want to write your own allocator for either Vulkan or Direct3D 12, you may also check my recent article: Differences in memory management between Direct3D 12 and Vulkan.

Comments | #libraries #directx #productions Share

# Differences in memory management between Direct3D 12 and Vulkan

Fri 26 Jul 2019

Since July 2017 I have been developing Vulkan Memory Allocator (VMA) - a C++ library that helps with memory management in games and other applications using Vulkan. But because I deal with both Vulkan and DirectX 12 in my everyday work, I think it's a good idea to compare them.

This is an article about a very specific topic. It may be useful to you if you are a programmer working with both graphics APIs - Direct3D 12 and Vulkan. The two APIs offer a similar set of features and performance. Both are new-generation, explicit, low-level interfaces to modern graphics hardware (GPUs), so we could compare them back-to-back to show similarities and differences, e.g. in how they name things. For example, the ID3D12CommandQueue::ExecuteCommandLists function has its Vulkan equivalent in the form of the vkQueueSubmit function. However, this article focuses on just one aspect - memory management, meaning the rules and limitations of GPU memory allocation and the creation of resources - images (textures, render targets, depth-stencil surfaces etc.) and buffers (vertex buffers, index buffers, constant/uniform buffers etc.). The chapters below describe pretty much all the aspects of memory management that differ between the two APIs.

Read full article »

Comments | #vulkan #directx #gpu Share

# Programming FreeSync 2 support in Direct3D

Sat 02 Mar 2019

AMD just showed its Oasis demo, presenting the use of its FreeSync 2 HDR technology. If you wonder how you could implement the same features in your Windows DirectX program or game (it doesn't matter whether you use D3D11 or D3D12), here is an article for you.

But first, a disclaimer: although I already state it on my "About" page, I'd like to stress that this is my personal blog, so all opinions presented here are my own and do not reflect those of my employer.

Radeon FreeSync (its new, official web page is here: Radeon™ FreeSync™ Technology | FreeSync™ 2 HDR Games) is an AMD technology that covers two different things, which may cause some confusion. The first is variable refresh rate, the second is HDR. Both of them need to be supported by a monitor. The database of FreeSync-compatible monitors and their parameters is here: Freesync Monitors.

Read full entry > | Comments | #gpu #directx #windows #graphics Share

# Programming HDR monitor support in Direct3D

Wed 27 Feb 2019

I got an HDR-capable monitor (LG 32GK850F), so I started learning how I can use its capabilities programmatically. I still have much to learn, as there is a lot of theory to be ingested about color spaces etc., but in this blog post I'd like to go straight to the point: how to enable HDR in your C++ DirectX program? To test this, I used 3 graphics chips from 3 different PC GPU vendors. Below you can see the results of my experiments.

Read full entry > | Comments | #graphics #windows #directx #gpu Share

# How to design API of a library for Vulkan?

Fri 08 Feb 2019

In my blog post from yesterday, I shared my thoughts on graphics APIs and libraries. Another problem that brought me to those thoughts is the question: how do you design an API for a library that implements a single algorithm, pass, or graphics effect using Vulkan or DX12? It may seem trivial at first, like a task that just needs to be designed and implemented, but if you think about it more, it turns out to be a difficult issue. There are few software libraries like this in existence. I don't mean here a complex library/framework/engine that "horizontally" wraps the entire graphics API and takes it to a higher level, like V-EZ, Nvidia Falcor, or Google Filament. I mean just a small, "vertical", plug-in library doing one thing, e.g. implementing an ambient occlusion effect, efficient texture mipmap down-sampling, UI rendering, or particle physics simulation on the GPU. Such a library needs to interact efficiently with the rest of the user's code to be part of a large program or game. Vulkan Memory Allocator is not a good example of this either, because it only manages memory, implements no render passes, involves no shaders, and interacts with a command buffer only in the part related to memory defragmentation.

I encountered this problem at my work. Later I also discussed it in detail with my colleague. There were multiple questions to consider.

This is a problem similar to what we have with any C++ library. There is no consensus about the implementation of various basic facilities, like strings, containers, asserts, mutexes etc., so every major framework or game engine implements its own. Even something as simple as a min/max function is defined in multiple places. It is defined once in the <algorithm> header, but some developers don't use STL. <Windows.h> provides its own, but those are defined as macros, so they break any other definition unless you #define NOMINMAX before the include... A typical C++ nightmare. Smaller libraries are better off being configurable or defining everything on their own, like Vulkan Memory Allocator with its own assert, its own vector (which can be switched to the standard STL one), and 3 versions of a read-write mutex.

All these issues make it easier for developers to just write a paper, describe their algorithm, and possibly share a piece of code, pseudocode, or a shader, rather than provide a ready-to-use library. This is a very bad situation. I hope that over time patterns will emerge for how the API of a library implementing a single pass or effect using Vulkan/DX12 should look. Recently my colleague shared an idea with me: if there was some higher-level API that implemented all these interactions between the various parts (like resource allocation and image barriers) and we all commonly agreed on using it, then authoring libraries and stitching them together on top of it would be way easier. That's another argument for the need for such a new, higher-level graphics API.

Comments | #gpu #vulkan #directx #libraries #graphics #c++ Share

# Thoughts on graphics APIs and libraries

Thu 07 Feb 2019

Warning: This is a long rant. I'd like to share my personal thoughts and opinions on graphics APIs like Vulkan and Direct3D 12.

Some time ago I came up with a diagram showing how graphics software technologies evolved over the last decades - see my blog post "Lower-Level Graphics API - What Does It Mean?". The new graphics APIs (Direct3D 12, Vulkan, Metal) are not only a clean start, abandoning all the legacy garbage going back to the '90s (like glVertex), but they also take graphics programming to a new level. It is a lower level - they are more explicit, closer to the hardware, and better match how modern GPUs work. At least that's the idea. It means simpler, more efficient, and less error-prone drivers. But they don't make game or engine programming simpler. Quite the opposite - more responsibilities are now moved to engine developers (e.g. memory management/allocation). Overall, it is commonly considered a good thing though, because the engine has higher-level knowledge of its use cases (e.g. which textures are critically important and which can be unloaded when GPU memory is full), so it can get better performance by handling them properly. All this is hidden in the engines anyway, so developers making their games don't notice the difference.

Those of you who, just like me, deal with these low-level graphics APIs in everyday work may wonder whether they provide the right level of abstraction. I know it will sound controversial, but sometimes I get the feeling they sit at exactly the worst possible level - so low that they are difficult to learn and use properly, while so high that they still hide implementation details important for getting good performance. Let's take image/texture barriers as an example. They were non-existent in previous APIs. Now we have to issue them, which is a major pain point when porting old code to a new API. Issue too few of them and you get graphical corruption on some GPUs and not on others. Issue too many and your performance can be worse than it was on DX11 or OGL. At the same time, they are an abstract concept that still hides multiple things happening under the hood. You can never be sure which barrier will flush some caches, stall the whole graphics pipeline, or convert your texture between internal compression formats on a specific GPU, unless you use a specialized, vendor-specific profiling tool like Radeon GPU Profiler (RGP).

It’s the same with memory. In DX11 you could just specify intended resource usage (D3D11_USAGE_IMMUTABLE, D3D11_USAGE_DYNAMIC) and the driver chose preferred place for it. In Vulkan you have to query for memory heaps available on the current GPU and explicitly choose the one you decide best for your resource, based on low-level flags like VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT etc. AMD exposes 4 memory types and 3 memory heaps. Nvidia has 11 types and 2 heaps. Intel integrated graphics exposes just 1 heap and 2 types, showing the memory is really unified, while AMD APU, also integrated, has same memory model as the discrete card. If you try to match these to what you know about physically existing video RAM and system RAM, it doesn’t make any sense. You could just pick the first DEVICE_LOCAL memory for the fastest GPU access, but even then, you cannot be sure your resource will stay in video RAM. It may be silently migrated to system RAM without your knowledge and consent (e.g. if you go out of memory), which will degrade performance. What is more, there is no way to query for the amount of free GPU memory in Vulkan, unless you do hacks like using DXGI.

Hardware queues are no better. Vulkan claims to give explicit access to the pieces of GPU hardware, so you need to query for the queues that are available. For example, Intel exposes only a single graphics queue. AMD lets you create up to 3 additional compute-only queues and 2 transfer queues. Nvidia has 8 compute queues and 1 transfer queue. Do they all really map to silicon that can work in parallel? I doubt it. So how many of them should you use to get the best performance? There is no way to tell by just using the Vulkan API. AMD promotes doing compute work in parallel with 3D rendering, while Nvidia diplomatically advises being "conscious" with it.
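
Again, a small sketch of the query in question (same assumed physicalDevice as above):

uint32_t familyCount = 0;
vkGetPhysicalDeviceQueueFamilyProperties(physicalDevice, &familyCount, nullptr);
std::vector<VkQueueFamilyProperties> families(familyCount);
vkGetPhysicalDeviceQueueFamilyProperties(physicalDevice, &familyCount, families.data());
for(const VkQueueFamilyProperties& f : families)
{
    // f.queueFlags says whether the family supports VK_QUEUE_GRAPHICS_BIT,
    // VK_QUEUE_COMPUTE_BIT, VK_QUEUE_TRANSFER_BIT; f.queueCount says how many
    // queues you may create from it - but not how they map to actual hardware.
}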

It's the same with presentation modes. You have to enumerate the VkPresentModeKHR-s available on the machine and choose the right one, along with the number of images in the swapchain. These don't map intuitively to the typical user-facing setting of V-sync on/off, as they are intended to be low level. Still, you have no control over and no way to check whether the driver does a "blit" or a "flip".
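
The enumeration looks roughly like this (surface is an assumed, already-created VkSurfaceKHR):

uint32_t modeCount = 0;
vkGetPhysicalDeviceSurfacePresentModesKHR(physicalDevice, surface, &modeCount, nullptr);
std::vector<VkPresentModeKHR> modes(modeCount);
vkGetPhysicalDeviceSurfacePresentModesKHR(physicalDevice, surface, &modeCount, modes.data());
// E.g. prefer VK_PRESENT_MODE_MAILBOX_KHR if present, otherwise fall back to
// VK_PRESENT_MODE_FIFO_KHR - the only mode guaranteed to be available.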

One could say the new APIs don't deliver on their promise of being low level, explicit, and having predictable performance. That promise is impossible to deliver on unless the API is specific to one GPU, like on consoles. A common API over different GPUs is always high level, things happen under the hood, and there are still fast and slow paths. Isn't all this complexity just for nothing? It may be true that, compared to previous-generation APIs, drivers for the new ones need not launch additional threads in the background or perform shader compilation on the first draw call, which greatly reduces the chances of major hitching. (We will see how long this state persists as the APIs and drivers evolve.)* Still, there is no way to predict or ensure a minimum FPS/maximum frame time. We are talking about systems where multiple processes compete for resources. On modern PCs there is even no way to know how many cycles a single instruction will take! Cache memory, branch prediction, out-of-order execution - all these mechanisms are there in the CPU to speed up average cases, but there can always be cases when things work slowly (e.g. a cache miss). It's the same with graphics. I think we should abandon the false hope of predictable performance as a thing of the past, just like pixel-perfect rendering. We can optimize for the average, but we cannot ensure the minimum. After all, games are "soft real-time systems".

Based on that, I am wondering whether there is room for a new graphics API on top of DX12 or Vulkan. I don't mean a whole game engine with physics simulation, sound, input controllers and all, like Unity or UE4. I mean an API just like DX11 or OGL, on a similar or higher abstraction level (if higher level, maybe the concept of a persistent "frame graph" with explicit pass and resource dependencies is the way to go?). I also don't think it's enough to just reimplement any of those old APIs. The new one should take advantage of the features of the explicit APIs (like parallel command buffer recording), while hiding the difficult parts (e.g. queues, memory types, descriptors, barriers), so it's easier to use and harder to misuse. (An existing library similar to this concept is V-EZ from AMD.) I think it could still have good performance. The key thing needed for the creation of such a library is abandoning the assumption that the developer must define everything up-front, with nothing allocated, created, or transferred on first use.

See also the next post: "How to design API of a library for Vulkan?"

Update 2019-02-12: I want to thank all of you for the amazing feedback I received after publishing this post, especially on Twitter. Many projects have been mentioned that try to provide an API better than Vulkan or DX12 - e.g. Apple Metal, WebGPU, The Forge by Confetti.

* Update 2019-04-16: Microsoft just announced they are adding background shader optimizations to D3D12, so the driver can recompile and optimize shaders in the background on its own threads. Congratulations! We are back at D3D11 :P

Comments | #vulkan #directx #libraries #graphics #optimization #gpu Share

