Tag: directx

Entries for tag "directx", ordered from most recent. Entry count: 83.


# Initializing DX12 Textures After Allocation and Aliasing

Thu, 19 Mar 2020

If you are a graphics programmer using Direct3D 12, you may wonder what the initial content of a newly allocated buffer or texture is. Microsoft admitted it was not clearly defined, but in practice such new memory is filled with zeros (unless you use the new flag D3D12_HEAP_FLAG_CREATE_NOT_ZEROED). See the article “Coming to DirectX 12: More control over memory allocation”. This behavior has its pros and cons. Clearing all new memory makes sense, as the operating system surely doesn't want to disclose to us data left by some other process, possibly containing passwords or other sensitive information. However, writing to a large memory region takes a lot of time. Maybe that's one reason GPU memory allocation is so slow. I've seen large allocations take as long as hundreds of milliseconds.

There are situations when the memory of your new buffer or texture is not zeroed and may contain random data. The first case is when you create a resource using the CreatePlacedResource function inside a memory block that you might have used before for some other, already released resources. That's also what the D3D12 Memory Allocator library does by default.

It is important to know that in this case you must initialize the resource in a specific way! The rules are described on the page “ID3D12Device::CreatePlacedResource method” and say: if your resource is a texture that has either the RENDER_TARGET or DEPTH_STENCIL flag, you must initialize it after allocation and before any other usage, using one of these methods:

  1. a clear operation (ClearRenderTargetView or ClearDepthStencilView),
  2. a discard operation (DiscardResource),
  3. copy to the entire resource as a destination (CopyResource, CopyBufferRegion, or CopyTextureRegion).

Please note that rendering to the texture as a Render Target or writing to it as an Unordered Access View is not on the list! It means that if, for example, you implement a postprocessing effect, allocate an intermediate 1920x1080 texture, and want to overwrite all its pixels by rendering a fullscreen quad or triangle (better to use one triangle - see the article "GCN Execution Patterns in Full Screen Passes"), then initializing the texture before your draw call seems redundant, but you still need to do it.
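One way to catch a missed initialization early is a small debug-only tracker on the CPU side. The sketch below is hypothetical: PlacedTextureTracker, InitOp, and the method names are made up for illustration and are not part of Direct3D 12. It only models the rule quoted above, assuming you call it from the places where you record the corresponding commands.

```cpp
#include <cassert>

// Hypothetical debug helper; none of these names exist in D3D12.
enum class InitOp { Clear, Discard, FullCopy };

class PlacedTextureTracker {
public:
    // hasRtOrDsFlag: the texture was created with the RENDER_TARGET
    // or DEPTH_STENCIL flag, so the initialization rule applies to it.
    explicit PlacedTextureTracker(bool hasRtOrDsFlag)
        : m_needsInit(hasRtOrDsFlag) {}

    // Call when recording ClearRenderTargetView / ClearDepthStencilView,
    // DiscardResource, or a copy that covers the entire resource.
    void OnInitialize(InitOp) { m_needsInit = false; }

    // Call before recording any other use (drawing to it as an RTV,
    // writing through a UAV, sampling). Returns false if the rule is violated.
    bool CanUse() const { return !m_needsInit; }

private:
    bool m_needsInit;
};

// Usage example: a fresh render target is not usable until it is discarded.
inline void ExampleUsage() {
    PlacedTextureTracker postprocessTarget(true);
    assert(!postprocessTarget.CanUse());   // a fullscreen draw alone is not enough
    postprocessTarget.OnInitialize(InitOp::Discard);
    assert(postprocessTarget.CanUse());    // now the draw call is fine
}
```

A tracker like this costs nothing in release builds if you compile it out, and it reports the problem on every GPU, not only on the ones where the missing initialization happens to cause visible corruption.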

Correct texture initialization after allocation

What happens if you don't? Why are we asked to perform this initialization? Wouldn't we just see random colorful pixels if we use an uninitialized texture, which may or may not be a problem, depending on our use case? Not really... As I explained in my previous post “Texture Compression: What Can It Mean?”, a texture may be stored in video memory in some vendor-specific, compressed format. If the metadata of such compression is uninitialized, it might have consequences more severe than observing random colors. It is actually undefined behavior. On one GPU everything may work fine, while on another you may see graphical corruption that even rendering to the texture as a Render Target cannot fix (or maybe even a total GPU crash). I've experienced this problem myself recently.

Thinking in terms of internal GPU texture compression also helps to explain why this initialization is required only for render-target and depth-stencil textures. GPUs use more aggressive compression techniques for those. Having the requirements for initialization defined like that implies that you can leave buffers and other textures uninitialized and just see random data in their content, without the danger of anything worse happening.

I feel that a side note on the ID3D12GraphicsCommandList::DiscardResource function is needed, because many of you probably don't know it. Contrary to its name, this function doesn't release a resource or its memory. Its meaning is closer to the mapping flag D3D11_MAP_WRITE_DISCARD from the old D3D11. It informs the driver that the current content of the resource might be garbage: we know about it, we don't care, we don't need it, we are not going to read it, and we are just going to fill the entire resource with new content. Sometimes calling this function may let the driver reach better performance. For example, it may skip downloading previous data from VRAM to the graphics chip. This is especially important and beneficial on tile-based, mobile GPUs. In some other cases, like the initialization of a newly allocated texture described here, it is required. Inside it, the driver might, for example, clear the metadata of its internal compression format. It is correct to call DiscardResource and then render to your new texture as a Render Target. It could also be faster than calling ClearRenderTargetView instead of DiscardResource. By the way, if you happen to use Vulkan and have read this far, you might find it useful to know that the Vulkan equivalent of DiscardResource is an image memory barrier with oldLayout = VK_IMAGE_LAYOUT_UNDEFINED.

There is a second case when a resource may contain some random data. It happens when you use memory aliasing. This technique lets you save GPU memory by creating multiple resources in the same or overlapping region of an ID3D12Heap. It was not possible in old APIs (Direct3D 11, OpenGL), where each resource got its own implicit memory allocation. In Direct3D 12 you can use CreatePlacedResource to put your resource in a specific heap, at a specific offset. It's not allowed to use aliasing resources at the same time. Sometimes you need some intermediate buffers or render targets only for a specific, short time during each frame. You can then reuse their memory for different resources needed in a later part of the frame. That's the key idea of aliasing.

To do it correctly, you must do two things. First, between the usages you must issue a barrier of the special type D3D12_RESOURCE_BARRIER_TYPE_ALIASING. Second, the resource to be used next (also called "ResourceAfter", as opposed to "ResourceBefore") needs to be initialized. The idea is like what I described before. You can find the rules of this initialization on the page “Memory Aliasing and Data Inheritance”. This time, however, we are told to initialize every texture that has the RENDER_TARGET or DEPTH_STENCIL flag with 1. a clear or 2. a copy operation to an entire subresource. DiscardResource is not allowed. Whether it's an omission or intentional, we have to stick to these rules, even if we feel such clears are redundant and will slow down our rendering. Otherwise we may experience hard-to-find bugs on some GPUs.
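The constraint that aliasing resources must never be used at the same time can be checked on the CPU before any placed resource is created. Below is a minimal sketch of such a check. The PlacedRange struct and the idea of numbering render passes within a frame are assumptions made for this example; only the heap offset and size correspond to real CreatePlacedResource inputs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a placed resource: where it lives in the heap
// and during which passes of a frame it is in use.
struct PlacedRange {
    uint64_t heapOffset;  // offset passed to CreatePlacedResource
    uint64_t size;        // size in bytes, e.g. from GetResourceAllocationInfo
    uint32_t firstPass;   // index of the first pass that uses the resource
    uint32_t lastPass;    // index of the last pass that uses it (inclusive)
};

// True when the two resources share at least one byte of the heap.
inline bool MemoryOverlaps(const PlacedRange& a, const PlacedRange& b) {
    return a.heapOffset < b.heapOffset + b.size &&
           b.heapOffset < a.heapOffset + a.size;
}

// True when both resources are needed during at least one common pass.
inline bool LifetimesOverlap(const PlacedRange& a, const PlacedRange& b) {
    assert(a.firstPass <= a.lastPass && b.firstPass <= b.lastPass);
    return a.firstPass <= b.lastPass && b.firstPass <= a.lastPass;
}

// Aliasing is valid only if resources that share memory are never used together.
inline bool IsValidAliasing(const PlacedRange& a, const PlacedRange& b) {
    return !MemoryOverlaps(a, b) || !LifetimesOverlap(a, b);
}
```

When MemoryOverlaps returns true for a pair that passes this check, that pair is exactly where the aliasing barrier and the initialization of "ResourceAfter" described above are needed.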

Correct texture initialization after aliasing

Update 2020-07-14: An engineer from Microsoft told me that the lack of DiscardResource among valid methods of initializing a texture after aliasing is probably a docs oversight and it is correct to initialize it this way, so the last picture should actually have Discard as well, just like the first one.

Update 2020-12-22: Aliasing barrier and a Clear, Discard, or Copy is not all you need to do to properly initialize a texture after aliasing. You also need to take care of its state by issuing some transition barrier. To read more, see my new post: “States and Barriers of Aliasing Render Targets”.

#directx #rendering

# Texture Compression: What Can It Mean?

Sun, 15 Mar 2020

"Data compression - the process of encoding information using fewer bits than the original representation." That's the definition from Wikipedia. But when we talk about textures (images that we use while rendering 3D graphics), it's not that simple. There are four different things we can mean when talking about texture compression, some of which you may not know. In this article, I'd like to give you some basic information about them.

1. Lossless data compression. That's the compression used to shrink binary data in size without losing a single bit. We may talk about compression algorithms and libraries that implement them, like the popular zlib or LZMA SDK. We may also mean file formats like ZIP or 7Z, which use these algorithms, but also define a way to pack multiple files with their whole directory structure into a single archive file.

An important thing to note here is that we can use this compression for any data. Some file types like text documents or binary executables have to be compressed in a lossless way so that no bits are lost or altered. You can also compress image files this way. The compression ratio depends on the data. The compressed file will be smaller if there are many repeating patterns - the data look pretty boring, like many pixels with the same color. If the data varies more, with each next pixel having even a slightly different value, then you may end up with a compressed file as large as the original one or even larger. For example, the following two images have size 480 x 480. When saved as an uncompressed BMP R8G8B8 file, they both take 691,322 bytes. When compressed to a ZIP file, the first one is only 15,993, while the second one is 552,782 bytes.
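To make the dependence on data concrete, here is a toy run-length encoder. Note that this is not the algorithm used by ZIP or zlib, which rely on the far more capable DEFLATE; run-length encoding is swapped in only because it fits in a few lines and shows the same effect. The function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy run-length encoder: emits (count, value) byte pairs, with count in 1..255.
inline std::vector<uint8_t> RleEncode(const std::vector<uint8_t>& data) {
    std::vector<uint8_t> out;
    size_t i = 0;
    while (i < data.size()) {
        const uint8_t value = data[i];
        size_t run = 1;
        while (i + run < data.size() && data[i + run] == value && run < 255)
            ++run;
        out.push_back(static_cast<uint8_t>(run));
        out.push_back(value);
        i += run;
    }
    return out;
}

// Inverse of RleEncode, used to confirm that no bit is lost.
inline std::vector<uint8_t> RleDecode(const std::vector<uint8_t>& packed) {
    assert(packed.size() % 2 == 0);
    std::vector<uint8_t> out;
    for (size_t i = 0; i < packed.size(); i += 2)
        out.insert(out.end(), static_cast<size_t>(packed[i]), packed[i + 1]);
    return out;
}
```

With this encoder, 1000 identical bytes shrink to 8 bytes (three runs of 255 plus one of 235), while 1000 bytes in which every value differs from its neighbor grow to 2000 bytes, because each byte becomes its own (count, value) pair. Both decode back to exactly the original data.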

We can talk about this compression in the context of textures because assets in games are often packed into archives in some custom format which protects the data from modification, speeds up loading, and may also use compression. For example, the new Call of Duty Warzone takes 162 GB of disk space after installation, but it has only 442 files because developers packed the largest data in some archives in files Data/data/data.000, 001 etc., 1 GB each.

2. Lossy compression. These are the algorithms that allow some data loss, but offer higher compression ratios than lossless ones. We use them for specific kinds of data, usually some media - images, sound, and video. For video it's virtually essential, because raw uncompressed data would take enormous space for each second of recording. Algorithms for lossy compression use the knowledge about the structure of the data to remove the information that will be unnoticeable or degrade quality to the lowest degree possible, from the perspective of human perception. We all know them - these are formats like JPEG for images and MP3 for music.

They have their pros and cons. JPEG compresses images in 8x8 blocks using the Discrete Cosine Transform (DCT). You can find an awesome, in-depth explanation of it on the page: Unraveling the JPEG. It's good for natural images, but with text and diagrams it may fail to maintain the desired quality. My first example saved as JPEG with Quality = 20% (this is very low, I usually use 90%) takes only 24,753 B, but it looks like this:

GIF is good for such synthetic images, but fails on natural images. I saved my second example as GIF with a color palette of 32 entries. The file is only 90,686 B, but it looks like this (look closer to see dithering used due to a limited number of colors):

Lossy compression is usually accompanied by lossless compression - file formats like JPEG, GIF, MP3, MP4 etc. compress the data losslessly on top of their core algorithm, so there is no point in compressing them again.

3. GPU texture compression. Here comes the interesting part. All formats described so far are designed to optimize data storage and transfer. We need to decompress all the textures packed in ZIP files or saved as JPEG before uploading them to video memory and using them for rendering. But there are other types of texture compression formats that can be used by the GPU directly. They are lossy as well, but they work in a different way - they use a fixed number of bytes per block of NxN pixels. Thanks to this, a graphics card can easily pick the right block from memory and decompress it on the fly, e.g. while sampling the texture. Some such formats are BC1..7 (which stands for Block Compression) or ASTC (used on mobile platforms). For example, BC7 uses 1 byte per pixel, or 16 bytes per 4x4 block. You can find an overview of these formats here: Understanding BCn Texture Compression Formats.
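Because the number of bytes per block is fixed, the memory footprint of a block-compressed texture follows directly from its dimensions. A minimal sketch of this arithmetic for a single mip level is shown below; the function name is made up for this example, and the per-block sizes are those of the BCn family (8 bytes for BC1 and BC4, 16 bytes for the others).

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of one mip level of a BCn texture.
// bytesPerBlock: 8 for BC1 and BC4; 16 for BC2, BC3, BC5, BC6H, and BC7.
inline uint64_t BlockCompressedSize(uint32_t width, uint32_t height,
                                    uint32_t bytesPerBlock) {
    assert(bytesPerBlock == 8 || bytesPerBlock == 16);
    // Partial blocks at the right and bottom edges still occupy a full 4x4 block.
    const uint64_t blocksX = (static_cast<uint64_t>(width) + 3) / 4;
    const uint64_t blocksY = (static_cast<uint64_t>(height) + 3) / 4;
    return blocksX * blocksY * bytesPerBlock;
}
```

For a 1024 x 1024 texture this gives 1,048,576 bytes in BC7 (1 byte per pixel) and 524,288 bytes in BC1, compared to 4,194,304 bytes for uncompressed R8G8B8A8.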

The only file format I know which supports this compression is DDS, as it can store any texture that can be loaded straight to DirectX in various pixel formats, including not only block compressed but also cube, 3D, etc. Most game developers design their own file formats for this purpose anyway, to load them straight into GPU memory with no conversion.

4. Internal GPU texture compression. Pixels of a texture may not be stored in video memory the way you think - row-major order, one pixel after the other, R8G8B8A8 or whatever format you chose. When you create a texture with D3D12_TEXTURE_LAYOUT_UNKNOWN / VK_IMAGE_TILING_OPTIMAL (always do that, except for some very special cases), the GPU is free to use some optimized internal format. This may not be true "compression" by its definition, because it must be lossless, so the memory reserved for the texture will not be smaller. It may even be larger because of the requirement to store additional metadata. (That's why you have to take care of extra VK_IMAGE_ASPECT_METADATA_BIT when working with sparse textures in Vulkan.) The goal of these formats is to speed up access to the texture.

Details of these formats are specific to GPU vendors and may or may not be public. Some ideas of how a GPU could optimize a texture in its memory include:

How to make the best use of those internal GPU compression formats if they differ per graphics card vendor and we don't know their details? Just make sure you leave the driver as many optimization opportunities as possible by:

See also article Delta Color Compression Overview at GPUOpen.com.

Summary: As you can see, the term "texture compression" can mean different things, so when talking about anything like this, always make sure to be clear about what you mean, unless it's obvious from the context.

#rendering #vulkan #directx

# Secrets of Direct3D 12: Copies to the Same Buffer

Wed, 04 Mar 2020

Modern graphics APIs (D3D12, Vulkan) are complicated. They are designed to squeeze maximum performance out of graphics cards. GPUs are so fast at rendering not because they work with high clock frequencies (actually they don't - a frequency of 1.5 GHz is high for a GPU, as opposed to many GHz on a CPU), but because they execute their workloads in a highly parallel and pipelined way. In other words: many tasks may be executed at the same time. To make this work correctly, we must manually synchronize them using barriers. At least sometimes...

Let's consider a few scenarios. Scenario 1: A draw call rendering to a texture as a Render Target View (RTV), followed by a draw call sampling from this texture as a Shader Resource View (SRV). We know we must put a D3D12_RESOURCE_BARRIER_TYPE_TRANSITION barrier in between them to transition the texture from D3D12_RESOURCE_STATE_RENDER_TARGET to D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE.

Scenario 2: Two subsequent compute shader dispatches, executed in one command list, access the same texture as an Unordered Access View (UAV). The texture stays in D3D12_RESOURCE_STATE_UNORDERED_ACCESS, but still if the second dispatch needs to wait for the first one to finish, we must issue a barrier of special type D3D12_RESOURCE_BARRIER_TYPE_UAV. That's what this type of barrier was created for.

Scenario 3: Two subsequent draw calls rendering to the same texture as a Render Target View (RTV). The texture stays in the same state D3D12_RESOURCE_STATE_RENDER_TARGET. We need not put a barrier between them. The draw calls are free to overlap in time, but the GPU has its own ways to guarantee that multiple writes to the same pixel will always happen in the order of draw calls, and even more - in the order of primitives as given in the index + vertex buffer!

Now to scenario 4, the most interesting one: Two subsequent copies to the same resource. Let's say we work with buffers here, just for simplicity, but I suspect textures work the same way. What if the copies affect the same or overlapping regions of the destination buffer? Do they always execute in order, or can they overlap in time? Do we need to synchronize them to get the proper result? What if some copies are fast, made from another buffer in GPU memory (D3D12_HEAP_TYPE_DEFAULT), and some are slow, accessing system memory (D3D12_HEAP_TYPE_UPLOAD) through the PCI-Express bus? What if the card uses a compute shader to perform the copy? Isn't this the same as scenario 2?

That's a puzzle that my colleague posed recently. I didn't know the immediate answer to it, so I wrote a simple program to test this case. I prepared two buffers: gpuBuffer placed in a DEFAULT heap and cpuBuffer placed in an UPLOAD heap, 120 MB each, both filled with some distinct data and both transitioned to D3D12_RESOURCE_STATE_COPY_SOURCE. I then created another buffer destBuffer to be the destination of my copies. During the test I executed a few CopyBufferRegion calls, from one source buffer or the other, with a small or large number of bytes. I then read back destBuffer and checked if the result is valid.

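// Copy 1: 40 MB from gpuBuffer into destBuffer at offset 50 MB.
g_CommandList->CopyBufferRegion(destBuffer, 5 * (10 * 1024 * 1024),
    gpuBuffer, 5 * (10 * 1024 * 1024), 4 * (10 * 1024 * 1024));
// Copy 2: 40 MB from cpuBuffer at offset 30 MB, overlapping copy 1 in the range 50-70 MB.
g_CommandList->CopyBufferRegion(destBuffer, 3 * (10 * 1024 * 1024),
    cpuBuffer, 3 * (10 * 1024 * 1024), 4 * (10 * 1024 * 1024));
// Copies 3 and 4: two 4-byte copies to the same destination offset, from different sources.
g_CommandList->CopyBufferRegion(destBuffer, SPECIAL_OFFSET,
    gpuBuffer, 102714720, 4);
g_CommandList->CopyBufferRegion(destBuffer, SPECIAL_OFFSET,
    cpuBuffer, 102714720, 4);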

It turned out the result is valid! I checked it on both an AMD (Radeon RX 5700 XT) and an NVIDIA card (GeForce GTX 1070). The driver serializes such copies, making sure they execute in order and the destination data is as expected even when memory regions written by the copy operations overlap.
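Checking "if the result is valid" needs a reference to compare the readback against. One way to get it, sketched below, is to replay the same copies on the CPU strictly in submission order. This is not the code of my test program; CopyCmd and ExpectedDest are hypothetical names, and the destination is assumed to start zero-filled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical CPU-side mirror of one CopyBufferRegion call.
struct CopyCmd {
    size_t dstOffset;
    const std::vector<uint8_t>* src;  // CPU copy of the source buffer's data
    size_t srcOffset;
    size_t numBytes;
};

// Applies the copies one after another, so where regions overlap the later
// copy wins. A readback of the destination buffer should match this result
// if the GPU executes the copies in order.
inline std::vector<uint8_t> ExpectedDest(size_t destSize,
                                         const std::vector<CopyCmd>& cmds) {
    std::vector<uint8_t> dest(destSize, 0);  // assumption: zero-initialized
    for (const CopyCmd& c : cmds) {
        assert(c.dstOffset + c.numBytes <= dest.size());
        assert(c.srcOffset + c.numBytes <= c.src->size());
        std::memcpy(dest.data() + c.dstOffset,
                    c.src->data() + c.srcOffset, c.numBytes);
    }
    return dest;
}
```

For the first two copies from the listing above, the overlapping range (50 MB to 70 MB) must end up holding the data from cpuBuffer, because that copy was recorded second.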

I also made a capture using Radeon GPU Profiler (RGP) and looked at the graph. The copies are executed as a compute shader, large ones are split into multiple events, but after each copy there is an implicit barrier inserted by the driver, described as:

CmdBarrierBlitSync()
The AMD driver issued a barrier in between back-to-back blit operations to the same destination resource.

I think it explains everything. If the driver had to insert such a barrier, we can suspect it is required. I just can't find anything in the Direct3D documentation that would explicitly specify this behavior. If you find it, please let me know - e-mail me or leave a comment under this post.

Maybe we could insert a barrier manually in between these copies, just to make sure? Nope, there is no way to do it. I tried two different ways:

1. A UAV barrier like this:

D3D12_RESOURCE_BARRIER uavBarrier = {};
uavBarrier.Type = D3D12_RESOURCE_BARRIER_TYPE_UAV;
uavBarrier.UAV.pResource = destBuffer;
g_CommandList->ResourceBarrier(1, &uavBarrier);

It triggers a D3D Debug Layer error that complains about the buffer not having UAV among its flags:

D3D12 ERROR: ID3D12GraphicsCommandList::ResourceBarrier: Missing resource bind flags. [ RESOURCE_MANIPULATION ERROR #523: RESOURCE_BARRIER_MISSING_BIND_FLAGS]

2. A transition barrier from COPY_DEST to COPY_DEST:

D3D12_RESOURCE_BARRIER transitionBarrier = {};
transitionBarrier.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
transitionBarrier.Transition.pResource = destBuffer;
transitionBarrier.Transition.StateBefore = D3D12_RESOURCE_STATE_COPY_DEST;
transitionBarrier.Transition.StateAfter = D3D12_RESOURCE_STATE_COPY_DEST;
transitionBarrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
g_CommandList->ResourceBarrier(1, &transitionBarrier);

Bad luck again. This time the Debug Layer complains about "before" and "after" states having to be different.

D3D12 ERROR: ID3D12CommandList::ResourceBarrier: Before and after states must be different. [ RESOURCE_MANIPULATION ERROR #525: RESOURCE_BARRIER_MATCHING_STATES]

Bonus scenario 5: ClearRenderTargetView, followed by a draw call that renders to the same texture as a Render Target View. The texture needs to be in D3D12_RESOURCE_STATE_RENDER_TARGET for both operations. We don't put a barrier in between them and don't even have a way to do it, just like in scenario 4. So Clear operations must also guarantee the order of their execution, although I can't find anything about it in the DX12 spec.

What a mess! It seems that Direct3D 12 requires putting explicit barriers between our commands sometimes, automatically synchronizes some others, and doesn't even describe it all clearly in the documentation. The only general rule I can think of is that it cannot track resources bound through descriptors (like SRV, UAV), but tracks those that are bound in a more direct way (as render target, depth-stencil, clear target, copy destination) and synchronizes them automatically. I hope this post helped to clarify some situations that may happen in your rendering code.
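The five scenarios can be condensed into a small decision function. This is only a summary of the observations in this post, not a rule taken from the Direct3D documentation. The State and Barrier enums are simplified stand-ins invented for this sketch, not the real D3D12 types.

```cpp
#include <cassert>

// Simplified stand-ins for D3D12 resource states; not the real enum values.
enum class State { RenderTarget, PixelShaderResource, UnorderedAccess, CopyDest };
enum class Barrier { None, Transition, Uav };

// prev, next: states required by two consecutive commands on the same resource.
// nextDependsOnPrev: the second command must observe the writes of the first.
inline Barrier BarrierBetween(State prev, State next, bool nextDependsOnPrev) {
    if (prev != next)
        return Barrier::Transition;  // e.g. render target -> shader resource
    if (prev == State::UnorderedAccess && nextDependsOnPrev)
        return Barrier::Uav;         // accessed through a descriptor: not tracked
    return Barrier::None;            // bound directly: ordered automatically
}

// The scenarios from this post, expressed as checks.
inline void CheckScenarios() {
    // 1: draw to RTV, then sample as SRV.
    assert(BarrierBetween(State::RenderTarget, State::PixelShaderResource, true)
           == Barrier::Transition);
    // 2: two dependent compute dispatches writing through a UAV.
    assert(BarrierBetween(State::UnorderedAccess, State::UnorderedAccess, true)
           == Barrier::Uav);
    // 3 and 5: two draws, or a clear and a draw, to the same render target.
    assert(BarrierBetween(State::RenderTarget, State::RenderTarget, true)
           == Barrier::None);
    // 4: two copies to the same destination buffer.
    assert(BarrierBetween(State::CopyDest, State::CopyDest, true)
           == Barrier::None);
}
```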

#directx #rendering

# Two Shader Compilers of Direct3D 12

Mon, 23 Dec 2019

If we write a game or other graphics application using DX12, we also need to write some shaders. We author these in a high-level language called HLSL and compile them before passing them to the DirectX API while creating pipeline state objects (ID3D12Device::CreateGraphicsPipelineState). There are currently two shader compilers available, both from Microsoft, each outputting a different binary format:

  1. old compiler “FXC”
  2. new compiler “DXC”

Which one to choose? The new compiler, called DirectX Shader Compiler, is more modern, based on LLVM/Clang, and open source. We must use it if we want to use Shader Model 6 or above. On the other hand, shaders compiled with it require a relatively recent version of Windows and graphics drivers, so they won’t work on systems not updated for years.

Shaders can be compiled offline using a command-line program (standalone executable compiler) and then bundled with your program in compiled binary form. That’s probably the best way to go for a release version, but for development and debugging purposes it’s easier if we can change shader source just as we change the source of CPU code, easily rebuild or run, or even reload a changed shader while the app is running. For this, it’s convenient to integrate the shader compiler as part of your program, which is possible through a compiler API.

This gives us 4 different ways of compiling shaders. This article is a quick tutorial for all of them.

1. Old Compiler - Offline

The standalone executable of the old compiler is called “fxc.exe”. You can find it bundled with the Windows SDK, which is installed together with Visual Studio. For example, on my system I located it in this path: “c:\Program Files (x86)\Windows Kits\10\bin\10.0.17763.0\x64\fxc.exe”.

To compile a shader from HLSL source to the old binary format, issue a command like this:

fxc.exe /T ps_5_0 /E main PS.hlsl /Fo PS.bin

/T is target profile
ps_5_0 means pixel shader with Shader Model 5.0
/E is the entry point - the name of the main shader function, “main” in my case
PS.hlsl is the text file with shader source
/Fo is binary output file to be written

There are many more command line parameters supported for this tool. You can display help about them by passing /? parameter. Using appropriate parameters you can change optimization level, other compilation settings, provide additional #include directories, #define macros, preview intermediate data (preprocessed source, compiled assembly), or even disassemble existing binary file.

2. Old compiler - API

To use the old compiler as a library in your C++ program, #include <d3dcompiler.h>, link with "d3dcompiler.lib", and call the D3DCompileFromFile function.

Example:

CComPtr<ID3DBlob> code, errorMsgs;
HRESULT hr = D3DCompileFromFile(
    L"PS.hlsl", // pFileName
    nullptr, // pDefines
    nullptr, // pInclude
    "main", // pEntrypoint
    "ps_5_0", // pTarget
    0, // Flags1, can be e.g. D3DCOMPILE_DEBUG, D3DCOMPILE_SKIP_OPTIMIZATION
    0, // Flags2
    &code, // ppCode
    &errorMsgs); // ppErrorMsgs
if(FAILED(hr))
{
    if(errorMsgs)
    {
        wprintf(L"Compilation failed with errors:\n%hs\n",
            (const char*)errorMsgs->GetBufferPointer());
    }
    // Handle compilation error...
}

D3D12_GRAPHICS_PIPELINE_STATE_DESC psoDesc = {};
// (...)
psoDesc.PS.BytecodeLength = code->GetBufferSize();
psoDesc.PS.pShaderBytecode = code->GetBufferPointer();
CComPtr<ID3D12PipelineState> pso;
hr = device->CreateGraphicsPipelineState(&psoDesc, IID_PPV_ARGS(&pso));

The first parameter is the path to the file that contains HLSL source. If you want to load the source in some other way, there is also a function that takes a buffer in memory: D3DCompile. The second parameter (optional) can specify preprocessor macros to be #define-d during compilation. The third parameter (optional) can point to your own implementation of the ID3DInclude interface that would provide additional files requested via #include. The entry point and target profile are strings, just like in the command-line compiler. Other options that have their command line parameters (e.g. /Zi, /Od) can be specified as bit flags.

The two objects returned from this function are just buffers of binary data. ID3DBlob is a simple interface that you can query for its size and a pointer to its data. In case of a successful compilation, the ppCode output parameter returns a buffer with the compiled shader binary. You should pass its data to ID3D12PipelineState creation. After successful creation, the blob can be Release-d. The second buffer, ppErrorMsgs, contains a null-terminated string with error messages generated during compilation. It can be useful even if the compilation succeeded, as it then contains warnings.

Update: "d3dcompiler_47.dll" file is needed. Typically some version of it is available on the machine, but generally you still want to redistribute the exact version you're using from the Win10 SDK. Otherwise you could end up compiling with an older or newer version on an end-user's machine.

3. New Compiler - Offline

Using the new compiler in its standalone form is very similar to the old one. The executable is called “dxc.exe” and it’s also bundled with the Windows SDK, in the same directory. Documentation of the command line syntax mentions parameters starting with "-", but the old "/" also seems to work. To compile the same shader using Shader Model 6.0, issue the following command, which looks almost the same as for "fxc.exe":

dxc.exe -T ps_6_0 -E main PS.hlsl -Fo PS.bin

Despite using a new binary format (called “DXIL”, based on LLVM IR), you can load it and pass it to D3D12 PSO creation the same way as before. There is a tricky issue though. You need to attach file “dxil.dll” to your program. Otherwise, the PSO creation will fail! You can find this file in Windows SDK path like: “c:\Program Files (x86)\Windows Kits\10\Redist\D3D\x64\dxil.dll”. Just copy it to the directory with target EXE of your project or the one that you use as working directory.

4. New Compiler - API

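The new compiler can also be used programmatically as a library, but its usage is a bit more difficult. Just as with any C++ library, start by including its header, <dxcapi.h>, and linking with "dxcompiler.lib". The entry point of the API is the DxcCreateInstance function.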

This time though you need to bundle an additional DLL with your program (next to “dxil.dll” mentioned above): “dxcompiler.dll”, to be found in the same “Redist\D3D\x64” directory. There is more code needed to perform the compilation. First create IDxcLibrary and IDxcCompiler objects. They can stay alive for the whole lifetime of your application or as long as you need to compile more shaders. Then for each shader, load it from a file (or any source of your choice) to a blob, call the Compile method, and inspect its result, whether it’s an error + a blob with error messages, or a success + a blob with compiled shader binary.

CComPtr<IDxcLibrary> library;
HRESULT hr = DxcCreateInstance(CLSID_DxcLibrary, IID_PPV_ARGS(&library));
//if(FAILED(hr)) Handle error...

CComPtr<IDxcCompiler> compiler;
hr = DxcCreateInstance(CLSID_DxcCompiler, IID_PPV_ARGS(&compiler));
//if(FAILED(hr)) Handle error...

uint32_t codePage = CP_UTF8;
CComPtr<IDxcBlobEncoding> sourceBlob;
hr = library->CreateBlobFromFile(L"PS.hlsl", &codePage, &sourceBlob);
//if(FAILED(hr)) Handle file loading error...

CComPtr<IDxcOperationResult> result;
hr = compiler->Compile(
    sourceBlob, // pSource
    L"PS.hlsl", // pSourceName
    L"main", // pEntryPoint
    L"ps_6_0", // pTargetProfile
    NULL, 0, // pArguments, argCount
    NULL, 0, // pDefines, defineCount
    NULL, // pIncludeHandler
    &result); // ppResult
if(SUCCEEDED(hr))
    result->GetStatus(&hr);
if(FAILED(hr))
{
    if(result)
    {
        CComPtr<IDxcBlobEncoding> errorsBlob;
        hr = result->GetErrorBuffer(&errorsBlob);
        if(SUCCEEDED(hr) && errorsBlob)
        {
            wprintf(L"Compilation failed with errors:\n%hs\n",
                (const char*)errorsBlob->GetBufferPointer());
        }
    }
    // Handle compilation error...
}
CComPtr<IDxcBlob> code;
result->GetResult(&code);

D3D12_GRAPHICS_PIPELINE_STATE_DESC psoDesc = {};
// (...)
psoDesc.PS.BytecodeLength = code->GetBufferSize();
psoDesc.PS.pShaderBytecode = code->GetBufferPointer();
CComPtr<ID3D12PipelineState> pso;
hr = device->CreateGraphicsPipelineState(&psoDesc, IID_PPV_ARGS(&pso));

The compilation function also takes strings with the entry point and target profile, but in Unicode format this time. The way to pass additional flags also changed. Instead of using bit flags, the pArguments and argCount parameters take an array of strings that can specify additional parameters, the same as you would pass to the command-line compiler, e.g. L"-Zi" to attach debug information or L"-Od" to disable optimizations.
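Building such an array could look like the sketch below. ShaderBuildOptions and MakeDxcArguments are hypothetical names invented for this example; only the two argument strings come from the text above. Plain const wchar_t* is used in place of the Windows LPCWSTR typedef so that the snippet does not depend on Windows headers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical build settings; the names are made up for this example.
struct ShaderBuildOptions {
    bool debugInfo = false;             // adds -Zi
    bool disableOptimizations = false;  // adds -Od
};

// Returns the array to pass as pArguments; its size() is argCount.
// The string literals have static storage duration, so the pointers
// remain valid after the function returns.
inline std::vector<const wchar_t*> MakeDxcArguments(const ShaderBuildOptions& opts) {
    std::vector<const wchar_t*> args;
    if (opts.debugInfo)
        args.push_back(L"-Zi");
    if (opts.disableOptimizations)
        args.push_back(L"-Od");
    return args;
}
```

You would then pass args.data() and the element count cast to the expected integer type in place of the NULL, 0 pair shown for pArguments and argCount in the Compile call.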

Update 2020-01-05: Thanks @MyNameIsMJP for your feedback!

#rendering #directx

# D3D12 Memory Allocator 1.0.0

Mon, 02 Sep 2019

Since 2017 I have been developing Vulkan Memory Allocator - a free, MIT-licensed C++ library that helps with GPU memory management for those who develop games or other graphics applications using Vulkan. Today we released a similar library for DirectX 12: D3D12 Memory Allocator, which I had been preparing for some time. Because that's a project I do at my work at AMD rather than a personal project, I won't describe it in more detail here, but just point to the official resources:

If you are interested in technical details and problems I had to consider during development or you want to write your own allocator for either Vulkan or Direct3D 12, you may also check my recent article: Differences in memory management between Direct3D 12 and Vulkan.

#libraries #directx #productions

# Differences in memory management between Direct3D 12 and Vulkan

Fri, 26 Jul 2019

Since July 2017 I have been developing Vulkan Memory Allocator (VMA) – a C++ library that helps with memory management in games and other applications using Vulkan. But because I deal with both Vulkan and DirectX 12 in my everyday work, I think it’s a good idea to compare them.

This is an article about a very specific topic. It may be useful to you if you are a programmer working with both graphics APIs – Direct3D 12 and Vulkan. These two APIs offer a similar set of features and performance. Both are new-generation, explicit, low-level interfaces to modern graphics hardware (GPUs), so we could compare them side by side to show similarities and differences, e.g. in naming things. For example, the ID3D12CommandQueue::ExecuteCommandLists function has a Vulkan equivalent in the form of the vkQueueSubmit function. However, this article focuses on just one aspect – memory management, which means the rules and limitations of GPU memory allocation and the creation of resources – images (textures, render targets, depth-stencil surfaces etc.) and buffers (vertex buffers, index buffers, constant/uniform buffers etc.). The chapters below describe pretty much all the aspects of memory management that differ between the two APIs.


#vulkan #directx #gpu

# Programming FreeSync 2 support in Direct3D

Sat, 02 Mar 2019

AMD just showed the Oasis demo, presenting usage of its FreeSync 2 HDR technology. If you wonder how you could implement the same features in your Windows DirectX program or game (it doesn’t matter if you use D3D11 or D3D12), here is an article for you.

But first, a disclaimer: Although I already put it on my “About” page, I’d like to stress that this is my personal blog, so all opinions presented here are my own and do not reflect that of my employer.

Radeon FreeSync (its new, official web page is here: Radeon™ FreeSync™ Technology | FreeSync™ 2 HDR Games) is an AMD technology that covers two different things, which may cause some confusion. First is variable refresh rate, second is HDR. Both of them need to be supported by a monitor. The database of FreeSync compatible monitors and their parameters is: Freesync Monitors.

#gpu #directx #windows #graphics

# Programming HDR monitor support in Direct3D

Wed, 27 Feb 2019

I got an HDR-capable monitor (LG 32GK850F), so I started learning how I can use its capabilities programmatically. I still have much to learn, as there is a lot of theory to be ingested about color spaces etc., but in this blog post I’d like to go straight to the point: How to enable HDR in your C++ DirectX program? To test this, I used 3 graphics chips from 3 different PC GPU vendors. Below you can see the results of my experiments.

#graphics #windows #directx #gpu


Copyright © 2004-2021