All Blog Entries

All blog entries, ordered from most recent. Entry count: 1155.


# Hello World Under the Microscope - New Article Published

Oct 2022

A Python program that prints "Hello World" on the console - what can be simpler than this? The entire program is:

print("Hello World")

Yet, together with my friends, I wrote a long article about it! When the topic is described by two security researchers skilled in reverse engineering and knowledgeable about the internals of the Python interpreter, the Windows operating system, and its console, together with a graphics programmer who knows how graphics and text get displayed on the screen, from the Direct3D API down to the internals of a graphics card and the pixels on the screen, the result is an in-depth description of the long journey this simple command makes through a computer.

The article was originally published in Polish in issue 100 (1/2022) of the Programista magazine in February 2022. Now we have prepared an English translation, and we are allowed to publish it for free on the Internet, so here it is: Hello World under the microscope. You can also download the original Polish version as a PDF file or order a printed version of the magazine.

#rendering #productions

# DivideRoundingUp Function and the Value of Abstraction

Sep 2022

This will be a brief article. Imagine we implement a postprocessing effect that needs to recalculate all the pixels on the screen, reading one input texture as SRV 0 and writing one output texture as UAV 0. We use a compute shader declared with numthreads(8, 8, 1), so each thread group processes 8 x 8 pixels. Looking at various gamedev codebases, I have many times seen code similar to this:

renderer->SetConstantBuffer(0, &computeConstants);
renderer->BindTexture(0, &inputTexture);
renderer->BindUAV(0, &outputTexture);
constexpr uint32_t groupSize = 8;
renderer->Dispatch(
    (screenWidth  + groupSize - 1) / groupSize,
    (screenHeight + groupSize - 1) / groupSize,
    1);

It should work fine. We definitely need to round up the number of groups to dispatch, so we don't skip pixels on the right and bottom edges when the screen resolution is not a multiple of 8. The reason I don't like this code is that it uses a "trick" that in my opinion should be encapsulated like this:

uint32_t DivideRoundingUp(uint32_t a, uint32_t b)
{
    return (a + b - 1) / b;
}

renderer->Dispatch(
    DivideRoundingUp(screenWidth, groupSize),
    DivideRoundingUp(screenHeight, groupSize),
    1);

Abstraction is a fundamental concept of software engineering. It is hard to work with complex systems without hiding details behind some higher-level abstraction. This idea applies to entire software modules, but also to small, 1-line pieces of code like the one above. For some programmers it might be obvious when seeing (a + b - 1) / b that we do a division with rounding up, but a junior programmer who has just joined the team may not know this trick. By moving it out of view and giving it a descriptive name, we make the code cleaner and easier to understand for everyone. Therefore I think that all small arithmetic or bit tricks like this should be enclosed in a library of functions rather than used inline. The same goes for the popular formula for checking if a number is a power of two:

bool IsPow2(uint32_t x)
{
    return (x & (x - 1)) == 0;
}
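
One more argument for keeping such tricks in named functions is that their edge cases can then be handled in a single place. The sketch below is only an illustration: the names DivideRoundingUpSafe and IsPow2NonZero are hypothetical and do not come from any particular codebase. It shows a division variant that does not wrap around when a is close to the maximum uint32_t value, and a power-of-two check that rejects zero, which the classic formula accepts.

```cpp
#include <cstdint>

// Hypothetical variant of DivideRoundingUp. Computing a + b - 1 can wrap
// around for very large a; this form avoids the addition. Requires b != 0.
constexpr uint32_t DivideRoundingUpSafe(uint32_t a, uint32_t b)
{
    return a / b + (a % b != 0 ? 1u : 0u);
}

// Hypothetical variant of IsPow2. The expression (x & (x - 1)) == 0 is also
// true for x == 0, so zero is excluded explicitly here.
constexpr bool IsPow2NonZero(uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

// Compile-time checks that double as usage examples.
static_assert(DivideRoundingUpSafe(1920, 8) == 240, "exact multiple of 8");
static_assert(DivideRoundingUpSafe(1081, 8) == 136, "partial group rounds up");
static_assert(IsPow2NonZero(64), "64 is a power of two");
static_assert(!IsPow2NonZero(0), "zero is rejected");
```

Whether these stricter variants are needed depends on the inputs. For screen dimensions the simple formula is sufficient, and call sites do not have to change if the implementation behind the name is replaced later.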

#software engineering #algorithms #c++

# D3d12info - Printing D3D12 GPU Information to Console

Jul 2022

My next little hobby project is D3d12info. It is a Windows console program that prints all the information it can get about the current GPU installed in the system, as seen through the Direct3D 12 API. It also fetches additional information through AMD GPU Services (on AMD cards), NVAPI (on NVIDIA cards), Vulkan, and WinAPI, mostly to identify the current version of the graphics driver and the Windows system. I will try to keep it updated to the latest Agility SDK, so that it can query support for the latest hardware features of the graphics card.

I share it under the open-source MIT license. You can see the full source code in the GitHub repository and download a compiled binary from the Releases tab.

In terms of the information extracted from DX12, the tool can be compared to DirectX Caps Viewer, which you can find in your Windows SDK installation under the path "c:\Program Files (x86)\Windows Kits\10\bin\*\x64\dxcapsviewer.exe". However, instead of a GUI, it provides a command-line interface, which makes it similar to the "vulkaninfo" tool. Information is printed in a human-readable text format by default, but JSON format can be selected by providing the -j parameter, making it suitable for automated processing. Additional command-line parameters are supported, including a choice of the GPU if there are many installed in the system. Launch it with the -h parameter to see the command-line syntax.

In the future, I would like to extend it with a web back-end that would gather a database of various GPUs and driver versions, like the Vulkan Hardware Database does for Vulkan, and to make it browsable online. As far as I know, there is no such database for D3D12 at the moment. The best we have right now are the tables about Direct3D Feature Levels on Wikipedia. But that will require a lot of learning from me, as I am not a good web developer, so I will think about it after my vacation :)

#productions #tools #directx #gpu

# SimplySaveAs - a Small Tool for Perforce Users

Jun 2022

In my old article "Tips for Using Perforce" I promised to dedicate a separate article to what I described there in point 10, so here it is. First, let's talk about the problem. When using a version control system, e.g. Git or Perforce, you surely sometimes inspect the history of a file, to see who changed it, when, and what exactly has been changed throughout its previous versions. GUI clients of such systems offer convenient views to compare text files, but sometimes you may just need to save an old version of the file on your disk - not to update it in your main working copy, but to export it to a separate folder.

In some applications, this is easy. For example, Git Extensions, my favorite GUI client for Git, offers a File History window that shows the revision history of a selected file. In this window, we can right-click on a specific revision from the list and click "Save as" to export that specific version of the file to a new location on disk.

Unfortunately, in Perforce there is no such command. There is a History tab that shows the list of revisions of a selected file. It also offers a context menu under the right mouse button to do something with a selected revision, but among the commands to diff etc. there is no "Save As", only "Open With". This one allows us to choose some application and open the file with it, which might be useful in case of text files or some other documents (e.g. DOCX, PDF) that we just want to preview using their dedicated app. But what if it is a binary file with some non-standard extension that we just want to export to disk?

Here is where the little tool I developed might be useful. SimplySaveAs is a Windows program that you can use to "Open With" a file in Perforce. All it does is show a "Save As" window that lets you choose the place and name under which the file should be saved on your disk. This way, the external tool provides the command missing in the Perforce visual client (P4V).

The program doesn't need any installation. The repository linked above also contains full source code in C++, but all you need to download is just the file "SimplySaveAs.exe". You can put it in any location on your disk. I like to have a separate directory "C:\PortablePrograms\", where I put all the portable applications that don't need installation, like this one.

The first time you want to use it, you need to click on Open With > Choose Application... in Perforce and select "SimplySaveAs.exe" from your disk.

On every subsequent use, Perforce will remember this program and show it in the context menu, so you can just click Open With > SimplySaveAs.

How does it work? As you may know, opening a file with a program actually requires saving the file somewhere on disk, likely in a temporary folder, and then launching the program with the path to this file passed as a command-line parameter. This is also what Perforce does when we use the "Open With" command. So all my program does is ask the user for a target path and then copy the file from the temporary source location, read from the parameter, to the target location selected by the user.

#tools #productions

# An Idea for Visualization of Frame Times

May 2022

In real-time graphics applications like games, we usually measure performance as the average number of frames per second (FPS). This average is a good estimate of how well the application performs, how heavy the per-frame workload is, how fast the system executing it is, and, most importantly, whether the performance suffices for showing a smooth, good-looking animation, as opposed to a "slideshow". But this is not the complete story. If some frames take an exceptionally long time, then even if others are very short, an unpleasant hitching may be visible to the player, while the average FPS still looks fine. Therefore it is worth visualizing the duration of individual frames on a graph, to see if they are stable.

One idea for such a graph is to draw a line connecting data points (frames), where the X axis is the frame index and the Y axis is the frame duration (dt), as in these pictures: "GPU Reviews: Why Frame Time Analysis is important", page 3. If such a graph is shown in real time, there is one problem with it: it doesn't move at a constant pace, as the horizontal axis is expressed in frames, not seconds, so an exceptionally long frame will have the same width as a very short one. As a result, the higher the framerate, the faster the graph moves.
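
To make the point about averages concrete, here is a minimal sketch that computes both the average FPS and the longest frame from a list of frame times. The names FrameStats and ComputeFrameStats are made up for this illustration, and the input is assumed to be non-empty, with durations in milliseconds.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

struct FrameStats
{
    double averageFps;      // frames divided by total time in seconds
    double longestFrameMs;  // duration of the single worst frame
};

// Hypothetical helper. Assumes frameTimesMs is non-empty and sums to more than zero.
FrameStats ComputeFrameStats(const std::vector<double>& frameTimesMs)
{
    const double totalMs = std::accumulate(frameTimesMs.begin(), frameTimesMs.end(), 0.0);
    const double longestMs = *std::max_element(frameTimesMs.begin(), frameTimesMs.end());
    return { 1000.0 * frameTimesMs.size() / totalMs, longestMs };
}
```

For example, 59 frames of 10 ms followed by a single frame of 410 ms add up to exactly one second, so the average is 60 FPS, yet the player has just watched a freeze lasting 410 ms.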

Source: "GPU Reviews: Why Frame Time Analysis is important", page 3

A better idea might be to move data points horizontally with time, so that a very long frame generates a spike on the graph, with the previous point many pixels away on the horizontal axis. This is what the AMD OCAT tool seems to be doing. However, it results in a long, oblique line on the graph.

Overlay shown by OCAT tool

Some time ago I came up with another kind of graph. It shows every frame as a rectangle whose parameters (width, height, and color) all depend on the frame duration.

I think that with this kind of graph, both the average framerate and outstanding extra-long frames are clearly visible at first glance. You can see the full example source code doing all this, implemented in C++, here: Game.cpp - RegEngine - sawickiap - GitHub. It uses GLM for math functions and Dear ImGui for 2D rendering.

For example, a game with V-sync on, running at steady 60 FPS, has the graph looking like this:

A heavier GPU workload that makes the game run at around 38 FPS looks like this. The graph also shows an extra-long frame that froze the entire game because of loading something from the disk, and another hitch caused by pressing the PrintScreen key.
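
The rectangle-per-frame idea can be sketched as a function that maps a frame duration to rectangle parameters. Everything specific here is an assumption made for illustration and is not taken from the linked Game.cpp: the names FrameRect and MakeFrameRect, the constants, the width proportional to dt (so that the graph scrolls at a constant pace in time), the logarithmic height, and the green-to-red color ramp.

```cpp
#include <algorithm>
#include <cmath>

struct FrameRect
{
    float width;   // pixels along the time axis
    float height;  // pixels along the duration axis
    float r, g, b; // color components in the range 0..1
};

// Hypothetical mapping; all constants below are illustrative assumptions.
FrameRect MakeFrameRect(float dtSeconds)
{
    constexpr float pixelsPerSecond = 200.0f; // horizontal scale
    constexpr float minDt = 0.001f;           // 1 ms maps to zero height
    constexpr float maxDt = 1.0f;             // 1 s maps to full height
    constexpr float maxHeight = 100.0f;       // height of the tallest rectangle

    // Width is proportional to the duration, so one second of wall-clock time
    // always occupies the same number of pixels regardless of the framerate.
    const float width = dtSeconds * pixelsPerSecond;

    // Height uses a logarithmic scale so that short frames stay readable
    // while long hitches still stand out. t is normalized to 0..1.
    const float clamped = std::clamp(dtSeconds, minDt, maxDt);
    const float t = std::log(clamped / minDt) / std::log(maxDt / minDt);

    // Color blends from green (short frame) to red (long frame).
    return { width, t * maxHeight, t, 1.0f - t, 0.0f };
}
```

With these particular constants, a 16.7 ms frame yields a narrow, mostly green rectangle, while a 500 ms hitch yields a rectangle 100 pixels wide that is nearly full height and mostly red.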

#rendering

# A Metric for Memory Fragmentation

Apr 2022

In this article, I would like to discuss the problem of memory fragmentation and propose a formula for calculating a metric telling how badly the memory is fragmented.

Problem statement

The problem can be stated like this:

So it is a standard memory allocation situation. Now, I will explain what I mean by fragmentation. Fragmentation, for this article, is an unwanted situation where free memory is spread across many small regions in between allocations, as opposed to a single large one. We want to measure it and preferably avoid it because:

A solution to this problem is to perform defragmentation - an operation that moves the allocations to arrange them next to each other. This may require user involvement, as pointers to the allocations will change then. It may also be a time-consuming operation to calculate better places for the allocations and then to copy all their data. It is thus desirable to measure the fragmentation to decide when to perform the defragmentation operation.
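
As a rough illustration of what such a measurement could look like, the sketch below computes one simple and commonly used indicator: one minus the share of free memory held by the largest free region. This is an assumption chosen for illustration and not necessarily the formula proposed in the full entry. The name EstimateFragmentation is hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: returns 0 when all free memory forms one region (or
// when there is no free memory at all), and approaches 1 as the free memory
// gets split into many small regions.
double EstimateFragmentation(const std::vector<uint64_t>& freeRegionSizes)
{
    const uint64_t totalFree = std::accumulate(
        freeRegionSizes.begin(), freeRegionSizes.end(), uint64_t{0});
    if (totalFree == 0)
        return 0.0; // nothing free, so nothing to fragment
    const uint64_t largest = *std::max_element(
        freeRegionSizes.begin(), freeRegionSizes.end());
    return 1.0 - static_cast<double>(largest) / static_cast<double>(totalFree);
}
```

A single free region of 1024 bytes gives 0, while the same 1024 bytes split into four regions of 256 bytes gives 0.75. An allocator could compare such a value against a threshold to decide whether defragmentation is worth its cost.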

Read full entry > | #gpu #algorithms #optimization

# Vulkan Memory Allocator 3.0.0 and D3D12 Memory Allocator 2.0.0

Mar 2022

Yesterday we released new major versions of our libraries: Vulkan Memory Allocator 3.0.0 and D3D12 Memory Allocator 2.0.0. If you are coding with Vulkan or Direct3D 12, I recommend taking a look at them. Because coding them is part of my job, I won't describe them in detail here, but just refer to my article "Announcing Vulkan Memory Allocator 3.0.0 and Direct3D 12 Memory Allocator 2.0.0". Direct links:

Vulkan Memory Allocator

D3D12 Memory Allocator

#rendering #directx #vulkan #gpu #libraries #productions

# Untangling Direct3D 12 Memory Heap Types and Pools

Feb 2022

Those of you who follow my blog can say that I am boring, but I can't help it - somehow GPU memory allocation became my thing, rather than shaders and effects, which most graphics programmers focus on. Some time ago I wrote an article "Vulkan Memory Types on PC and How to Use Them" explaining what memory heaps and types are available on various types of PC GPUs, as visible through the Vulkan API. This article is a Direct3D 12 equivalent, in a way.

D3D12 differs greatly from Vulkan in how it expresses memory types as they exist in hardware. Vulkan defines a 2-level hierarchy of memory "heaps" and "types". A heap represents a physical piece of memory of a certain size, while a type is a "view" of a specific heap with certain properties, like cached versus uncached. This gives great flexibility in how different GPUs can express their memory, but it also makes it hard for developers to ensure they select the optimal one on any kind of GPU. Direct3D 12 offers a fixed set of memory types. Creating a buffer or a texture usually means selecting one of the 3 standard "heap types": D3D12_HEAP_TYPE_DEFAULT, D3D12_HEAP_TYPE_UPLOAD, or D3D12_HEAP_TYPE_READBACK.

So far, so good... D3D12 seems to simplify things compared to Vulkan. You can stop here and still develop a decent graphics program. But if you make a game with an open world and want to stream your content at runtime, so that you need to check what memory budget is available to your app, or if you want to take advantage of integrated graphics where memory is unified, you will find out that things are not that simple in this API. D3D12 names various memory types in 4 different ways, and they are not so obvious when we compare systems with discrete versus integrated graphics. The goal of this article is to explain and untangle all this complexity.

Read full entry > | #directx


Copyright © 2004-2022