# Hello World Under the Microscope - New Article Published
A Python program that prints "Hello World" on the console - what can be simpler than this? The entire program is:
Yet, together with my friends, we wrote a long article about it! When the topic is described by two security researchers skilled in reverse engineering and knowledgeable about the internals of Python interpreter, Windows operating system, and its console, together with a graphics programmer that knows how graphics and text get displayed on the screen, from Direct3D API down to the internals of a graphics card and pixels on the screen, the result is an in-depth description of the long journey this simple command makes in a computer.
The article was originally published in Polish in issue 100 (1/2022) of the Programista magazine in February 2022. Now, we prepared an English translation, and we are allowed to publish it for free on the Internet, so here it is: Hello World under the microscope. You can also download the original Polish version as PDF file or order printed version of the magazine.
# DivideRoundingUp Function and the Value of Abstraction
It will be a brief article. Imagine we implement a postprocessing effect that needs to recalculate all the pixels on the screen, reading one input texture as SRV 0 and writing one output texture as UAV 0. We use a compute shader that has
numthreads(8, 8, 1) declared, so each thread group processes 8 x 8 pixels. When looking at various codebases in gamedev, I've seen many times a code similar to this one:
constexpr uint32_t groupSize = 8;
(screenWidth + groupSize - 1) / groupSize,
(screenHeight + groupSize - 1) / groupSize,
It should work fine. We definitely need to align up the number of groups to dispatch, so we don't skip pixels on the right and the bottom edge in case our screen resolution is not a multiply of 8. The reason I don't like this code is that it uses a "trick" that in my opinion should be encapsulated like this:
uint32_t DivideRoundingUp(uint32_t a, uint32_t b)
return (a + b - 1) / b;
Abstraction is the fundamental concept of software engineering. It is hard to work with complex systems without hiding details behind some higher-level abstraction. This idea applies to entire software modules, but also to small, 1-line pieces of code like this one above. For some programmers it might be obvious when seeing
(a + b - 1) / b that we do a division with rounding up, but a junior programmer who just joined the team may not know this trick. By moving it out of view and giving it a descriptive name, we make the code cleaner and easier to understand for everyone. Therefore I think that all small arithmetic or bit tricks like this should be enclosed in a library of functions rather than used inline. Same with the popular formula for checking if a number is a power of two:
bool IsPow2(uint32_t x)
return (x & (x - 1)) == 0;
# D3d12info - Printing D3D12 GPU Information to Console
My next little hobby project is D3d12info. It is a Windows console program that prints all the information it can get about the current GPU installed in the system, as seen through Direct3D 12 API. It also fetches additional information through AMD GPU Services (on AMD cards), NVAPI (on NVIDIA cards), Vulkan, and WinAPI, mostly to identify the current version of the graphics driver and Windows system. I will try to keep it updated to the latest Agility SDK, to query it for support for the latest hardware features of the graphics card.
The tool can be compared to DirectX Caps Viewer you can find in your Windows SDK installation under path "c:\Program Files (x86)\Windows Kits\10\bin\*\x64\dxcapsviewer.exe" in terms of the information extracted from DX12. However, instead of GUI, it provides a command-line interface, which makes it similar to the "vulkaninfo" tool. Information is printed in a human-readable text format by default, but JSON format can be selected by providing
-j parameter, making it suitable for automated processing. Additional command-line parameters are supported, including a choice of the GPU if there are many installed in the system. Launch it with parameter
-h to see the command-line syntax.
In the future, I would like to extend it with a web back-end that would gather a database of various GPUs and driver versions, like Vulkan Hardware Database does for Vulkan, and to make it browsable online. As far as I know, there is no such database for D3D12 at the moment. Best we have right now are the tables about Direct3D Feature Levels on Wikipedia. But that will require a lot of learning from me, as I am not a good web developer, so I will think about it after my vacation :)
# SimplySaveAs - a Small Tool for Perforce Users
In my old article "Tips for Using Perforce" I promised to dedicate a separate article to what I described there in point 10, so here it is. First, let's talk about the problem. When using a version control system, e.g. Git or Perforce, you surely sometimes inspect the history of a file, to see who changed it, when, and what exactly has been changed throughout its previous versions. GUI clients of such systems offer convenient views to compare text files, but sometimes you may just need to save an old version of the file on your disk - not to update it in your main working copy, but to export it to a separate folder.
In some applications, this is easy. For example, Git Extensions, my favorite GUI client for Git, offers File History window that shows revision history of a selected file. In this window, we can right-click on a specific revision from the list and click "Save as" to export that specific version of the file to a new location on disk.
Unfortunately, in Perforce there is no such command. There is History tab that shows the list of revisions of a selected file. It also offers context menu under right mouse button to do something with a selected revision, but among the commands to diff etc. there is no "Save As", only "Open With". This one allows us to choose some application and open the file with it, which might be useful in case of text files or some other documents (e.g. DOCX, PDF) that we just want to preview using their dedicated app. But what if it is a binary file, having some non-standard extension, that we just want to export to disk?
Here is where the little tool I developed might be useful. SimplySaveAs is a Windows program that you can use to "Open With" a file in Perforce. All it does is show a "Save As" window that lets you choose a place and name where the file should be saved on your disk. This way, the external tool provides the command missing in Perforce visual client (P4V).
The program doesn't need any installation. The repository linked above also contains full source code in C++, but all you need to download is just the file "SimplySaveAs.exe". You can put it in any location on your disk. I like to have a separate directory "C:\PortablePrograms\", where I put all the portable applications that don't need installation, like this one.
First time you want to use it, you need to click on Open With > Choose Application... in Perforce and select "SimplySaveAs.exe" from your disk.
On every next use, Perforce will remember this program and show it available in the context menu, so you can just click Open With > SimplySaveAs.
How does it work? As you may know, opening a file with a program actually needs to save the file on a disk somewhere, likely in a temporary folder, and then launching the program with a path to this file passed as a command-line parameter. This is also what Perforce does when we use "Open With" command. So all my program does is ask the user for a target path and then copy the file from the source, temporary location read from the parameter to the target location selected by the user.
# An Idea for Visualization of Frame Times
In real-time graphics applications like games, we usually measure performance as the average number of frames per second (FPS). Showing this average is a good estimate of how well the application performs, how heavy is the per-frame workload, how fast is the system where it executes, and, most importantly, whether the performance suffices for showing a smooth, good looking animation, as opposed to a "slideshow". But this is not a complete story. If some frames take an exceptionally long time, then even if others are very short, an unpleasant hitching may be visible to the player, while average FPS still looks fine. Therefore it is worth to visualize duration of individual frames on a graph, to see if they are stable.
One idea for such a graph is to draw a line connecting data points (frames), where X axis is the frame index and Y axis is the frame duration (dt), like on these pictures: "GPU Reviews: Why Frame Time Analysis is important", page 3. If such graph is shown in real time, there is one problem with it: it doesn't move at a constant pace, as the horizontal axis is expressed in frames, not seconds, so an exceptionally long frame will have the same width as super short frame. As the result, the graph will move faster the higher is the framerate.
A better idea might be to move data points horizontally with time, so that a very long frame will generate a spike on the graph with previous point many pixels away on the horizontal axis. This is what AMD OCAT tool seems to be doing. However, it results in a long, oblique line on the graph.
Overlay shown by OCAT tool
Some time ago I came up with another kind of graph. It shows every frame as a rectangle, with all its parameters: width, height, and color, dependent on the frame duration:
frameWidth = dt / (1/120). Rectangle left and right edges also need to be aligned to
ceil(), respectively, so that every frame is visible as at least 1-pixel wide.
frameHeightFactor = (log2(dt) - log2(1/120)) / (log2(1/15) - log2(1/120)). This factor then needs to be clamped to 0..1 and stretched to some range of minimum..maximum heights, depending on the intended looks of the graph, e.g. 2..64 pixels. This way, frames of a game running at 120 FPS will have minimum height, 60 FPS will be at 33%, 30 FPS - 66%, and for 15 FPS or less they will have maximum height.
I think that with this kind of graph, both average framerate and outstanding extra-long frames are clearly visible at a first glance. You can see full example source code doing all this, implemented in C++ here: Game.cpp - RegEngine - sawickiap - GitHub. It uses GLM for math functions and Dear ImGui for 2D rendering.
For example, a game with V-sync on, running at steady 60 FPS, has the graph looking like this:
While a heavier GPU workload making the game running at around 38 FPS looks like this. The graph also shows an extra-long frame that froze the entire game because of loading something from the disk, and another hitch caused by pressing PrintScreen key.
# A Metric for Memory Fragmentation
In this article, I would like to discuss the problem of memory fragmentation and propose a formula for calculating a metric telling how badly the memory is fragmented.
The problem can be stated like this:
So it is a standard memory allocation situation. Now, I will explain what do I mean by fragmentation. Fragmentation, for this article, is an unwanted situation where free memory is spread across many small regions in between allocations, as opposed to a single large one. We want to measure it and preferably avoid it because:
VirtualAlloc) and sub-allocating them for the user's allocation requests, high fragmentation my require allocating another block to satisfy the request, making the program using more system memory than really needed.
A solution to this problem is to perform defragmentation - an operation that moves the allocations to arrange them next to each other. This may require user involvement, as pointers to the allocations will change then. It may also be a time-consuming operation to calculate better places for the allocations and then to copy all their data. It is thus desirable to measure the fragmentation to decide when to perform the defragmentation operation.
# Vulkan Memory Allocator 3.0.0 and D3D12 Memory Allocator 2.0.0
Yesterday we released new major version of Vulkan Memory Allocator 3.0.0 and D3D12 Memory Allocator 2.0.0, so if you are coding with Vulkan or Direct3D 12, I recommend to take a look at these libraries. Because coding them is part of my job, I won't describe them in detail here, but just refer to my article published on GPUOpen.com: "Announcing Vulkan Memory Allocator 3.0.0 and Direct3D 12 Memory Allocator 2.0.0". Direct links:
Vulkan Memory Allocator
D3D12 Memory Allocator
# Untangling Direct3D 12 Memory Heap Types and Pools
Those of you who follow my blog can say that I am boring, but I can't help it - somehow GPU memory allocation became my thing, rather than shaders and effects, like most graphics programmers do. Some time ago I've written an article "Vulkan Memory Types on PC and How to Use Them" explaining what memory heaps and types are available on various types of PC GPUs, as visible through Vulkan API. This article is a Direct3D 12 equivalent, in a way.
With expressing memory types as they exist in hardware, D3D12 differs greatly from Vulkan. Vulkan defines a 2-level hierarchy of memory "heaps" and "types". A heap represents a physical piece of memory of a certain size, while a type is a "view" of a specific heap with certain properties, like cached versus uncached. This gives a great flexibility in how different GPUs can express their memory, which makes it hard for the developer to ensure he selects the optimal one on any kind of GPU. Direct3D 12 offers a fixed set of memory types. When creating a buffer or a texture, it usually means selecting one of the 3 standard "heap types":
D3D12_HEAP_TYPE_DEFAULTis intended for resources that are directly and frequently accessed by the GPU, so all render-target, depth-stencil textures, other textures, vertex and index buffers used for rendering etc. go there. It typically ends up in the memory of the graphics card.
D3D12_HEAP_TYPE_UPLOADrepresents a memory that we can
Mapand fill in from the CPU and then copy or directly read on the GPU. It must be created and stay in
D3D12_RESOURCE_STATE_GENERIC_READ. It ends up in system RAM and is uncached but write-combined, which means it should only be written, never read.
D3D12_HEAP_TYPE_READBACKrepresents the least frequently used type of memory that the GPU can write or copy to, while the CPU can
Mapand read. It must be created in
D3D12_RESOURCE_STATE_COPY_DEST. It ends up in system RAM and is cached, which means random access from our CPU code is fine for it.
So far, so good... D3D12 seems to simplify things compared to Vulkan. You can stop here and still develop a decent graphics program, but if you make a game with an open world and want to stream your content in runtime, so you need to check what memory budget is available to your app, or you want to take advantage of integrated graphics where memory is unified, you will find out that things are not that simple in this API. There are 4 different ways that D3D12 calls various memory types and they are not so obvious when we compare systems with discrete versus integrated graphics. The goal of this article is to explain and untangle all this complexity.
.net algorithms books c++ career commonlib competitions compo demoscene directx engine events flash fractals gallery games ggj gpu graphics gui hardware homepage humor ideas igk java libraries life linux literature math mobile music networking optimization origami philosophy photography politics productions psytrance redmond rendering scripts shopping software software engineering stl studies teaching tools video visual studio vj vulkan warsztat web webdev winapi windows