# Why I Catch Exceptions in Main Function in C++

Jan 2023

Exception handling in C++ is a controversial topic. On one hand, it can be a good means of reporting and handling errors if done correctly. For it to be free from memory leaks, all memory allocations should be wrapped in smart pointers, and other acquired resources (opened files and other handles) should be wrapped in similar RAII objects. On the other hand, it has been shown many times that the exception mechanism in C++ is slow. Disabling exception handling in the compiler options can speed up the program significantly. No wonder game developers dislike exceptions and disable them completely.

Let’s talk about a command-line C++ program that doesn’t need to disable exception handling in compiler options. Even if it doesn’t use exceptions explicitly, some exceptions may still be thrown by the C++ standard library or by third-party libraries. When a C++ exception is thrown and not caught, the program terminates and the process exit code is some large negative number. On my system it is -1073740791 = 0xC0000409.

It would be good if the program printed an error message in such a case and returned a custom, clearly defined exit code. Therefore, when developing a command-line C++ program, I like to catch and handle exceptions in the main function, like this:

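#include <exception>
#include <cstdio>

// Custom exit code returned when an exception is caught (example value).
constexpr int kExitCodeException = -2;

int ActualProgram(int argc, char** argv) {
    // ... the actual program ...
    return 0;
}

int main(int argc, char** argv) {
    try {
        return ActualProgram(argc, argv);
    }
    catch(const std::exception& ex) {
        fprintf(stderr, "ERROR: %s\n", ex.what());
        return kExitCodeException;
    }
    catch(...) {
        fprintf(stderr, "UNKNOWN ERROR.\n");
        return kExitCodeException;
    }
}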
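
To see the pattern catch an exception that the program never throws explicitly, here is a minimal sketch. `RunGuarded`, `ReadBeyondEnd`, and `kExitException` are hypothetical names introduced only for illustration, and -2 is an arbitrary example of a custom exit code.

```cpp
#include <cstdio>
#include <exception>
#include <vector>

// Hypothetical custom exit code; any clearly defined value works.
constexpr int kExitException = -2;

// Runs the given program body and converts any escaping C++ exception
// into an error message on stderr plus a well-defined exit code.
template <typename Func>
int RunGuarded(Func&& body) {
    try {
        return body();
    }
    catch (const std::exception& ex) {
        std::fprintf(stderr, "ERROR: %s\n", ex.what());
        return kExitException;
    }
    catch (...) {
        std::fprintf(stderr, "UNKNOWN ERROR.\n");
        return kExitException;
    }
}

// Example body: std::vector::at throws std::out_of_range for a bad index,
// even though this code contains no throw statement of its own.
inline int ReadBeyondEnd() {
    std::vector<int> values{1, 2, 3};
    return values.at(10);
}
```

Returning `RunGuarded(ReadBeyondEnd)` from `main` would print an error line to stderr and exit with `kExitException` instead of terminating the process abnormally.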

Besides that, if you develop for Windows using Visual Studio, there is another, parallel system of throwing and catching exceptions, called Structured Exception Handling (SEH). It allows you to handle “system” errors that are not C++ exceptions and would otherwise terminate your program, even when using the code shown above. This kind of error can be a memory access violation (using a null or invalid pointer) or an integer division by zero, among others. To catch them, you can use the following code:

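#include <Windows.h>
#include <cstdio>

int main(int argc, char** argv) {
    __try {
        return main2(argc, argv);
    }
    __except(EXCEPTION_EXECUTE_HANDLER) {
        fprintf(stderr, "STRUCTURED EXCEPTION: 0x%08X.\n", GetExceptionCode());
        return -2; // Custom exit code (example value).
    }
}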

A few additional notes are needed here. First, an SEH __try-__except section cannot exist in the same function as a C++ try-catch. It is fine, though, to call a function that uses one way of error handling from a function that uses the other. Their order is important: C++ exceptions are also caught by SEH __except, but SEH exceptions are not caught by C++ catch. So, to do it properly, your main function should do the SEH __try-__except and call some main2 function that does the C++ try-catch, which in turn calls ActualProgram, not the other way around.

If you wonder what process exit codes are returned by default when exceptions are not caught, the answer can be found in the documentation of the GetExceptionCode macro and in the Windows header files. When a memory access violation occurs, this macro (or the entire process, if SEH exceptions are not handled) returns -1073741819 = 0xC0000005, which matches EXCEPTION_ACCESS_VIOLATION. When a C++ exception is thrown, the code is -1073740791 = 0xC0000409, which is not one of the EXCEPTION_ symbols, but I found it defined as STATUS_STACK_BUFFER_OVERRUN (strange…). Maybe it would be a good idea to extend the __except section shown above to decode known exception codes and print their string descriptions.

Finally, you need to know that integer division by zero throws an SEH exception, but floating-point division by zero does not, at least not by default. There are EXCEPTION_FLT_DIVIDE_BY_ZERO and EXCEPTION_INT_DIVIDE_BY_ZERO error codes defined, but the default behavior of incorrect floating-point calculations (e.g. division by zero, logarithm of a negative value) is to return special values like Not a Number (NaN) or infinity and proceed with further calculations. This behavior can be changed, as described in “Floating-Point Exceptions”.
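
The default floating-point behavior can be checked with a few lines of portable code. This is only a sketch and it assumes IEEE 754 arithmetic with default settings (no fast-math options, floating-point exceptions left masked); `Divide` is a hypothetical helper used to keep the compiler from folding the division at compile time.

```cpp
#include <cmath>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559,
              "This sketch assumes IEEE 754 floating-point arithmetic.");

// Plain floating-point division. With default settings, dividing by zero
// does not raise an exception; it produces infinity or NaN instead.
inline double Divide(double a, double b) {
    return a / b;
}
```

Under these assumptions, `Divide(1.0, 0.0)` yields infinity, while `Divide(0.0, 0.0)` and `std::log(-1.0)` yield NaN, and the program continues running.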

#windows #visual studio #c++

# Hello World Under the Microscope - New Article Published

Oct 2022

A Python program that prints "Hello World" on the console - what can be simpler than this? The entire program is:

print("Hello World")

Yet my friends and I wrote a long article about it! When the topic is described by two security researchers skilled in reverse engineering and knowledgeable about the internals of the Python interpreter, the Windows operating system, and its console, together with a graphics programmer who knows how graphics and text get displayed on the screen, from the Direct3D API down to the internals of a graphics card and pixels on the screen, the result is an in-depth description of the long journey this simple command makes in a computer.

The article was originally published in Polish in issue 100 (1/2022) of the Programista magazine in February 2022. Now we have prepared an English translation, and we are allowed to publish it for free on the Internet, so here it is: Hello World under the microscope. You can also download the original Polish version as a PDF file or order a printed version of the magazine.

#rendering #productions

# DivideRoundingUp Function and the Value of Abstraction

Sep 2022

This will be a brief article. Imagine we implement a postprocessing effect that needs to recalculate all the pixels on the screen, reading one input texture as SRV 0 and writing one output texture as UAV 0. We use a compute shader that has numthreads(8, 8, 1) declared, so each thread group processes 8 x 8 pixels. When looking at various codebases in gamedev, I have many times seen code similar to this:

renderer->SetConstantBuffer(0, &computeConstants);
renderer->BindTexture(0, &inputTexture);
renderer->BindUAV(0, &outputTexture);
constexpr uint32_t groupSize = 8;
renderer->Dispatch(
    (screenWidth  + groupSize - 1) / groupSize,
    (screenHeight + groupSize - 1) / groupSize,
    1);

It should work fine. We definitely need to round up the number of groups to dispatch, so we don't skip pixels on the right and the bottom edge in case our screen resolution is not a multiple of 8. The reason I don't like this code is that it uses a "trick" that in my opinion should be encapsulated like this:

uint32_t DivideRoundingUp(uint32_t a, uint32_t b)
{
    return (a + b - 1) / b;
}

renderer->Dispatch(
    DivideRoundingUp(screenWidth, groupSize),
    DivideRoundingUp(screenHeight, groupSize),
    1);

Abstraction is the fundamental concept of software engineering. It is hard to work with complex systems without hiding details behind some higher-level abstraction. This idea applies to entire software modules, but also to small, 1-line pieces of code like this one above. For some programmers it might be obvious when seeing (a + b - 1) / b that we do a division with rounding up, but a junior programmer who just joined the team may not know this trick. By moving it out of view and giving it a descriptive name, we make the code cleaner and easier to understand for everyone. Therefore I think that all small arithmetic or bit tricks like this should be enclosed in a library of functions rather than used inline. Same with the popular formula for checking if a number is a power of two:

bool IsPow2(uint32_t x)
{
    return (x & (x - 1)) == 0;
}
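
Another benefit of encapsulating such tricks is that edge cases can be handled in one place. The sketch below shows two hypothetical variants (the names are mine, chosen for illustration): a division that does not overflow when `a + b - 1` would exceed the range of `uint32_t`, and a power-of-two check that rejects 0, which the popular formula reports as a power of two.

```cpp
#include <cstdint>

// Division rounding up that avoids the intermediate sum a + b - 1,
// so it stays correct even when a is close to UINT32_MAX. Requires b != 0.
constexpr std::uint32_t DivideRoundingUpNoOverflow(std::uint32_t a, std::uint32_t b) {
    return a / b + (a % b != 0 ? 1u : 0u);
}

// Power-of-two check that returns false for 0.
constexpr bool IsPow2NonZero(std::uint32_t x) {
    return x != 0 && (x & (x - 1)) == 0;
}

// Compile-time checks of the expected behavior.
static_assert(DivideRoundingUpNoOverflow(1920, 8) == 240, "exact multiple");
static_assert(DivideRoundingUpNoOverflow(1081, 8) == 136, "rounds up");
static_assert(!IsPow2NonZero(0) && IsPow2NonZero(64) && !IsPow2NonZero(96), "power of two");
```

Whether these stricter variants are needed depends on the inputs; for screen dimensions and small group sizes, the simpler formulas behave identically.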

#software engineering #algorithms #c++

# D3d12info - Printing D3D12 GPU Information to Console

Jul 2022

My next little hobby project is D3d12info. It is a Windows console program that prints all the information it can get about the current GPU installed in the system, as seen through Direct3D 12 API. It also fetches additional information through AMD GPU Services (on AMD cards), NVAPI (on NVIDIA cards), Vulkan, and WinAPI, mostly to identify the current version of the graphics driver and Windows system. I will try to keep it updated to the latest Agility SDK, to query it for support for the latest hardware features of the graphics card.

I share it under the open-source MIT license. You can see the full source code in the GitHub repository and download a compiled binary from the Releases tab.

In terms of the information extracted from DX12, the tool can be compared to the DirectX Caps Viewer that you can find in your Windows SDK installation under the path "c:\Program Files (x86)\Windows Kits\10\bin\*\x64\dxcapsviewer.exe". However, instead of a GUI, it provides a command-line interface, which makes it similar to the "vulkaninfo" tool. Information is printed in a human-readable text format by default, but JSON format can be selected by providing the -j parameter, making it suitable for automated processing. Additional command-line parameters are supported, including a choice of the GPU if there are many installed in the system. Launch it with the -h parameter to see the command-line syntax.

In the future, I would like to extend it with a web back-end that would gather a database of various GPUs and driver versions, like the Vulkan Hardware Database does for Vulkan, and to make it browsable online. As far as I know, there is no such database for D3D12 at the moment. The best we have right now are the tables about Direct3D Feature Levels on Wikipedia. But that will require a lot of learning from me, as I am not a good web developer, so I will think about it after my vacation :)

#productions #tools #directx #gpu

# SimplySaveAs - a Small Tool for Perforce Users

Jun 2022

In my old article "Tips for Using Perforce" I promised to dedicate a separate article to what I described there in point 10, so here it is. First, let's talk about the problem. When using a version control system, e.g. Git or Perforce, you surely sometimes inspect the history of a file, to see who changed it, when, and what exactly has been changed throughout its previous versions. GUI clients of such systems offer convenient views to compare text files, but sometimes you may just need to save an old version of the file on your disk - not to update it in your main working copy, but to export it to a separate folder.

In some applications, this is easy. For example, Git Extensions, my favorite GUI client for Git, offers a File History window that shows the revision history of a selected file. In this window, we can right-click on a specific revision from the list and click "Save as" to export that specific version of the file to a new location on disk.

Unfortunately, in Perforce there is no such command. There is a History tab that shows the list of revisions of a selected file. It also offers a context menu under the right mouse button to do something with a selected revision, but among the commands to diff etc. there is no "Save As", only "Open With". This one allows us to choose some application and open the file with it, which might be useful in case of text files or some other documents (e.g. DOCX, PDF) that we just want to preview using their dedicated app. But what if it is a binary file with some non-standard extension that we just want to export to disk?

Here is where the little tool I developed might be useful. SimplySaveAs is a Windows program that you can use to "Open With" a file in Perforce. All it does is show a "Save As" window that lets you choose a place and name where the file should be saved on your disk. This way, the external tool provides the command missing in Perforce visual client (P4V).

The program doesn't need any installation. The repository linked above also contains full source code in C++, but all you need to download is just the file "SimplySaveAs.exe". You can put it in any location on your disk. I like to have a separate directory "C:\PortablePrograms\", where I put all the portable applications that don't need installation, like this one.

The first time you want to use it, you need to click on Open With > Choose Application... in Perforce and select "SimplySaveAs.exe" from your disk.

On every subsequent use, Perforce will remember this program and show it in the context menu, so you can just click Open With > SimplySaveAs.

How does it work? As you may know, opening a file with a program actually requires saving the file on a disk somewhere, likely in a temporary folder, and then launching the program with a path to this file passed as a command-line parameter. This is also what Perforce does when we use the "Open With" command. So all my program does is ask the user for a target path and then copy the file from the temporary source location, read from the parameter, to the target location selected by the user.

#tools #productions

# An Idea for Visualization of Frame Times

May 2022

In real-time graphics applications like games, we usually measure performance as the average number of frames per second (FPS). This average is a good estimate of how well the application performs, how heavy the per-frame workload is, how fast the system executing it is, and, most importantly, whether the performance suffices for showing a smooth, good-looking animation, as opposed to a "slideshow". But this is not the complete story. If some frames take an exceptionally long time, then even if others are very short, an unpleasant hitching may be visible to the player, while the average FPS still looks fine. Therefore it is worth visualizing the duration of individual frames on a graph, to see if they are stable.

One idea for such a graph is to draw a line connecting data points (frames), where the X axis is the frame index and the Y axis is the frame duration (dt), as in these pictures: "GPU Reviews: Why Frame Time Analysis is important", page 3. If such a graph is shown in real time, there is one problem with it: it doesn't move at a constant pace, as the horizontal axis is expressed in frames, not seconds, so an exceptionally long frame will have the same width as a very short frame. As a result, the graph will move faster the higher the framerate.

Source: "GPU Reviews: Why Frame Time Analysis is important", page 3

A better idea might be to move data points horizontally with time, so that a very long frame will generate a spike on the graph with the previous point many pixels away on the horizontal axis. This is what the AMD OCAT tool seems to be doing. However, it results in a long, oblique line on the graph.

Overlay shown by OCAT tool

Some time ago I came up with another kind of graph. It shows every frame as a rectangle, with all its parameters: width, height, and color, dependent on the frame duration:

  • Rectangle width is proportional to dt, so that the graph is moving at a constant pace independently from average framerate or hitches. If we expect 120 FPS to be the maximum framerate and we want to see the frames as 1-pixel wide in this case, we can calculate frameWidth = dt / (1/120). Rectangle left and right edges also need to be aligned to floor() and ceil(), respectively, so that every frame is visible as at least 1-pixel wide.
  • Rectangle height is proportional to a base 2 logarithm of dt, so that important "milestones" in achieved framerate result in evenly spaced frame heights. If the maximum frame time above which we don't need to see the difference because the frame is too long anyway is 1/15 s, we can calculate: frameHeightFactor = (log2(dt) - log2(1/120)) / (log2(1/15) - log2(1/120)). This factor then needs to be clamped to 0..1 and stretched to some range of minimum..maximum heights, depending on the intended looks of the graph, e.g. 2..64 pixels. This way, frames of a game running at 120 FPS will have minimum height, 60 FPS will be at 33%, 30 FPS - 66%, and for 15 FPS or less they will have maximum height.
  • Rectangle color is interpolated from a gradient spanning between points: blue at 120 FPS, green at 60 FPS, yellow at 30 FPS, red at 15 FPS.
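
The three rules above can be sketched in C++ as follows. This is a minimal illustration, not the code from the linked repository: all names are hypothetical, and I assume that the color is interpolated linearly along the same logarithmic scale as the height, where the four gradient points (120, 60, 30, 15 FPS) are evenly spaced. `std::clamp` requires C++17.

```cpp
#include <algorithm>
#include <cmath>

constexpr float kShortestDt = 1.0f / 120.0f; // 120 FPS: 1-pixel width, minimum height
constexpr float kLongestDt  = 1.0f / 15.0f;  // 15 FPS or less: maximum height

struct Color { float r, g, b; };

// Width in pixels, proportional to dt (1 pixel at 120 FPS).
inline float FrameWidth(float dt) {
    return dt / kShortestDt;
}

// Logarithmic height factor, clamped to 0..1.
inline float FrameHeightFactor(float dt) {
    const float factor = (std::log2(dt) - std::log2(kShortestDt)) /
                         (std::log2(kLongestDt) - std::log2(kShortestDt));
    return std::clamp(factor, 0.0f, 1.0f);
}

// Height in pixels, stretched to the range minHeight..maxHeight.
inline float FrameHeight(float dt, float minHeight = 2.0f, float maxHeight = 64.0f) {
    return minHeight + FrameHeightFactor(dt) * (maxHeight - minHeight);
}

// Gradient blue -> green -> yellow -> red at 120, 60, 30, 15 FPS.
inline Color FrameColor(float dt) {
    static constexpr Color stops[4] = {
        {0.0f, 0.0f, 1.0f}, {0.0f, 1.0f, 0.0f}, {1.0f, 1.0f, 0.0f}, {1.0f, 0.0f, 0.0f}};
    const float t = FrameHeightFactor(dt) * 3.0f;   // 0..3 across three segments
    const int i = std::min(static_cast<int>(t), 2); // index of the segment start
    const float u = t - static_cast<float>(i);      // position within the segment
    return {stops[i].r + (stops[i + 1].r - stops[i].r) * u,
            stops[i].g + (stops[i + 1].g - stops[i].g) * u,
            stops[i].b + (stops[i + 1].b - stops[i].b) * u};
}
```

Snapping the left and right rectangle edges with floor() and ceil() is left out here; it would be applied to the running X position when drawing.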

I think that with this kind of graph, both the average framerate and outstanding extra-long frames are clearly visible at first glance. You can see full example source code doing all this, implemented in C++, here: Game.cpp - RegEngine - sawickiap - GitHub. It uses GLM for math functions and Dear ImGui for 2D rendering.

For example, a game with V-sync on, running at a steady 60 FPS, has a graph that looks like this:

A heavier GPU workload that makes the game run at around 38 FPS looks like this. The graph also shows an extra-long frame that froze the entire game because of loading something from the disk, and another hitch caused by pressing the PrintScreen key.

#rendering

# A Metric for Memory Fragmentation

Apr 2022

In this article, I would like to discuss the problem of memory fragmentation and propose a formula for calculating a metric telling how badly the memory is fragmented.

Problem statement

The problem can be stated like this:

  • We consider a linear, addressable memory with a limited capacity.
  • We develop some code that allocates and frees parts of this memory by request from the user (a program that is using our library).
  • Allocations can be of arbitrary size, varying from single bytes to megabytes or even gigabytes.
  • "Allocate" and "free" requests can happen in random order.
  • Allocations cannot overlap. Each one must have its own memory region.
  • Each allocation must be placed in a continuous region of memory. It cannot be made of multiple disjoint regions.
  • Once created, an allocation cannot be moved to a different place. Its region must be considered as occupied until the allocation is freed.

So it is a standard memory allocation situation. Now, I will explain what I mean by fragmentation. Fragmentation, for this article, is an unwanted situation where free memory is spread across many small regions in between allocations, as opposed to a single large one. We want to measure it and preferably avoid it because:

  • It increases the chance that some future allocation cannot be made, even with a sufficient total amount of free memory, because no large enough free region can be found to satisfy the request.
  • When talking about the entire memory available to a program, this can lead to program failure, crash, or some undefined behavior, depending on how well the program handles memory allocation failures.
  • When talking about acquiring large blocks of memory from the operating system (e.g. with WinAPI function VirtualAlloc) and sub-allocating them for the user's allocation requests, high fragmentation may require allocating another block to satisfy the request, making the program use more system memory than really needed.
  • Similarly, when most allocations are freed, our code may not release memory blocks to the system, as there are few small allocations spread across them.

A solution to this problem is to perform defragmentation - an operation that moves the allocations to arrange them next to each other. This may require user involvement, as pointers to the allocations will change then. It may also be a time-consuming operation to calculate better places for the allocations and then to copy all their data. It is thus desirable to measure the fragmentation to decide when to perform the defragmentation operation.
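
To make the idea of "measuring fragmentation" concrete, here is a minimal sketch of one simple, commonly used measure: the share of free memory that lies outside the largest free region. Note that this is not the formula proposed in this article, only an illustration of the kind of number we are after; the function name is hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Takes the sizes of all free regions and returns a value in 0..1:
// 0 when all free memory forms one region (or there is no free memory),
// approaching 1 as free memory is split into many small regions.
inline double LargestFreeRegionFragmentation(const std::vector<std::uint64_t>& freeRegionSizes) {
    const std::uint64_t totalFree =
        std::accumulate(freeRegionSizes.begin(), freeRegionSizes.end(), std::uint64_t{0});
    if (totalFree == 0)
        return 0.0;
    const std::uint64_t largest =
        *std::max_element(freeRegionSizes.begin(), freeRegionSizes.end());
    return 1.0 - static_cast<double>(largest) / static_cast<double>(totalFree);
}
```

For example, two free regions of equal size give 0.5, and four equal regions give 0.75. A defragmentation pass could be triggered when such a value exceeds a chosen threshold.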

#gpu #algorithms #optimization

# Vulkan Memory Allocator 3.0.0 and D3D12 Memory Allocator 2.0.0

Mar 2022

Yesterday we released new major versions of our libraries: Vulkan Memory Allocator 3.0.0 and D3D12 Memory Allocator 2.0.0, so if you are coding with Vulkan or Direct3D 12, I recommend taking a look at them. Because coding them is part of my job, I won't describe them in detail here, but just refer to my article "Announcing Vulkan Memory Allocator 3.0.0 and Direct3D 12 Memory Allocator 2.0.0". Direct links:

Vulkan Memory Allocator

D3D12 Memory Allocator

#rendering #directx #vulkan #gpu #libraries #productions

Copyright © 2004-2022