# SimplySaveAs - a Small Tool for Perforce Users

Jun 2022

In my old article "Tips for Using Perforce" I promised to dedicate a separate article to what I described there in point 10, so here it is. First, let's talk about the problem. When using a version control system, e.g. Git or Perforce, you surely sometimes inspect the history of a file, to see who changed it, when, and what exactly has been changed throughout its previous versions. GUI clients of such systems offer convenient views to compare text files, but sometimes you may just need to save an old version of the file on your disk - not to update it in your main working copy, but to export it to a separate folder.

In some applications, this is easy. For example, Git Extensions, my favorite GUI client for Git, offers File History window that shows revision history of a selected file. In this window, we can right-click on a specific revision from the list and click "Save as" to export that specific version of the file to a new location on disk.

Unfortunately, in Perforce there is no such command. There is History tab that shows the list of revisions of a selected file. It also offers context menu under right mouse button to do something with a selected revision, but among the commands to diff etc. there is no "Save As", only "Open With". This one allows us to choose some application and open the file with it, which might be useful in case of text files or some other documents (e.g. DOCX, PDF) that we just want to preview using their dedicated app. But what if it is a binary file, having some non-standard extension, that we just want to export to disk?

Here is where the little tool I developed might be useful. SimplySaveAs is a Windows program that you can use to "Open With" a file in Perforce. All it does is show a "Save As" window that lets you choose a place and name where the file should be saved on your disk. This way, the external tool provides the command missing in Perforce visual client (P4V).

The program doesn't need any installation. The repository linked above also contains full source code in C++, but all you need to download is just the file "SimplySaveAs.exe". You can put it in any location on your disk. I like to have a separate directory "C:\PortablePrograms\", where I put all the portable applications that don't need installation, like this one.

First time you want to use it, you need to click on Open With > Choose Application... in Perforce and select "SimplySaveAs.exe" from your disk.

On every next use, Perforce will remember this program and show it available in the context menu, so you can just click Open With > SimplySaveAs.

How does it work? As you may know, opening a file with a program actually needs to save the file on a disk somewhere, likely in a temporary folder, and then launching the program with a path to this file passed as a command-line parameter. This is also what Perforce does when we use "Open With" command. So all my program does is ask the user for a target path and then copy the file from the source, temporary location read from the parameter to the target location selected by the user.

Comments | #tools #productions Share

# An Idea for Visualization of Frame Times

May 2022

In real-time graphics applications like games, we usually measure performance as the average number of frames per second (FPS). Showing this average is a good estimate of how well the application performs, how heavy is the per-frame workload, how fast is the system where it executes, and, most importantly, whether the performance suffices for showing a smooth, good looking animation, as opposed to a "slideshow". But this is not a complete story. If some frames take an exceptionally long time, then even if others are very short, an unpleasant hitching may be visible to the player, while average FPS still looks fine. Therefore it is worth to visualize duration of individual frames on a graph, to see if they are stable.

One idea for such a graph is to draw a line connecting data points (frames), where X axis is the frame index and Y axis is the frame duration (dt), like on these pictures: "GPU Reviews: Why Frame Time Analysis is important", page 3. If such graph is shown in real time, there is one problem with it: it doesn't move at a constant pace, as the horizontal axis is expressed in frames, not seconds, so an exceptionally long frame will have the same width as super short frame. As the result, the graph will move faster the higher is the framerate.

Source: "GPU Reviews: Why Frame Time Analysis is important", page 3

A better idea might be to move data points horizontally with time, so that a very long frame will generate a spike on the graph with previous point many pixels away on the horizontal axis. This is what AMD OCAT tool seems to be doing. However, it results in a long, oblique line on the graph.

Overlay shown by OCAT tool

Some time ago I came up with another kind of graph. It shows every frame as a rectangle, with all its parameters: width, height, and color, dependent on the frame duration:

  • Rectangle width is proportional to dt, so that the graph is moving at a constant pace independently from average framerate or hitches. If we expect 120 FPS to be the maximum framerate and we want to see the frames as 1-pixel wide in this case, we can calculate frameWidth = dt / (1/120). Rectangle left and right edges also need to be aligned to floor() and ceil(), respectively, so that every frame is visible as at least 1-pixel wide.
  • Rectangle height is proportional to a base 2 logarithm of dt, so that important "milestones" in achieved framerate result in evenly spaced frame heights. If the maximum frame time above which we don't need to see the difference because the frame is too long anyway is 1/15 s, we can calculate: frameHeightFactor = (log2(dt) - log2(1/120)) / (log2(1/15) - log2(1/120)). This factor then needs to be clamped to 0..1 and stretched to some range of minimum..maximum heights, depending on the intended looks of the graph, e.g. 2..64 pixels. This way, frames of a game running at 120 FPS will have minimum height, 60 FPS will be at 33%, 30 FPS - 66%, and for 15 FPS or less they will have maximum height.
  • Rectangle color is interpolated from a gradient spanning between points: blue at 120 FPS, green at 60 FPS, yellow at 30 FPS, red at 15 FPS.

I think that with this kind of graph, both average framerate and outstanding extra-long frames are clearly visible at a first glance. You can see full example source code doing all this, implemented in C++ here: Game.cpp - RegEngine - sawickiap - GitHub. It uses GLM for math functions and Dear ImGui for 2D rendering.

For example, a game with V-sync on, running at steady 60 FPS, has the graph looking like this:

While a heavier GPU workload making the game running at around 38 FPS looks like this. The graph also shows an extra-long frame that froze the entire game because of loading something from the disk, and another hitch caused by pressing PrintScreen key.

Comments | #rendering Share

# A Metric for Memory Fragmentation

Apr 2022

In this article, I would like to discuss the problem of memory fragmentation and propose a formula for calculating a metric telling how badly the memory is fragmented.

Problem statement

The problem can be stated like this:

  • We consider a linear, addressable memory with a limited capacity.
  • We develop some code that allocates and frees parts of this memory by request from the user (a program that is using our library).
  • Allocations can be of arbitrary size, varying from single bytes to megabytes or even gigabytes.
  • "Allocate" and "free" requests can happen in random order.
  • Allocations cannot overlap. Each one must have its own memory region.
  • Each allocation must be placed in a continuous region of memory. It cannot be made of multiple disjoint regions.
  • Once created, an allocation cannot be moved to a different place. Its region must be considered as occupied until the allocation is freed.

So it is a standard memory allocation situation. Now, I will explain what do I mean by fragmentation. Fragmentation, for this article, is an unwanted situation where free memory is spread across many small regions in between allocations, as opposed to a single large one. We want to measure it and preferably avoid it because:

  • It increases a chance that some future allocation couldn't be made, even with sufficient total amount of free memory, because no large enough free region could be found to satisfy the request.
  • When talking about the entire memory available to a program, this can lead to program failure, crash, or some undefined behavior, depending on how well the program handles memory allocation failures.
  • When talking about acquiring large blocks of memory from the operating system (e.g. with WinAPI function VirtualAlloc) and sub-allocating them for the user's allocation requests, high fragmentation my require allocating another block to satisfy the request, making the program using more system memory than really needed.
  • Similarly, when most allocations are freed, our code may not release memory blocks to the system, as there are few small allocations spread across them.

A solution to this problem is to perform defragmentation - an operation that moves the allocations to arrange them next to each other. This may require user involvement, as pointers to the allocations will change then. It may also be a time-consuming operation to calculate better places for the allocations and then to copy all their data. It is thus desirable to measure the fragmentation to decide when to perform the defragmentation operation.

Read full entry > | Comments | #gpu #algorithms #optimization Share

# Vulkan Memory Allocator 3.0.0 and D3D12 Memory Allocator 2.0.0

Mar 2022

Yesterday we released new major version of Vulkan Memory Allocator 3.0.0 and D3D12 Memory Allocator 2.0.0, so if you are coding with Vulkan or Direct3D 12, I recommend to take a look at these libraries. Because coding them is part of my job, I won't describe them in detail here, but just refer to my article published on "Announcing Vulkan Memory Allocator 3.0.0 and Direct3D 12 Memory Allocator 2.0.0". Direct links:

Vulkan Memory Allocator

D3D12 Memory Allocator

Comments | #rendering #directx #vulkan #gpu #libraries #productions Share

# Untangling Direct3D 12 Memory Heap Types and Pools

Feb 2022

Those of you who follow my blog can say that I am boring, but I can't help it - somehow GPU memory allocation became my thing, rather than shaders and effects, like most graphics programmers do. Some time ago I've written an article "Vulkan Memory Types on PC and How to Use Them" explaining what memory heaps and types are available on various types of PC GPUs, as visible through Vulkan API. This article is a Direct3D 12 equivalent, in a way.

With expressing memory types as they exist in hardware, D3D12 differs greatly from Vulkan. Vulkan defines a 2-level hierarchy of memory "heaps" and "types". A heap represents a physical piece of memory of a certain size, while a type is a "view" of a specific heap with certain properties, like cached versus uncached. This gives a great flexibility in how different GPUs can express their memory, which makes it hard for the developer to ensure he selects the optimal one on any kind of GPU. Direct3D 12 offers a fixed set of memory types. When creating a buffer or a texture, it usually means selecting one of the 3 standard "heap types":

  • D3D12_HEAP_TYPE_DEFAULT is intended for resources that are directly and frequently accessed by the GPU, so all render-target, depth-stencil textures, other textures, vertex and index buffers used for rendering etc. go there. It typically ends up in the memory of the graphics card.
    D3D12_HEAP_TYPE_UPLOAD represents a memory that we can Map and fill in from the CPU and then copy or directly read on the GPU. It must be created and stay in D3D12_RESOURCE_STATE_GENERIC_READ. It ends up in system RAM and is uncached but write-combined, which means it should only be written, never read.
    D3D12_HEAP_TYPE_READBACK represents the least frequently used type of memory that the GPU can write or copy to, while the CPU can Map and read. It must be created in D3D12_RESOURCE_STATE_COPY_DEST. It ends up in system RAM and is cached, which means random access from our CPU code is fine for it.

So far, so good... D3D12 seems to simplify things compared to Vulkan. You can stop here and still develop a decent graphics program, but if you make a game with an open world and want to stream your content in runtime, so you need to check what memory budget is available to your app, or you want to take advantage of integrated graphics where memory is unified, you will find out that things are not that simple in this API. There are 4 different ways that D3D12 calls various memory types and they are not so obvious when we compare systems with discrete versus integrated graphics. The goal of this article is to explain and untangle all this complexity.

Read full entry > | Comments | #directx Share

# Direct3D 12: Long Way to Access Data

Jan 2022

One reason the new graphics APIs – Direct3D 12 and Vulkan – are so complicated is that there are so many levels of indirection in accessing data. Having some data prepared in a normal C++ CPU variable, we have to go a long way before we can access it in a shader. Some time ago I’ve written an article: “Vulkan: Long way to access data”. Because I have a privilege of working with both APIs in my everyday job, I thought I could also write a D3D12 equivalent, so here it is. If you are a graphics or game programmer who already know the basics of D3D12, but may feel some confusion about all the concepts defined there, this article is for you.

Here is a diagram, followed by an explanation of all its elements:

Read full entry > | Comments | #directx Share

# First Look at New D3D12 Enhanced Barriers

Dec 2021

This will be pretty advanced or at least intermediate article. It assumes you know Direct3D 12 API. Some references to Vulkan may also appear. I am writing it because I just found out that yesterday Microsoft announced an upcoming big change in D3D12: Enhanced Barriers. It will be an addition to the API that provides a new way to do barriers. Considering my professional interests, this looks very important to me and also quite revolutionary. This article summarizes my first look and my thoughts about this new addition to the API or, speaking in terms of modern internet, my "unboxing" or "reaction" ;)

Bill Kristiansen, the author of the article linked above, written that currently only the software-simulated WARP device supports the new enhanced barriers. Support in real GPU drivers will come at later time. The new barriers can replace the old way of doing them, but both will still be available and can also be mixed in one application. Which means this is not as big revolution to turn our DirectX development upside down - we can switch to them gradually. For now we can just prepare ourselves for the future by studying the interface (which I do in this article) and testing some code using WARP device.

UPDATE 2021-12-10: I just learned that Microsoft actually did publish a documentation of the new API: Enhanced Barriers @ DirectX-Specs, so I recommend to go see it before reading this article.

Read full entry > | Comments | #directx #vulkan #gpu Share

# Understanding Graphs in GPUView and RGP

Nov 2021

When optimizing performance of a game or some other program, the most important thing is to get hard data first – to profile it using some tools, to see what is happening and where to focus attention. There are many profiling tools available. When talking about graphics, we realize that GPU is really a co-processor that can execute submitted work at its own pace, therefore GPU profiling tools offer a specific type of graph to visualize it. In this article, I will explain how to read this type of graph.

Let's take Radeon GPU Profiler (RGP) as an example. This program is available for free and is compatible with AMD graphics cards. It can capture data from programs that use Direct3D 12 or Vulkan. When we open a capture file and go to Overview > Frame summary tab, we can see a graph like this one:

It may look scary at first glance, but don't worry and stay with me. I will explain everything step-by-step. I don't know if there is any name for this type of graph, so let's call it a "queue graph" because it shows a queue of tasks submitted to the graphics card and executed by it.

The horizontal axis is time, passing in the right direction at a constant pace. The vertical axis is the queue, with the front of the queue on the bottom and items enqueued later stacked on top.

At each point in time, the item on the bottom row is the one currently executing on the GPU. Everything above this row is waiting for its turn. It means that from the graph we can see and measure when a certain piece of work (like D3D12 ExecuteCommandLists call in this example) was enqueued, when it started executing. and how long it took to execute it. The width of the bottom block represents the amount of time that was required to execute. Note that the work item going “down the stairs” has no meaning in itself. It just means something in front of it finished, so the queue ahead is shorter. Only when it ends up in the bottom row, it really starts executing.

Another thing to note is that some items wait in the queue but don't take any significant time to execute. These are simple and quick commands, like the green call to the Signal function marked here. When everything in front of it completes, it also completes in no time.

We can make more observations from this graph if we consider the fact that games work with frames, each frame executes commands to draw the whole image from clearing background through 3D objects to UI and finishes with a call to the Present function, marked here in brown color. By looking for this type of item, we can conclude when a new frame begins. For example, in the point "A" the GPU is still executing commands of frame N, while we have all commands for the next frame N+1 enqueued, including its next Present, and also the commands for frame N+2 are stacking up at the end of the queue. Thus, we can expect the game to have 2 frames of latency in displaying the image.

The same type of graph is used by GPUView - a free tool from Microsoft that can record and display what is happening in the system on a very low level. (The linked article is very old - right now the way to install the tool is to grab Windows Assessment and Deployment Kit (Windows ADK) and a convenient UI for it is UIforETW). As you can see here, both "3D Hardware Queue" of my graphics card and software "Device Context" of a running game show packets of work submitted for rendering.

One important piece of information that we can extract from this graph is that GPU is not busy 100% of the time. GPUView actually shows the number on the right, which is 77.89% for the current view. It means the game is not GPU-bound. Reducing graphics quality settings would not increase framerate (FPS). This often happens when the game does some heavy computations on the CPU or when it reaches 60 FPS and we have V-sync enabled. Here we have the latter case, as we can see moments of vertical synchronization marked as blue lines, while rendering each frame seems to be blocked until that moment.

Note the graph described here is not the same as flame graphs or flame charts, which show a hierarchy of nested things, not a queue. For example, a call stack of function calls.

Comments | #optimization #tools Share

Older entries >


Pinboard Bookmarks



Blog Tags

[Download] [Dropbox] [pub] [Mirror] [Privacy policy]
Copyright © 2004-2022