Entries for tag "tools", ordered from most recent. Entry count: 72.
# Understanding Graphs in GPUView and RGP
When optimizing performance of a game or some other program, the most important thing is to get hard data first - to profile it using some tools, to see what is happening and where to focus attention. There are many profiling tools available. When talking about graphics, we need to realize that the GPU is really a coprocessor that executes submitted work at its own pace. Therefore GPU profiling tools offer a specific type of graph to visualize it. In this article, I will explain how to read this type of graph.
Let's take Radeon GPU Profiler (RGP) as an example. This program is available for free and is compatible with AMD graphics cards. It can capture data from programs that use Direct3D 12 or Vulkan. When we open a capture file and go to Overview > Frame summary tab, we can see a graph like this one:
It may look scary at first glance, but don't worry and stay with me. I will explain everything step-by-step. I don't know if there is any name for this type of graph, so let's call it a "queue graph" because it shows a queue of tasks submitted to the graphics card and executed by it. Here is the way to read it: The horizontal axis is time, passing from left to right at a constant pace. The vertical axis is the queue, with the front of the queue at the bottom and items enqueued later stacked on top.
At each point in time, the item in the bottom row is the one currently executing on the GPU. Everything above this row is waiting for its turn. It means that from the graph we can see and measure when a certain piece of work (like the D3D12 ExecuteCommandLists call in this example) was enqueued, when it started executing, and how long it took to execute, which we can see by the width of the bottom block. Note that the work item going "down the stairs" has no meaning in itself. It just means something in front of it finished, so the queue ahead is shorter. Only when it ends up in the bottom row does it really start executing.
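To make these reading rules concrete, here is a minimal sketch in Python that simulates a strict first-in-first-out queue. It is not how RGP or GPUView work internally. The Packet class, the schedule and row_at helpers, and all the numbers are hypothetical and made up only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    name: str
    submitted: int  # tick at which the packet was enqueued
    duration: int   # GPU ticks it needs once it reaches the bottom row


def schedule(packets):
    """Return {name: (start, end)} assuming a strict FIFO queue."""
    timeline = {}
    gpu_free_at = 0
    for p in sorted(packets, key=lambda p: p.submitted):
        start = max(p.submitted, gpu_free_at)  # wait for everything in front
        timeline[p.name] = (start, start + p.duration)
        gpu_free_at = start + p.duration
    return timeline


def row_at(packets, timeline, name, t):
    """Row of a packet at tick t: 0 is the bottom row, None means not queued."""
    order = [p.name for p in sorted(packets, key=lambda p: p.submitted)]
    me = next(p for p in packets if p.name == name)
    if t < me.submitted or t >= timeline[name][1]:
        return None
    ahead = order[:order.index(name)]
    # Every unfinished packet in front pushes this one a row higher.
    return sum(1 for other in ahead if timeline[other][1] > t)


# Hypothetical submissions, loosely modeled on the packets named in the text.
packets = [
    Packet("ExecuteCommandLists #1", submitted=0, duration=10),
    Packet("Signal", submitted=2, duration=0),
    Packet("ExecuteCommandLists #2", submitted=3, duration=5),
]
timeline = schedule(packets)
```

In this made-up run the second ExecuteCommandLists sits in row 2 at tick 5 and drops to the bottom row at tick 10, when both packets in front of it are done. The Signal packet starts and ends at the same tick, so it never occupies the bottom row for any visible time.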
Another thing to note is that some items wait in the queue but don't take any significant time to execute. These are simple and quick commands, like the green call to the Signal function marked here. When everything in front of it completes, it also completes in no time.
We can make more observations from this graph if we consider the fact that games work with frames. Each frame executes commands to draw the whole image, from clearing the background through 3D objects to UI, and finishes with a call to the Present function, marked here in brown. By looking for this type of item, we can tell where a new frame begins. For example, at point "A" the GPU is still executing commands of frame N, while all commands for the next frame N+1 are already enqueued, including its Present, and the commands for frame N+2 are stacking up at the end of the queue. Thus, we can expect the game to have 2 frames of latency in displaying the image.
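The frame-counting argument can be sketched in a few lines of Python. This is only one rough way to express it, under the assumption that every frame ends with exactly one Present packet. The split_into_frames helper and the queue contents are hypothetical and not taken from a real capture.

```python
def split_into_frames(queue):
    """Split queued packets (front of the queue first) into frames ending with Present."""
    frames, current = [], []
    for packet in queue:
        current.append(packet)
        if packet == "Present":
            frames.append(current)
            current = []
    return frames, current  # complete frames, plus the frame still being submitted


# Hypothetical queue contents at point "A", front of the queue first.
queue_at_a = [
    "ExecuteCommandLists", "Present",                         # rest of frame N
    "ExecuteCommandLists", "ExecuteCommandLists", "Present",  # frame N+1
    "ExecuteCommandLists",                                    # frame N+2, still growing
]
frames, in_progress = split_into_frames(queue_at_a)
frames_of_latency = len(frames)
```

Two Present packets are waiting in front of the commands submitted most recently, which corresponds to the 2 frames of latency read from the graph.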
The same type of graph is used by GPUView - a free tool from Microsoft that can record and display what is happening in the system on a very low level. (The linked article is very old - right now the way to install the tool is to grab the Windows Assessment and Deployment Kit (Windows ADK), and a convenient UI for it is UIforETW.) As you can see here, both the "3D Hardware Queue" of my graphics card and the software "Device Context" of a running game show packets of work submitted for rendering.
One important piece of information that we can extract from this graph is that the GPU is not busy 100% of the time. GPUView actually shows the number on the right, which is 77.89% for the current view. It means the game is not GPU-bound, so reducing graphics quality settings would not increase the framerate (FPS). This often happens when the game does some heavy computations on the CPU or when it reaches 60 FPS and we have V-sync enabled. Here we have the latter case, as we can see moments of vertical synchronization marked as blue lines, while rendering of each frame seems to be blocked until that moment.
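A percentage like this can be understood as busy time divided by the length of the visible time range. Below is a small Python sketch of that arithmetic. It assumes non-overlapping busy intervals and is not GPUView's actual implementation. The gpu_utilization function and the numbers are hypothetical.

```python
def gpu_utilization(busy_intervals, view_start, view_end):
    """Percentage of [view_start, view_end) during which the GPU was executing work.

    busy_intervals is a list of (start, end) pairs that must not overlap.
    """
    busy = 0
    for start, end in busy_intervals:
        # Clip each interval to the visible part of the timeline.
        clipped = min(end, view_end) - max(start, view_start)
        if clipped > 0:
            busy += clipped
    return 100.0 * busy / (view_end - view_start)


# Hypothetical V-synced frames: a new frame every 16 ms, each keeping the GPU busy for 12 ms.
intervals_ms = [(0, 12), (16, 28), (32, 44)]
utilization = gpu_utilization(intervals_ms, 0, 48)
```

With these made-up numbers the GPU is busy 36 ms out of 48 ms, which is 75%. The idle gaps before each vertical sync are what keep the value below 100%.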
Note the graph described here is not the same as "flame graph", which shows a hierarchy of nested things on a timeline, e.g. call stack of function calls.
# Tips for Using Perforce
Version Control Systems are tools that every programmer should use. Among them, Git is probably the most popular one. Some companies use Perforce instead. Whether it is better or worse is hard to tell, but it has advantages that make it indispensable in some types of projects, like game development. Perforce handles large binary files very well. Even if the files add up to tens of gigabytes or even a hundred, it still works fine. I talk about the size of one local copy here, not the entire repository on the server.
From the user’s perspective, Perforce differs greatly from Git or SVN. Not only are commands named differently (e.g. there is “Submit” instead of “Commit”), but the whole concept of “changelists” is something that needs to be well understood to use the tool efficiently. While working with Perforce for many years in different companies and projects, I learned some good practices that I would like to share here. Writing them down was difficult as they seem obvious to me, but hopefully some of them are not obvious to you so you will learn something new.
1. Paste paths to address bar
Let’s start with a simple one. The Perforce window has a text box at the top that resembles the address bar in web browsers. It shows the path of the currently selected file or directory in the Depot or Workspace tab. It can also accept input.
When you work on some file in another tool and you want to jump to it quickly in Perforce, e.g. to check it out, just copy the full path of the file to the system clipboard and paste it into this “address bar”. The selection in the Workspace tab will switch to it immediately.
# Iteration time is everything
I still remember Demobit 2018 in February in Bratislava, Slovakia. During this demoscene party, one of the talks was given by Matt Swoboda "Smash", author of Notch. Notch is a program that allows creating audio-visual content, like demos or interactive visual shows accompanying concerts, in a visual way - by connecting blocks, somewhat like blueprints in Unreal Engine. (The name is not to be confused with the nickname of the author of Minecraft.) See also Number one / Another one by CNDC/Fairlight - the latest demo made in it.
During his talk, Smash referred to music production. He said that musicians couldn't imagine working without the ability to instantly hear the effect of changes they make to their project. He said that graphics artists deserve the same level of interactivity - WYSIWYG, instant feedback, without a need for a lengthy "build" or "render". That's why Notch was created. Then I thought: What about programmers? Don't they deserve it too? Shorter iteration times mean better work efficiency and higher quality of the result. Meanwhile, a programmer sometimes has to wait minutes or even hours to be able to test a change in his code, no matter how small it is. I think it's a big problem.
This is exactly what I like about development of desktop Windows applications and games: they can usually be built, run, and tested locally within a few seconds. The same applies to games made in Unity and Unreal Engine - a developer can usually hit the "Play" button and quickly test his gameplay. It is often not the case with development for smaller devices (like mobile or embedded) or larger ones (like servers/cloud).
I think that iteration time - the time after which we can observe the effects of our changes - is critical for developers' work efficiency, as well as their well-being. We programmers should demand better tools. All of us - including low-level C and C++ programmers. Currently we are in a good position in the job market, so we can choose companies and projects to work on. Let's use it and vote with our feet. Decision makers and architects of software/hardware platforms may think that developers are smart, so they can work efficiently even in harsh conditions. They forget that wasting developers' precious time means wasting a lot of money, not to mention their frustration. Creating better tools is an investment that will pay off.
Now, whenever I get a job offer for a developer position, I ask two simple questions:
1. What is the typical iteration time, from the moment when I change something in the code, through compilation, deployment, application launch and loading, until I can observe the effect of my change? If the answer is: "Usually it's just a matter of a few seconds. Files you changed are recompiled, then launching the app takes a few seconds and that's it." - that's fine. But if the answer is more like: "Well, the whole project needs to be rebuilt. You don't do it locally. You shelve your changes in Perforce so that the build server picks them up and makes the build. The build is then deployed to the target device, which then needs to reboot and load your app. It takes 15-20 minutes." - then it's a NOPE for me.
2. How do you debug the application? Can you make experiments by setting up breakpoints and watching variables in a convenient way? If the answer is: "Yes, we have a debugger nicely integrated with Visual Studio/WinDBG/Eclipse/other IDE and we debug whenever we see a problem." - that's fine. But when I hear: "Well, command-line GDB should work with this environment, but to be honest, it's so hard to set up that no one uses it here. We just put debug console prints in the code and recompile it whenever we want to make a debug experiment." - then that's a red light for me.
# New Version of PVS-Studio
PVS-Studio is particularly good at finding issues with code portability between 32-bit and 64-bit. Out of my personal projects, I already ported CommonLib to 64 bits, and RegScript2 is written to support 64 bits from the start, but porting my main app (music visualization program) to 64 bits is a large task that I still have on my TODO list. Even though I know how to write portable code (use size_t not int etc. :) I made the first commits to this repository 8 years ago, when my programming knowledge was much smaller, so I'm sure there are many nasty bugs there. Making it work as a 64-bit app will be a difficult task and I'm sure PVS-Studio will help me with that. I will share my experiences and conclusions when I eventually do it.
In the meantime, I recommend checking their Blog, where the developers of this tool share a lot of valuable information. They also maintain a list of articles describing errors they found in open source projects.
# Review: Deleaker - A tool that finds resource leaks
Deleaker is a tool for programmers that finds resource leaks in C++ programs. It's commercial, with a free trial and an unconditional 30-day money-back guarantee. Here is my review of this tool. I've tested version 220.127.116.11.
Deleaker is installed as a plugin for Visual Studio, any version from 2005 to 2013. It also works with Visual Studio Community 2013, as this new free version also supports plugins. There is also a standalone Deleaker application (see below).
The purpose of this tool is to augment debugging of native C++ programs with the ability to list all resources that are allocated at the moment (heap memory, virtual memory, OLE memory, GDI objects, USER objects, handles) and so to detect resource leaks. Here is how it works:
The interface is very simple - it can be learned in just a few minutes. You can build your program and start debugging it by hitting F5. Deleaker is enabled automatically. Now just open the dedicated panel (menu Deleaker > Deleaker Window) and press the "Take snapshot" button there. You don't even have to pause execution, but of course the button works as well when your program is paused at a breakpoint. After a few seconds, the panel is populated with a list of currently allocated resources, with the place from which each one was allocated shown in the first column.
After selecting one, the bottom panel displays the full call stack. Clicking an entry in this call stack navigates to the place in the source code where the allocation is made. Finally, after program exit, the list is filled with resources that were not freed - these are actual leaks!
You can filter the list by module (EXE or DLL file that made the call) and by resource type (memory, GDI objects etc.). There is also a column with the size of the resource and "Hit Count" - the number of resources that were allocated by that particular place in the code (e.g. inside a loop) and are still allocated at the moment.
"Show full stack" button is a nice feature. Clicking it displays the full call stack, while by default the stack is stripped of entries that don't come from your code but from system libraries. For example, above my function with the actual allocation instruction, there is MSVCR120D.dll!operator new, then there is MSVCR120D.dll!malloc etc... until ntdll.dll!RtlAllocateHeap. It's good that the program can ignore such call stack entries. It also entirely ignores allocations made by system modules outside of your code.
Unfortunately it does this only by identifying the module that the function comes from and not its name, so it cannot ignore templates, like those from STL containers. Maybe ignoring functions by name specified as wildcard or regular expression would help, e.g. "std::*" or "std\:\:.+" - just like Visual Studio debugger can step over specified functions, as I described in How to Make Visual Studio Debugger not Step Into STL.
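The suggested name-based filtering could look roughly like the Python sketch below. This is only an illustration of the idea, not a Deleaker feature. The strip_frames function, the MyApp.exe module and the LoadLevel function are hypothetical, while the "module!function" notation and the two patterns come from the text above.

```python
import fnmatch
import re


def strip_frames(call_stack, ignored_modules=(), wildcards=(), regexes=()):
    """Drop call stack entries by module name or by function name pattern."""
    kept = []
    for frame in call_stack:
        module, _, function = frame.partition("!")
        if module in ignored_modules:
            continue  # filtering by module, as the tool does today
        if any(fnmatch.fnmatchcase(function, w) for w in wildcards):
            continue  # filtering by wildcard, e.g. "std::*"
        if any(re.fullmatch(rx, function) for rx in regexes):
            continue  # filtering by regular expression, e.g. r"std\:\:.+"
        kept.append(frame)
    return kept


# Hypothetical call stack. STL templates are instantiated inside the user's own module.
stack = [
    "MyApp.exe!LoadLevel",
    "MyApp.exe!std::vector<int>::push_back",
    "MyApp.exe!std::allocator<int>::allocate",
    "MSVCR120D.dll!operator new",
    "MSVCR120D.dll!malloc",
    "ntdll.dll!RtlAllocateHeap",
]
system_modules = {"MSVCR120D.dll", "ntdll.dll"}

by_module_only = strip_frames(stack, ignored_modules=system_modules)
by_wildcard = strip_frames(stack, ignored_modules=system_modules, wildcards=["std::*"])
by_regex = strip_frames(stack, ignored_modules=system_modules, regexes=[r"std\:\:.+"])
```

Filtering by module alone leaves the two std:: entries in place, because they are compiled into the application's own module. Adding either name pattern leaves only the LoadLevel entry.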
You can press "Take snapshot" multiple times and save the snapshots for later viewing. (They are just numbered, you cannot give them names.) By the way, Deleaker captures the F5 key, so even during a debugging session, if the focus is in the Deleaker panel, this key doesn't resume your program, but instead refreshes the list of allocations (takes a new snapshot). You can also select two snapshots and compare them. Then you see only resources that were allocated in the right snapshot and not in the left, which can indicate a leak that happened during some period of the program execution.
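Conceptually, comparing two snapshots is a set difference. The Python sketch below shows the idea under the assumption that each live resource can be identified by its address or handle value. How Deleaker really matches resources is not known here, and the new_allocations function and the sample data are hypothetical.

```python
def new_allocations(left, right):
    """Resources present in the right (later) snapshot but not in the left one.

    Each snapshot maps an address or handle to a description of the resource.
    Caveat: an address freed and reused between snapshots would be missed.
    """
    return {key: info for key, info in right.items() if key not in left}


# Hypothetical snapshots: address -> (resource type, size in bytes, call site).
snapshot_1 = {
    0x1000: ("heap memory", 256, "LoadConfig"),
    0x2000: ("heap memory", 64, "CreateWindowTitle"),
}
snapshot_2 = {
    0x1000: ("heap memory", 256, "LoadConfig"),
    0x3000: ("heap memory", 4096, "LoadTexture"),
    0x4000: ("GDI object", 0, "CreateBrushForButton"),
}
candidates = new_allocations(snapshot_1, snapshot_2)
```

Here the two entries allocated between the snapshots remain as leak candidates, while the resource at 0x2000, which was freed in the meantime, does not show up at all.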
Besides heap memory allocations, the tool can also detect other types of resources, like GDI objects. Unfortunately not all interesting types of resources are covered. For example, an opened file of type FILE* f = fopen(...) is shown as normal memory allocation and opened file of type HANDLE f = CreateFile(...) is not shown at all, but I guess it must be due to some system internals.
I didn't find a single leak in my main home project, so I created a dedicated, simple program to test if it can really find leaks. I also checked that it works with programs compiled in Release configuration.
Aside from being a Visual Studio plugin, Deleaker can also work as standalone Windows application.
Overall, I like the program. If its price is not a problem for you or your company, I think it can be very useful in improving quality of developed software. I especially like the fact that it's so easy to learn and use.
# Rendering Video Special Effects in GLSL
Rendering real-time, hardware accelerated 3D graphics is one aspect of computer graphics, but there are others too. Recently I became interested in video editing. I wanted to add some special effects to a video and was looking for a technology to do that. Of course video editing software usually has some effects built-in, like different filters or transition effects, some borders or gradients. But I wanted something different. If I had and I knew how to use software like Adobe After Effects, I'm sure that would be the best and easiest way to make any effect imaginable. But as I don't, I decided to use what I already know - to write a shader :)
1. To run a shader, some hosting app is needed. Of course I could write one in C++, but for the purpose of this work it was enough to use Live Coding Compo Framework (a demoscene tool created by bonzaj, which was used during last year's WeCan demoparty). This simple and free package contains rendering application and preconfigured Visual Studio solution. Having VS installed (it works with Express version as well), all I needed to do was to edit "Run.bat" file to point to directory with VS installation in my system. Next, I just executed "Run.bat", and two programs were launched. On the left monitor I had fullscreen "Live Coding Preview", on the right: Visual Studio with special solution opened. I could then edit any of the GLSL fragment shaders contained in the solution. Every time I hit Compile (Ctrl+F7), the shader was compiled and displayed in the preview.
2. Being able to render my effect in real-time, next I needed to capture it to a video. Probably the most popular app for this is FRAPS. I ran it, set Video Capture Settings to frame rate that I was going to use in my final video (which was 29.97 fps) and then captured appropriate period of time of rendering my effect, starting and stopping recording with F9 hotkey.
3. Video captured by FRAPS is in full, original resolution and encoded with some strange codec, so next I needed to convert it to desired format. To do this, I used VLC media player. Some may think that it's just a video player, but in fact it's incredibly powerful and flexible video transmitting and processing software. (I once had an opportunity to work with libVLC - its features exposed as C library.) Its greatest advantage is that it has its own collection of codecs, so it doesn't care whether you have appropriate codecs installed in your system. To convert a video file, I selected: Media > Convert / Save..., selected my AVI file captured by FRAPS, pressed "Convert / Save" button, selected Profile: "Video - H.264 + MP3 (MP4)", customized it using "Edit selected profile" image button, selecting: Encapsulation = MP4/MOV, Video codec = MPEG-4 (on Resolution tab, I could also set new resolution to scale the content, my choice was 1280px x 720px), Audio disabled, Subtitles disabled. Then after pressing "Save", selecting path to destination file, pressing "Start" and waiting some time, I had my video converted to more standard MPEG-4 format (and more than 5 times smaller than the original one recorded by FRAPS).
4. Finally I could insert this video onto a new track in my video editing software and enable blending with underlying layer to achieve desired effect (I used "Overlay" blending mode and 50% opacity).
There are some details that I intentionally skipped here (like video bitrate) not to make this post even longer, but I hope you learned something new from it. My effect looked like this, and here is the source code: Low freq fx.glsl
By the way, here is another tutorial about how to make GIF like this from a video (using only free tools this time):
1. To capture video frames as images, use VLC media player:
2. To merge images into animated GIF, use GIMP:
# CppDepend 4 Pro for Free
It looks like they've just released a new major version of CppDepend - a great static code analysis tool for C++. The What's New in CppDepend 4 page lists several improvements. What is more important though is that they've decided to give away the Pro license for free to open source C/C++ contributors. They say:
To apply for this free license, Please make sure that you meet the following criteria:
I think it's worth trying to apply for that. See also my review of CppDepend.
Recently I've played a little bit with CppDepend - a commercial static code analysis tool for C++. Windows and Linux versions are available and you can download a 14-day trial version. You can find lots of information about the program on their website, including screenshots, sample reports, and case studies of the Ogre3D and Irrlicht game engines. Here is my brief review.
The first thing you have to do is to create a CppDepend project. You can add Visual Studio solutions to it and set parameters about what reports you want. Alternatively, if you don't have one because e.g. you use makefiles, you can use a separate tool - ProjectMaker - to manually create and save some virtual .sl "solution" that lists source files and directories to analyze.
Then the program analyzes your code (internally using Clang) and builds a database about it. It generates a report in HTML format, and also allows browsing the gathered data interactively inside the program (or inside Visual Studio if you install the appropriate AddIn). Here you can see what it looks like for some of my code, combined The Final Quest 7 and CommonLib. First of all, you can browse the tree of projects, namespaces, classes and methods:
An HTML report is generated with the list of found issues:
You can visualize different kinds of relationships like inheritance or just using one type by another as a graph:
Another way to visualize dependencies is matrix:
Yet another view mode is treemap. Here I displayed methods (grouped in classes and modules) where size of a method is dependent on number of lines of code.
You can perform simple search and sorting of projects, namespaces, types, methods and files by name, size and different other metrics. Matched items are highlighted on the treemap.
Finally, you can issue complex queries using CQLinq - a query language based on C# LINQ syntax. Embedded query editor is available with syntax highlighting, autocompletion and immediate query output as you type.
So what kinds of data does the program gather from your code? A lot. Even such a simple thing as the number of lines of code is calculated intelligently. Instead of text-based lines, these are logical LOC, which count sequence points in code, so they are independent of coding style, like brace placement or spanning a function call over several lines of code.
I didn't mess with code metrics before, so it was interesting for me to read what it means for a piece of code to be "stable" or "abstract". It turns out that the code is stable if its types are used by a lot of types of third-party modules. On the other hand, code is abstract when it contains a lot of abstract classes and interfaces. Code that is stable but not abstract can be painful to modify. Code that is not stable and very abstract is useless. Sounds like an interesting idea :)
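These two notions resemble the package metrics defined by Robert C. Martin: instability I = Ce / (Ca + Ce), abstractness A = abstract types / all types, and the distance from the "main sequence" D = |A + I - 1|. The Python sketch below assumes these definitions. Whether CppDepend computes exactly the same values is an assumption, and the function names and sample numbers are hypothetical.

```python
def instability(afferent, efferent):
    """I = Ce / (Ca + Ce). 0 means fully stable, 1 means fully unstable.

    afferent (Ca): number of outside types that depend on this module.
    efferent (Ce): number of outside types this module depends on.
    """
    total = afferent + efferent
    return efferent / total if total else 0.0


def abstractness(abstract_types, total_types):
    """A = abstract classes and interfaces divided by all types in the module."""
    return abstract_types / total_types if total_types else 0.0


def distance_from_main_sequence(a, i):
    """D = |A + I - 1|. Values near 0 mean a healthy balance of the two."""
    return abs(a + i - 1.0)


# Hypothetical module: used by 8 outside types, uses 2, and has 1 abstract type out of 10.
i = instability(afferent=8, efferent=2)
a = abstractness(abstract_types=1, total_types=10)
d = distance_from_main_sequence(a, i)
```

This made-up module is stable (I = 0.2) but hardly abstract (A = 0.1), so its distance is about 0.7. That is the "painful to modify" case from the paragraph above. A module with both values close to 1 would land in the opposite, "useless" corner.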
Another interesting metric is Cyclomatic Complexity. It is basically the number of decisions that can be taken in a function, that is the number of: if, while, for, case, default, continue, goto, catch etc. Lack of Cohesion Of Methods (LCOM) is yet another metric. It can indicate the quality of a class. It is low when almost every method in the class uses every field, which is good. It is high when, for example, every method uses only one field (like when a class has only getters and setters), which is bad.
Based on these metrics (and many others) and some predefined rules, a list of issues found in the code is included in the report. Some of them are very valuable, some not so much. For example, code matching the rule "Constructor should not call a virtual methods" is obviously a bug or at least a bad practice. But the rule "Fields should be declared as private" seems a little too restrictive, especially as it also matches globals like const float PI = 3.14.
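Both metrics can be approximated in a few lines of Python. The sketch below is deliberately naive. It counts decision keywords with a regular expression instead of parsing C++ (so keywords inside comments or strings would be counted too), and it assumes the formula LCOM = 1 - sum(MF) / (M * F), where M is the number of methods, F the number of fields, and MF the number of methods using a given field. Whether CppDepend uses exactly this formula is an assumption, and all names and sample data are hypothetical.

```python
import re

DECISION_KEYWORDS = ("if", "while", "for", "case", "default", "continue", "goto", "catch")
_DECISION_RE = re.compile(r"\b(?:" + "|".join(DECISION_KEYWORDS) + r")\b")


def count_decisions(source):
    """Naive count of decision keywords in a piece of C++ source text."""
    return len(_DECISION_RE.findall(source))


def lcom(fields_used_by_method, all_fields):
    """LCOM = 1 - sum(MF) / (M * F). 0 means every method uses every field."""
    m = len(fields_used_by_method)
    f = len(all_fields)
    if m == 0 or f == 0:
        return 0.0
    # Summing fields per method gives the same total as summing methods per field.
    accesses = sum(len(set(used) & set(all_fields)) for used in fields_used_by_method.values())
    return 1.0 - accesses / (m * f)


sample_function = """
int f(int x) {
    if (x > 0) {
        for (int i = 0; i < x; ++i) {
            if (i % 2) continue;
        }
    }
    return x;
}
"""
decisions = count_decisions(sample_function)

# Hypothetical classes described as: method name -> set of fields it touches.
cohesive = {"Length": {"x", "y"}, "Normalize": {"x", "y"}, "Dot": {"x", "y"}}
accessors_only = {"GetX": {"x"}, "SetX": {"x"}, "GetY": {"y"}, "SetY": {"y"}}
lcom_cohesive = lcom(cohesive, {"x", "y"})
lcom_accessors = lcom(accessors_only, {"x", "y"})
```

The sample function contains four decision keywords (if, for, if, continue). Note that many definitions of cyclomatic complexity add 1 to such a count. The class where every method touches every field gets LCOM 0, while the class made only of getters and setters gets 0.5, and the value grows further as more single-field accessors are added.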
Generally, it feels great to have analysis based on both physical aspect (like directory structures, source files, comments) and logical aspect of the code (like class inheritance, public versus private, number of nested loops). It's also great that the program analyzes code on all levels, from whole solution depending on external (and possibly unknown) code like Windows.h, through namespaces, classes and methods, down until analyzing code inside functions, counting number of conditions, loops, local variables and analyzing which classes and methods are used by which.
A static code analysis tool like CppDepend is not one of the tools necessary for programming, like an editor, compiler or debugger. But I believe it can be useful in at least the following applications:
When thinking about a conclusion, I have this thought based on some blog posts I've read recently (here is the first one, unfortunately I can't find the other one right now) that there is a spectrum of different types of programmers. On one side, there are these very "good", rockstar programmers who are not as good at teamwork and instead of solving real practical problems, they play around with code, talk about theory (whether algorithms or language standard) and write code so sophisticated (e.g. with elaborate C++ template tricks) that it is hard for others to read and maintain. They don't bother to give their variables meaningful names or split their code into clear modules and classes. On the other side of the spectrum there is the growing number of bad programmers who graduate in computer science because they were told to do so (with a promise of good money, lots of jobs or anything) and have no real talent, passion or even basic willingness to learn this profession. They only glue their code together using ready frameworks, design patterns and code found on Google using Ctrl+C Ctrl+V. I can see a clear relationship between this spectrum and the seriousness with which we take reports about code metrics like those generated by CppDepend. I also believe that in both cases the best approach lies somewhere in the middle.
Appendix: Clang Rocks! is an interesting article by Issam Lahlali, CppDepend lead developer, that explains how they use Clang frontend to analyze C++ code in their product.
Update: CppDepend v2017 has been released recently, adding many great features, including the following: