All blog entries, ordered from most recent. Entry count: 1079.
# When integrated graphics works better
In RPG games the more powerful your character is, the more tough and scary are the monsters you have to fight. I sometimes get a feeling that the same applies to real life - bugs you meet when you are a programmer. I recently blogged about the issue when QueryPerformanceCounter call takes long time. I've just met another weird problem. Here is my story:
I have Lenovo IdeaPad G50-80 (80E502ENPB) laptop. It has switchable graphics: integrated Intel i7-5500U and dedicated AMD Radeon R5 M330. Of course I used to choose AMD dedicated graphics, because it's more powerful. My application is a music visualization program. It renders graphics using Direct3D 11. It uses one
ID3D11Device object and one thread for rendering, but two windows displayed on two outputs: output 1 (laptop screen) contains window with GUI and preview, while output 2 (projector connected via VGA or HDMI) shows main view using borderless, topmost window covering whole screen (but not real fullscreen as in
IDXGISwapChain::SetFullscreenState). I tend to enable V-sync on output 1 (
IDXGISwapChain::Present SyncInterval = 1) and disable it on output 0 (
SyncInterval = 0). My rendering algorithm looks like this:
Loop over frames: Render scene to MainRenderTarget Render MainRenderTarget to OutputBackBuffer, covering whole screen Render MainRenderTarget to PreviewBackBuffer, on a quad Render ImGui to PreviewBackBuffer OutputSwapChain->Present() PreviewSwapChain->Present()
So far I had just one problem with it: my framerate decreased over time. It used to drop very quickly after launching the app from 60 to 30 FPS and stabilize there, but after few hours it was steadily decreasing to 20 FPS or even less. I couldn't identify the reason for it in my code, like a memory leak. It seemed to be related to rendering. I could somehow live with this issue - low framerate was not that noticable.
Suddenly this Thursday, when I wanted to test new version of the program, I realized it hangs after around a minute from launching. It was a strange situation in which the app seemed to be running normally, but it was just not rendering any new frames. I could see it still works by inspecting CPU usage and thread list with Process Hacker. I could minimize its windows or cover them by other windows and they preserved their content after restoring. I even captured trace in GPUView, only to notice that the app is filling DirectX command queue and AMD GPU is working. Still, nothing was rendered.
That was a frightening situation for me, because I need to have it working for this weekend. After I checked that restarting app or the whole system doesn't help, I tried to identify the cause and fix it in various ways:
1. I thought that maybe there is just some bug in the new version of my program, so I launched the previous version - one that successfully worked before, reaching more than 10 hours of uptime. Unfortunately, the problem still occured.
2. I thought that maybe it's a bug in the new AMD graphics driver, so I downloaded and installed previous version, performing "Clean install". It didn't help either.
3. In desperation, I formatted whole hard drive and reinstalled operating system. I planned to to it anyway, because it was a 3-year-old system, upgraded from Windows 8 and I had some other problems with it (that I don't describe here because they were unrelated to graphics). I installed the latest, clean Windows 10 with latest updates and all the drivers. Even that didn't solve my problem. The program still hung soon after every launch.
I finally came up with an idea to switch my app to using Intel integrated graphics. It can be done in Radeon Settings > "Switchable Graphics" tab. In a popup menu for a specific executable, "High Performance" means choosing dedicated AMD GPU and "Power Saving" means choosing integrated Intel GPU. See article Configuring Laptop Switchable Graphics... for details.
It solved my problem! The program not only doesn't hang any longer, but it also maintains stable 60 FPS now (at least it did during my 2h test). Framerate drops only when there is a scene that blends many layers together on a FullHD output - apparently this GPU cannot keep up with drawing so many pixels per second. Anyway, this is the situation where using integrated Intel graphics turns out work better than a faster, dedicated GPU.
I still don't know what is the cause of this strange bug. Is it something in the way my app uses D3D11? Or is it a bug in graphics driver (one of the two I need to have installed)? I'd like to investigate it further when I find some time. For now, I tend to believe that:
- The only thing that might have changed recently and break my app was some Windows updated pushed by Microsoft.
- The two issues: the one that I had before with framerate decreasing over time and the new one with total image freeze are related. They may have something to do with switchable graphics - having two different GPUs in the system, both enabled at the same time. I suspect that maybe when I want to use Radeon, the outputs (or one of them) are connected to Intel anyway, so the image needs to be copied and synchronized with Intel driver.
Update 2018-02-21: Later after I published this post, I tried few other things to fix the problem. For example, I updated AMD graphics driver to latest version 18.2.2. It didn't help. Suddently, the problem disappeared as mysteriously as it appeared. It happened during a single system launch, without a restart. My application was hunging, and later it started working properly. The only thing that I can remember doing in between was downloading and launching UIforETW - a GUI tool for capturing Event Tracing for Windows (ETW) traces, like the ones for GPUView. I know that it automatically installs GPUView and other necessary tools on first launch, so that may have changed something in my system. Either way, now my program works on AMD graphics without a hang, reaching few hours of uptime and maintaining 60 FPS, which only sometimes drops to 30 FPS, but it also go back up.
# 6th tip to understand legacy code
I just watched a video published 2 days ago: 5 Tips to Understand Legacy Code by Jonathan Boccara. I like it a lot, yet I feel that something is missing here. The author gives 5 tips for how can you start figuring out a large codebase that is new to you.
These are all great advices. I agree with them. But computer programs have two aspects: dynamic (the way the code executes over time - algorithms, functions, control flow) and static (the way data are stored in memory - data structures). I have a feeling that these 5 points focus mostly on dynamic aspect, so as an advocate of "data-oriented design" I would add another point:
6. "Core data structure": Find structures, classes, variables, enums, and other definitions that describe the most important data that the program is operating on. For example, if it's a game engine, see how objects of a 3D scene are defined (some abstract base CGameObject class), how they can relate to each other (forming a tree structure or so-called scene graph), what properties do they have (name, ID, position, size, orientation, color), what kinds of them are available (mesh, camera, light). Or if that's a compiler, look for definition of Abstract Syntax Tree (AST) node and enum with list of all opcodes. Draw UML diagram that will show what data types are defined, what member variables do they contain and how do they relate to each other (inheritance, composition). After visualizing and understanding that, it will be much easier to analyze the dynamic aspect - code of algorithms that operate on this data. Together, they form the essence of the program. All the rest are just helpers.
# How to view CHM files on high DPI monitor?
Using monitors with high resolution like 4K, where you need to set DPI scaling other than 100%, is pain in the *** in Windows - it causes trouble with many applications. That’s why I want to stick with FullHD monitors as long as possible. One of the apps that doesn’t scale with DPI is Microsoft’s own viewer for CHM files (Microsoft Compiled HTML Help). CHM is a file format commonly used for software help/documentation. It has been introduced with Windows 98 as a replacement for old HLP (WinHelp). Although we read almost everything online these days, some programs and libraries still use it.
A CHM document is completely unreadable on 4K monitor with 200% DPI scaling:
I searched Google for solution. Some sources say there is Font button in the app’s toolbar that lets increase font size, but it doesn’t work in my case. This page says that availability of this button can be configured when creating CHM file. This page mentions some alternative readers for CHM format (Firefox plugin, as well as standalone app).
I know that apps which misbehave under high DPI can be configured to work in “compatibility mode”, when Windows just rescales their window. I found out that executable for this default CHM reader is c:\Windows\hh.exe, but I couldn’t find this setting in Properties of this file. I thought that maybe it’s because the file is located in system directory and owned by the system with insufficient privileges for administrators and normal users, so I came up with following solution that actually works:
D:\Soft\hh.exe "d:\AGS_SDK-5.1.1\ags_lib\doc\amd_ags.chm"-> document should open in the browser with font size good for reading. Every pixel is just scaled to 2x2 pixels.
# Driver source code is not what you may think
AMD just released Open Source Driver for Vulkan, with source code available on GitHub under MIT license. That’s a good opportunity to explain how drivers source look like. When I was developing graphics driver (I no longer do - I now have quite different position), people kept asking me "How is it like to code in C?" or "How is it like to code in kernel space?", unaware that none of it was true in my case.
Many developers think that coding a driver is some hardcore, low-level stuff. Maybe it is true for some small drivers in embedded systems world. What they may not know is that a modern PC graphics driver (and I bet other kinds of drivers as well) is a very complex beast, with only a small portion of it working in kernel mode and only a small portion (if any) written in plain old C or assembly. Majority of the code is just normal C++ with classes, virtual functions and everything. It is compiled into normal user-mode DLL libraries that get loaded into address space of a game. Sure the code may be a little bit different than standard desktop apps. It may be optimized for performance, as well as memory usage. It may not use exceptions, Boost or STL. It may have to handle out-of-memory errors gracefully. But it’s still a modern, object-oriented code that uses (relatively) new language features like
# Rendering Optimization - My Talk at Warsaw University of Technology
If you happen to be in Warsaw tomorrow (2017-12-13), I'd like to invite you to my talk at Warsaw University of Technology. On the weekly meeting of Polygon group, this time the first talk will be about about artificial intelligence in action games (by Kacpi), followed by mine about rendering optimization. It will be technical, but I think it should be quite easy to understand. I won't show a single line of code. I will just give some tips for getting good performance when rendering 3D graphics on modern GPUs. I will also show some tools that can help with performance profiling. It will be all in Polish. The event starts at 7 p.m. Entrance is free. See also Facebook event. Traditionally after the talks we all go for a beer :)
# Code Europe Conference 2017 Warsaw - Some Random Thoughts
2017-12-07 I've been on Code Europe conference in Warsaw, Poland. Despite happening in just one day, it was a big event, with many talks at the same time, so I needed to choose the ones which seemed most interesting. Some of them were great, some... not that good.
The one that I liked the most was Adam Tornhill talking about "A Crystal Ball To Prioritize Technical Debt". He started by discussing technical debt in general, especially how "interest" accumulates over time, where time could be defined best as a frequency in which developers modify particular file or function. He stated that all metrics for measuring code complexity are equally bad, so the simplest one - number of lines of code - can be successfully used. He then presented a very cool way of visualizing "hot spots" - places that are the biggest pain points and that would benefit most from refactoring. If every circle represents a source file, its radius is its complexity (number of LOC), and the circle is more red the more frequently it was modified, then the files that are both big and red are the clearly visible hotspots.
But then a thought came to my mind: What if an external, well-paid consultant comes in to a software company to do such analysis? He then writes in his report: "After gathering all the data about your project and using sophisticated software tools I found that this particular file and function is very big, sophisticated, modified frequently by developers from different teams and so you should refactor it." Then all the developers of that company are like:
Possibly one of the developers could have a courage to tell the consultant: "You know what? We work with this code every day. We all know it better than you do. Maybe you go speak will our manager and convince him to give us time for that refactoring instead of requesting more and more features implemented or bugs fixed ASAP, which introduces even more hacks to the code. That would be actual useful work."
I liked the presentation of Roel Ezendam from RageSquid about "Applying the programmer mindset throughout your entire game studio". There was a lot about game development, but this talk could be seen in more general context. People tend to look at management, marketing, and other positions as something separate of even opposite to being a developer - a technical person. He showed that running a small company while still being a developer can lead to innovative way of doing things, like developing custom tools to automate certain tasks or make them more convenient (e.g. using Slack webhooks).
I didn't like the presentation of Ahmad Nabil Gohar from IBM "Blockchain.currentState() and How Will it Impact Your Industry?" The content was OK - he mostly explained the idea of blockchain (which I already knew), after which he enumerated many industries that could benefit from using it. But the slides were not prepared in a good way, in my opinion. First of all, there were 120 of them, and they contained a lot of text. Obviously he couldn't explain each one of them to finish his presentation is less than one hour, so he was going very quickly and even skipping some. The slides were also not very readable due to e.g. putting blue text on blue background.
This presentation, as well as some other inspired me to think that there a whole spectrum of types of presentations. I'm talking about both the slides and the speech together. On one end, there is urge to convey as much information as possible, so there are many slides, lots of text, they seem quite boring, the speaker goes very fast and so it's hard to follow him and to remember all of this. It happens when the speaker wants to actually teach people some new subject - a thing impossible to do in just one hour talk, because that's what university courses and books are for.
On the other end there are talks which are more like "shows" - easy and nice, speaker telling a lot of stories and conveying emotions, slides drawing attention thanks to using a lot of pictures and single words. Such presentations are fun, but they don't carry any information - they just leave people feeling good without anything new to take out. It happens especially if a very famous person is invited to talk about anything he wants - it doesn't matter what he says because it's only his name in the agenda that matters.
In my opinion, a good talk is something in between. It should express some idea and communicate it clearly, provide just enough information to understand it, with amount of content and pace of delivery slow enough so it's easy to keep up. Slides should show some meaningful text and pictures, while the speech should augment them with additional information and context.
Besides talks there was also quite big expo with many companies advertising their job offers for developers. Most of them were looking for Java or .NET developers, sometimes also PHP or Node.js. I could feel there how exotic my specialization is. There was one game company, but they make their games in Unity. I found only one company that was looking for a C++ developer - it was Ericsson.
She started from explaining WebSockets. My thought was: "Wow, so it's actually possible to have a persistent connection and use it to send any data, any time, in any direction, without text-based request-response protocol? Then desktop applications are so hipster! They did it before it was cool. Actually they did it like... forever."
But the most shocking for me was hearing that their RPC (Remote Procedure Call) "happens so fast almost like having the function locally". Yeah, right, by sending parameters and receiving results over Internet, where the best latency you can get is few milliseconds... While last week I reinstalled my whole system just because a system function was taking 2.5 microseconds instead of 22 nanoseconds, which was ruining my program.
I'm sorry, I didn't want to sound so negative. I just have a bad mood recently. Overall the conference was very inspiring and though-provoking, which is good. I can recommend it to any developer, no matter what programming language you use.
# When QueryPerformanceCounter call takes long time
QueryPerformanceCounter function is all about measuring time and profiling performance, so I wasn't able to formulate right Google query to find a solution to the problem I had - call to
QueryPerformanceCounter function itself taking too much time. Below I describe what I eventually found out.
It all started from hardware failure. My motherboard stopped working, so I needed to buy a new one (ASRock X370 Killer SLI). I know that normally changing motherboard requires reinstalling Windows, but I tried not to do it. The system didn't want to boot, so I booted the PC using pendrive with Windows installer and launched the repair function. It helped - after that Windows was able to start and everything seemed to work... until I launched the program that I develop on that machine. It was running painfully slow.
I tried different things to find out what is happening. Input/output to hard drive or network was not an issue. GPU performance was also OK. It seemed that the app is just doing its calculations slowly, like the CPU was very slow. I double-checked actual CPU and RAM frequency, but it was OK. Finally I launched sampling profiler (the one embedded in Visual Studio - command: Analyze > Performance Profiler). This way I found that most of the time is spent in function
This WinAPI function is the recommended way to obtain a timestamp in Windows. It's very precise, monotonic, safe to use on multiple cores and threads, it has stable frequency independent of CPU power management or Turbo Boost... It's just great, but in order to meet all these requirements, Windows may use different methods to implement it, as described in article Acquiring high-resolution time stamps. Some of them are fast (just reading TSC register), others are slow (require system call - transition to kernel mode).
I wrote a simple C++ program that tests how long it takes to execute
QueryPerformanceCounter function. You can see the code here: QueryPerformanceCounterTest.cpp and download 64-bit binary here: QueryPerformanceCounterTest.zip. Running this test on two different machines gave following results:
CPU: Intel Core i7-6700K, Motherboard: GIGABYTE Z170-HD3-CF:
> QueryPerformanceCounterTest.exe 1000000000
Executing QueryPerformanceCounter x 1000000000...
According to GetTickCount64 it took 0:00:11.312 (11.312 ns per call)
According to QueryPerformanceCounter it took 0:00:11.314 (11.314 ns per call)
CPU: AMD Ryzen 7 1700X, Motherboard: ASRock X370 Killer SLI (changed from different model without system reinstall):
> QueryPerformanceCounterTest.exe 10000000
Executing QueryPerformanceCounter x 10000000...
According to GetTickCount64 it took 0:00:24.906 (2490.6 ns per call)
According to QueryPerformanceCounter it took 0:00:24.911 (2491.1 ns per call)
As you can see, the function takes 11 nanoseconds on first platform and 2.49 microsenonds (220 times more!) on the second one. This was the cause of slowness of my program. The program calls this function many times.
I tried to fix it and somehow convince Windows to use the fast implementation. I uninstalled and reinstalled motherboard drivers - the latest ones downloaded from manufacturer website. I upgraded and downgraded BIOS to different versions. I booted the system from Windows installation media and "repaired" it again. I restored default settings in UEFI/BIOS and tried to change "ACPI HPET Table" option there to Disabled/Enabled/Auto. Nothing worked. Finally I restored Windows to factory settings (Settings > Update & Security > Recovery > Reset this PC). This solved my problem, but unfortunately it's like reinstalling Windows from scratch - now I need to install and configure all the apps again. After that the function takes 22 ns on this machine.
My conclusions from this "adventure" are twofold:
QueryPerformanceCounterto execute slowly on some platforms, like for 2.5 microseconds. If you call it just once per rendering frame then it doesn't matter, but you shouldn't profile every small portion of your code with it, calling it millions of times.
Update 2017-12-11: A colleague told me that enabling/disabling HPET using "bcdedit" system command could possibly help for that issue.
# Check of CommonLib using PVS-Studio
Courtesy developers of PVS-Studio, I could use this static code analysis tool to check my home projects. I blogged about the tool recently. It's powerful and especially good at finding issues with 32/64-bit portability. Now I analyzed CommonLib - a library I develop and use for many years, which contains various facilities for C++ programming. That allowed me to find and fix many bugs. Here are the most interesting ones I've met: