December 2017

# Driver source code is not what you may think

Dec 2017

AMD has just released its Open Source Driver for Vulkan, with source code available on GitHub under the MIT license. That’s a good opportunity to explain what driver source code looks like. When I was developing a graphics driver (I no longer do - I now have quite a different position), people kept asking me "What is it like to code in C?" or "What is it like to code in kernel space?", unaware that neither was true in my case.

Many developers think that coding a driver is some hardcore, low-level stuff. Maybe that is true for some small drivers in the embedded systems world. What they may not know is that a modern PC graphics driver (and I bet other kinds of drivers as well) is a very complex beast, with only a small portion of it working in kernel mode and only a small portion (if any) written in plain old C or assembly. The majority of the code is just normal C++ with classes, virtual functions and everything. It is compiled into normal user-mode DLL libraries that get loaded into the address space of a game. Sure, the code may be a little bit different from standard desktop apps. It may be optimized for performance, as well as memory usage. It may not use exceptions, Boost or STL. It may have to handle out-of-memory errors gracefully. But it’s still modern, object-oriented code that uses (relatively) new language features like constexpr or enum class.
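To give a feeling for that style, here is a minimal sketch of how such code might look. It is not taken from any real driver: the names `Result`, `IBuffer`, `HostBuffer` and `kMaxBufferSize` are hypothetical and chosen only for illustration. It shows an enum class used as an error code, a constexpr limit, a virtual interface, and out-of-memory handling without exceptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical names throughout - illustration only, not real driver code.
enum class Result : std::int32_t { Success, ErrorOutOfMemory, ErrorInvalidValue };

// Arbitrary size limit for this sketch (256 MiB).
constexpr std::size_t kMaxBufferSize = std::size_t{256} * 1024 * 1024;

// Abstract interface; different implementations could sit behind it.
class IBuffer {
public:
    virtual ~IBuffer() = default;
    virtual std::size_t Size() const = 0;
    virtual void* Data() = 0;
};

class HostBuffer final : public IBuffer {
public:
    // Factory function reports failure through a return code instead of throwing.
    static Result Create(std::size_t size, IBuffer** outBuffer) {
        assert(outBuffer != nullptr);
        *outBuffer = nullptr;
        if (size == 0 || size > kMaxBufferSize)
            return Result::ErrorInvalidValue;
        // new(std::nothrow) returns nullptr on failure instead of throwing std::bad_alloc.
        unsigned char* data = new (std::nothrow) unsigned char[size];
        if (data == nullptr)
            return Result::ErrorOutOfMemory;
        HostBuffer* buffer = new (std::nothrow) HostBuffer(data, size);
        if (buffer == nullptr) {
            delete[] data;
            return Result::ErrorOutOfMemory;
        }
        *outBuffer = buffer;
        return Result::Success;
    }

    ~HostBuffer() override { delete[] m_data; }
    std::size_t Size() const override { return m_size; }
    void* Data() override { return m_data; }

private:
    HostBuffer(unsigned char* data, std::size_t size) : m_data(data), m_size(size) {}
    HostBuffer(const HostBuffer&) = delete;
    HostBuffer& operator=(const HostBuffer&) = delete;

    unsigned char* m_data;
    std::size_t m_size;
};
```

A caller would check the returned `Result` after `HostBuffer::Create(size, &buffer)` and release the object with `delete buffer` when done.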

#driver #graphics

# Rendering Optimization - My Talk at Warsaw University of Technology

Dec 2017

If you happen to be in Warsaw tomorrow (2017-12-13), I'd like to invite you to my talk at Warsaw University of Technology. At the weekly meeting of the Polygon group, this time the first talk will be about artificial intelligence in action games (by Kacpi), followed by mine about rendering optimization. It will be technical, but I think it should be quite easy to understand. I won't show a single line of code. I will just give some tips for getting good performance when rendering 3D graphics on modern GPUs. I will also show some tools that can help with performance profiling. It will all be in Polish. The event starts at 7 p.m. Entrance is free. See also the Facebook event. Traditionally, after the talks we all go for a beer :)

#teaching #graphics #gpu #optimization

# Code Europe Conference 2017 Warsaw - Some Random Thoughts

Dec 2017

On 2017-12-07 I attended the Code Europe conference in Warsaw, Poland. Despite lasting just one day, it was a big event, with many talks happening at the same time, so I needed to choose the ones which seemed most interesting. Some of them were great, some... not that good.

The one that I liked the most was Adam Tornhill talking about "A Crystal Ball To Prioritize Technical Debt". He started by discussing technical debt in general, especially how "interest" accumulates over time, where time is best defined as the frequency with which developers modify a particular file or function. He stated that all metrics for measuring code complexity are equally bad, so the simplest one - number of lines of code - can be successfully used. He then presented a very cool way of visualizing "hot spots" - places that are the biggest pain points and that would benefit most from refactoring. If every circle represents a source file, its radius is its complexity (number of LOC), and the circle is more red the more frequently it was modified, then the files that are both big and red are the clearly visible hotspots.
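The idea can be approximated without any visualization by sorting files by a combined score. The sketch below is only my illustration, not the method from the talk: the `FileStats` structure is hypothetical, and multiplying lines of code by the number of modifications is an assumed way of combining the two dimensions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-file statistics, e.g. gathered from version control history.
struct FileStats {
    std::string path;
    std::uint64_t linesOfCode;       // complexity proxy (circle radius)
    std::uint64_t modificationCount; // how often the file changed (circle redness)
};

// Assumed scoring: the product is large only for files that are both big and frequently modified.
inline std::uint64_t HotspotScore(const FileStats& file) {
    return file.linesOfCode * file.modificationCount;
}

// Returns the files ordered from the hottest to the coldest.
std::vector<FileStats> RankHotspots(std::vector<FileStats> files) {
    assert(!files.empty());
    std::stable_sort(files.begin(), files.end(),
        [](const FileStats& a, const FileStats& b) {
            return HotspotScore(a) > HotspotScore(b);
        });
    return files;
}
```

With such a ranking, a big file that nobody touches and a tiny file that changes daily both end up below a big file that changes daily.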

But then a thought came to my mind: What if an external, well-paid consultant comes to a software company to do such an analysis? He then writes in his report: "After gathering all the data about your project and using sophisticated software tools I found that this particular file and function is very big, sophisticated, modified frequently by developers from different teams and so you should refactor it." Then all the developers of that company are like:

Possibly one of the developers might have the courage to tell the consultant: "You know what? We work with this code every day. We all know it better than you do. Maybe you should go speak with our manager and convince him to give us time for that refactoring instead of requesting more and more features implemented or bugs fixed ASAP, which introduces even more hacks to the code. That would be actual useful work."

I liked the presentation by Roel Ezendam from RageSquid about "Applying the programmer mindset throughout your entire game studio". There was a lot about game development, but this talk could be seen in a more general context. People tend to look at management, marketing, and other positions as something separate from or even opposite to being a developer - a technical person. He showed that running a small company while still being a developer can lead to innovative ways of doing things, like developing custom tools to automate certain tasks or make them more convenient (e.g. using Slack webhooks).

I didn't like the presentation by Ahmad Nabil Gohar from IBM, "Blockchain.currentState() and How Will it Impact Your Industry?" The content was OK - he mostly explained the idea of blockchain (which I already knew), after which he enumerated many industries that could benefit from using it. But the slides were not prepared in a good way, in my opinion. First of all, there were 120 of them, and they contained a lot of text. Obviously he couldn't explain each one of them and still finish his presentation in less than one hour, so he was going very quickly and even skipping some. The slides were also not very readable due to e.g. putting blue text on a blue background.

This presentation, as well as some others, made me think that there is a whole spectrum of types of presentations. I'm talking about both the slides and the speech together. On one end, there is an urge to convey as much information as possible, so there are many slides with lots of text, they seem quite boring, and the speaker goes very fast, so it's hard to follow him and to remember all of it. It happens when the speaker wants to actually teach people some new subject - a thing impossible to do in just a one-hour talk, because that's what university courses and books are for.

On the other end there are talks which are more like "shows" - easy and pleasant, with the speaker telling a lot of stories and conveying emotions, and slides drawing attention thanks to a lot of pictures and single words. Such presentations are fun, but they don't carry any information - they just leave people feeling good without anything new to take away. It happens especially if a very famous person is invited to talk about anything he wants - it doesn't matter what he says because it's only his name in the agenda that matters.

In my opinion, a good talk is something in between. It should express some idea and communicate it clearly, provide just enough information to understand it, and keep the amount of content and the pace of delivery low enough that it's easy to keep up. Slides should show some meaningful text and pictures, while the speech should augment them with additional information and context.

Besides the talks there was also quite a big expo with many companies advertising their job offers for developers. Most of them were looking for Java or .NET developers, sometimes also PHP or Node.js. I could feel there how exotic my specialization is. There was one game company, but they make their games in Unity. I found only one company that was looking for a C++ developer - it was Ericsson.

I could feel the difference between high-performance, native code development and what are now the most popular programming technologies even more during Srushtika Neelakantam's talk "How we invented our own realtime protocol to make the world work faster!" I was hoping to see some really low-level, high-performance technology there. What I saw instead was a web-based protocol implemented in JavaScript. By realtime she meant data sent to a web browser app to be updated without a full page reload, like positions of Uber cars on a map.

She started by explaining WebSockets. My thought was: "Wow, so it's actually possible to have a persistent connection and use it to send any data, any time, in any direction, without a text-based request-response protocol? Then desktop applications are so hipster! They did it before it was cool. Actually they did it like... forever."

But the most shocking thing for me was hearing that their RPC (Remote Procedure Call) "happens so fast almost like having the function locally". Yeah, right, by sending parameters and receiving results over the Internet, where the best latency you can get is a few milliseconds... Meanwhile, last week I reinstalled my whole system just because a system function was taking 2.5 microseconds instead of 22 nanoseconds, which was ruining my program.

I'm sorry, I didn't want to sound so negative. I've just been in a bad mood recently. Overall the conference was very inspiring and thought-provoking, which is good. I can recommend it to any developer, no matter what programming language you use.

#events

# When QueryPerformanceCounter call takes long time

Dec 2017

The QueryPerformanceCounter function is all about measuring time and profiling performance, so I wasn't able to formulate the right Google query to find a solution to the problem I had - a call to the QueryPerformanceCounter function itself taking too much time. Below I describe what I eventually found out.

It all started with a hardware failure. My motherboard stopped working, so I needed to buy a new one (ASRock X370 Killer SLI). I know that normally changing the motherboard requires reinstalling Windows, but I tried to avoid it. The system didn't want to boot, so I booted the PC from a pendrive with the Windows installer and launched the repair function. It helped - after that Windows was able to start and everything seemed to work... until I launched the program that I develop on that machine. It was running painfully slowly.

I tried different things to find out what was happening. Input/output to the hard drive or network was not an issue. GPU performance was also OK. It seemed that the app was just doing its calculations slowly, as if the CPU was very slow. I double-checked the actual CPU and RAM frequencies, but they were OK. Finally I launched a sampling profiler (the one embedded in Visual Studio - command: Analyze > Performance Profiler). This way I found that most of the time was spent in the function QueryPerformanceCounter.

This WinAPI function is the recommended way to obtain a timestamp in Windows. It's very precise, monotonic, safe to use on multiple cores and threads, and it has a stable frequency independent of CPU power management or Turbo Boost... It's just great, but in order to meet all these requirements, Windows may use different methods to implement it, as described in the article Acquiring high-resolution time stamps. Some of them are fast (just reading the TSC register), others are slow (they require a system call - a transition to kernel mode).

I wrote a simple C++ program that tests how long it takes to execute the QueryPerformanceCounter function. You can see the code here: QueryPerformanceCounterTest.cpp. A 64-bit binary is also available for download. Running this test on two different machines gave the following results:

CPU: Intel Core i7-6700K, Motherboard: GIGABYTE Z170-HD3-CF:

> QueryPerformanceCounterTest.exe 1000000000
Executing QueryPerformanceCounter x 1000000000...
According to GetTickCount64 it took 0:00:11.312 (11.312 ns per call)
According to QueryPerformanceCounter it took 0:00:11.314 (11.314 ns per call)

CPU: AMD Ryzen 7 1700X, Motherboard: ASRock X370 Killer SLI (changed from different model without system reinstall):

> QueryPerformanceCounterTest.exe 10000000
Executing QueryPerformanceCounter x 10000000...
According to GetTickCount64 it took 0:00:24.906 (2490.6 ns per call)
According to QueryPerformanceCounter it took 0:00:24.911 (2491.1 ns per call)

As you can see, the function takes 11 nanoseconds on the first platform and 2.49 microseconds (220 times more!) on the second one. This was the cause of the slowness of my program, which calls this function many times.
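For readers who want to try a similar measurement, here is a minimal sketch of the idea. It is not the QueryPerformanceCounterTest.cpp program mentioned above: the function names `ReadTimestamp` and `MeasureNsPerCall` are made up for this example, the total time is measured with std::chrono::steady_clock rather than GetTickCount64, and on non-Windows platforms it falls back to steady_clock just so that the code still builds.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#endif

// Reads a single timestamp. On Windows this calls QueryPerformanceCounter;
// elsewhere it uses steady_clock as a stand-in (an assumption of this sketch).
inline std::int64_t ReadTimestamp() {
#ifdef _WIN32
    LARGE_INTEGER counter;
    QueryPerformanceCounter(&counter);
    return counter.QuadPart;
#else
    return std::chrono::steady_clock::now().time_since_epoch().count();
#endif
}

// Calls ReadTimestamp `calls` times and returns the average cost of one call in nanoseconds.
double MeasureNsPerCall(std::uint64_t calls) {
    assert(calls > 0);
    std::int64_t sink = 0;
    const auto begin = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < calls; ++i)
        sink ^= ReadTimestamp();
    const auto end = std::chrono::steady_clock::now();
    // Store the accumulated value in a volatile so the loop results count as used.
    volatile std::int64_t keep = sink;
    (void)keep;
    const std::chrono::duration<double, std::nano> elapsed = end - begin;
    return elapsed.count() / static_cast<double>(calls);
}
```

Calling `MeasureNsPerCall(10000000)` and printing the result should be enough to tell whether a given machine falls into the "tens of nanoseconds" or the "microseconds" category.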

I tried to fix it and somehow convince Windows to use the fast implementation. I uninstalled and reinstalled motherboard drivers - the latest ones downloaded from the manufacturer's website. I upgraded and downgraded the BIOS to different versions. I booted the system from Windows installation media and "repaired" it again. I restored default settings in UEFI/BIOS and tried to change the "ACPI HPET Table" option there to Disabled/Enabled/Auto. Nothing worked. Finally I restored Windows to factory settings (Settings > Update & Security > Recovery > Reset this PC). This solved my problem, but unfortunately it's like reinstalling Windows from scratch - now I need to install and configure all the apps again. After that the function takes 22 ns on this machine.

My conclusions from this "adventure" are twofold:

  1. It is valid for the function QueryPerformanceCounter to execute slowly on some platforms, e.g. taking 2.5 microseconds per call. If you call it just once per rendering frame then it doesn't matter, but you shouldn't profile every small portion of your code with it, calling it millions of times.
  2. Windows 10 still requires reinstallation when changing the motherboard. Otherwise, even if it seems to work, you may experience strange issues like this one.
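To put the first conclusion in numbers: timing one region needs two timestamp reads, so a million separately timed regions cost about 2 x 1,000,000 x 2.5 µs = 5 s at the slow rate, but only about 0.044 s at 22 ns per read. The sketch below shows this estimate together with one possible alternative, timing a whole batch of iterations with just two reads. The names `TimestampOverheadSeconds` and `AverageNsPerIteration` are hypothetical, and std::chrono::steady_clock is used as the timestamp source for portability.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Estimated time spent only on reading timestamps,
// assuming two reads (begin and end) per timed region.
constexpr double TimestampOverheadSeconds(std::uint64_t timedRegions, double nsPerRead) {
    return 2.0 * static_cast<double>(timedRegions) * nsPerRead * 1e-9;
}

// Times a whole batch with two timestamp reads in total,
// instead of two reads per iteration, and returns the average per iteration.
template <typename Work>
double AverageNsPerIteration(std::uint64_t iterations, Work&& work) {
    assert(iterations > 0);
    const auto begin = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i)
        work(i);
    const auto end = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = end - begin;
    return elapsed.count() / static_cast<double>(iterations);
}
```

The batch approach loses per-iteration detail, which is the price for keeping the measurement overhead independent of how slow a single timestamp read happens to be.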

Update 2017-12-11: A colleague told me that enabling/disabling HPET using the "bcdedit" system command might help with this issue.

#winapi #optimization #hardware #windows

Copyright © 2004-2018