
# Code Europe Conference 2017 Warsaw - Some Random Thoughts

20:54, Sun 10 Dec 2017

On 2017-12-07 I attended the Code Europe conference in Warsaw, Poland. Despite lasting just one day, it was a big event, with many talks happening at the same time, so I had to choose the ones that seemed most interesting. Some of them were great, some... not that good.

The talk I liked the most was Adam Tornhill's "A Crystal Ball To Prioritize Technical Debt". He started by discussing technical debt in general, especially how its "interest" accumulates over time, where time is best measured as the frequency with which developers modify a particular file or function. He argued that all metrics for measuring code complexity are equally bad, so the simplest one - the number of lines of code - can be used just as successfully. He then presented a very cool way of visualizing "hot spots" - the places that are the biggest pain points and would benefit most from refactoring. If every circle represents a source file, its radius is the file's complexity (number of LOC), and the circle gets redder the more frequently the file is modified, then the files that are both big and red are the clearly visible hotspots.
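
The underlying recipe is simple enough to sketch in code. Assuming we have already extracted, for each file, its line count and the number of commits touching it (e.g. from git log), a toy hotspot ranking could look like this - my own illustration, not Tornhill's actual tooling:

#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical input record: file path, lines of code, commits touching it.
struct FileStats { std::string path; size_t loc; size_t commitCount; };

int main()
{
    std::vector<FileStats> files = {
        { "src/Renderer.cpp",  4200, 310 },
        { "src/Utils.cpp",      900,  15 },
        { "src/GameLogic.cpp", 3100, 280 },
    };
    // Hotspot score: complexity (LOC) weighted by change frequency.
    std::sort(files.begin(), files.end(), [](const FileStats& a, const FileStats& b) {
        return a.loc * a.commitCount > b.loc * b.commitCount;
    });
    for(const FileStats& f : files)
        printf("%-20s score = %zu\n", f.path.c_str(), f.loc * f.commitCount);
    return 0;
}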

But then a thought came to my mind: what if an external, well-paid consultant came into a software company to do such an analysis? He writes in his report: "After gathering all the data about your project and using sophisticated software tools, I found that this particular file and function is very big, very complex, and modified frequently by developers from different teams, so you should refactor it." You can imagine the faces of all the developers of that company at that moment.

Possibly one of the developers would have the courage to tell the consultant: "You know what? We work with this code every day. We all know it better than you do. How about you go speak with our manager and convince him to give us time for that refactoring, instead of requesting more and more features implemented and bugs fixed ASAP, which introduces even more hacks into the code? That would be actually useful work."

I liked the presentation by Roel Ezendam from RageSquid, "Applying the programmer mindset throughout your entire game studio". There was a lot about game development in it, but the talk can be seen in a more general context. People tend to look at management, marketing, and other positions as something separate from, or even opposite to, being a developer - a technical person. He showed that running a small company while still being a developer can lead to innovative ways of doing things, like developing custom tools to automate certain tasks or make them more convenient (e.g. using Slack webhooks).
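
As a side note, that kind of automation often boils down to a single HTTP request. Here is a minimal sketch of posting a message to a Slack incoming webhook using libcurl - my own illustration, not code from the talk, and the webhook URL is a placeholder:

#include <curl/curl.h>
#include <string>

// Posts a message to a Slack incoming webhook. The URL is a placeholder -
// a real one is generated in Slack's app configuration.
bool PostToSlack(const std::string& message)
{
    CURL* curl = curl_easy_init();
    if(!curl)
        return false;
    const std::string payload = "{\"text\": \"" + message + "\"}";
    curl_slist* headers = curl_slist_append(nullptr, "Content-Type: application/json");
    curl_easy_setopt(curl, CURLOPT_URL, "https://hooks.slack.com/services/YOUR/WEBHOOK/URL");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, payload.c_str());
    const CURLcode res = curl_easy_perform(curl);
    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    return res == CURLE_OK;
}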

I didn't like the presentation by Ahmad Nabil Gohar from IBM, "Blockchain.currentState() and How Will it Impact Your Industry?". The content was OK - he mostly explained the idea of blockchain (which I already knew) and then enumerated many industries that could benefit from using it. But the slides were not prepared well, in my opinion. First of all, there were 120 of them, and they contained a lot of text. Obviously he couldn't discuss each one and still finish his presentation in less than an hour, so he was going very quickly and even skipping some. The slides were also not very readable, due to things like blue text on a blue background.

This presentation, as well as some others, inspired me to think that there is a whole spectrum of presentation types - considering the slides and the speech together. On one end, there is the urge to convey as much information as possible: many slides, lots of text, a rather boring style, and a speaker going so fast that it's hard to follow him or remember all of it. This happens when the speaker actually wants to teach people a new subject - something impossible to do in a one-hour talk, because that's what university courses and books are for.

On the other end there are talks which are more like shows - easy and pleasant, with the speaker telling lots of stories and conveying emotions, and slides drawing attention with big pictures and single words. Such presentations are fun, but they don't carry much information - they just leave people feeling good, without anything new to take away. This happens especially when a very famous person is invited to talk about whatever he wants - it doesn't matter what he says, because it's only his name in the agenda that matters.

In my opinion, a good talk is somewhere in between. It should express some idea and communicate it clearly, providing just enough information to understand it, with the amount of content and pace of delivery slow enough that it's easy to keep up. Slides should show some meaningful text and pictures, while the speech augments them with additional information and context.

Besides the talks, there was also quite a big expo, with many companies advertising their job offers for developers. Most of them were looking for Java or .NET developers, sometimes also PHP or Node.js. I could really feel there how exotic my specialization is. There was one game company, but they make their games in Unity. I found only one company looking for a C++ developer - Ericsson.

I could feel the difference between high-performance, native code development and today's most popular programming technologies even more during Srushtika Neelakantam's talk "How we invented our own realtime protocol to make the world work faster!". I was hoping to see some really low-level, high-performance technology. What I saw instead was a web-based protocol implemented in JavaScript. By "realtime" she meant data in a web browser app being updated without a full page reload, like the positions of Uber cars on a map.

She started by explaining WebSockets. My thought was: "Wow, so it's actually possible to have a persistent connection and use it to send any data, at any time, in any direction, without a text-based request-response protocol? Then desktop applications are so hipster! They did it before it was cool. Actually, they've been doing it... forever."

But the most shocking thing for me was hearing that their RPC (Remote Procedure Call) "happens so fast it's almost like having the function locally". Yeah, right - by sending parameters and receiving results over the Internet, where the best latency you can get is a few milliseconds... Meanwhile, just last week I reinstalled my whole system because a system function was taking 2.5 microseconds instead of 22 nanoseconds, which was ruining my program.

I'm sorry, I didn't want to sound so negative - I've just been in a bad mood recently. Overall the conference was very inspiring and thought-provoking, which is good. I can recommend it to any developer, no matter what programming language you use.

Comments | #events

# When QueryPerformanceCounter call takes long time

13:06, Sun 03 Dec 2017

The QueryPerformanceCounter function is all about measuring time and profiling performance, so I wasn't able to formulate the right Google query to find a solution to the problem I had: a call to the QueryPerformanceCounter function itself taking too much time. Below I describe what I eventually found out.

It all started with a hardware failure. My motherboard stopped working, so I had to buy a new one (ASRock X370 Killer SLI). I know that changing the motherboard normally requires reinstalling Windows, but I tried to avoid that. The system didn't want to boot, so I booted the PC from a pendrive with the Windows installer and launched the repair function. It helped - after that, Windows was able to start and everything seemed to work... until I launched the program I develop on that machine. It was running painfully slowly.

I tried different things to find out what was happening. Input/output to the hard drive or network was not an issue. GPU performance was also OK. It seemed that the app was just doing its calculations slowly, as if the CPU were very slow. I double-checked the actual CPU and RAM frequencies, but they were OK. Finally I launched a sampling profiler (the one embedded in Visual Studio: Analyze > Performance Profiler). This way I found that most of the time was spent in the function QueryPerformanceCounter.

This WinAPI function is the recommended way to obtain a timestamp in Windows. It's very precise, monotonic, safe to use across multiple cores and threads, and it has a stable frequency independent of CPU power management or Turbo Boost... It's just great, but in order to meet all these requirements, Windows may use different methods to implement it, as described in the article "Acquiring high-resolution time stamps". Some of them are fast (just reading the TSC register), while others are slow (requiring a system call - a transition to kernel mode).
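
Measuring the cost of the call itself boils down to a tight loop around it. Something along these lines (a minimal sketch, not the exact program linked below):

#include <windows.h>
#include <cstdint>
#include <cstdio>

int main()
{
    const uint64_t count = 100000000;
    LARGE_INTEGER freq, start, stop, tmp;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);
    for(uint64_t i = 0; i < count; ++i)
        QueryPerformanceCounter(&tmp); // The call being measured.
    QueryPerformanceCounter(&stop);
    const double seconds = double(stop.QuadPart - start.QuadPart) / double(freq.QuadPart);
    printf("%.3f ns per call\n", seconds * 1e9 / double(count));
    return 0;
}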

I wrote a simple C++ program that tests how long it takes to execute the QueryPerformanceCounter function. You can see the code here: QueryPerformanceCounterTest.cpp, and download a 64-bit binary here: QueryPerformanceCounterTest.zip. Running this test on two different machines gave the following results:

CPU: Intel Core i7-6700K, Motherboard: GIGABYTE Z170-HD3-CF:

> QueryPerformanceCounterTest.exe 1000000000
Executing QueryPerformanceCounter x 1000000000...
According to GetTickCount64 it took 0:00:11.312 (11.312 ns per call)
According to QueryPerformanceCounter it took 0:00:11.314 (11.314 ns per call)

CPU: AMD Ryzen 7 1700X, Motherboard: ASRock X370 Killer SLI (changed from a different model without system reinstall):

> QueryPerformanceCounterTest.exe 10000000
Executing QueryPerformanceCounter x 10000000...
According to GetTickCount64 it took 0:00:24.906 (2490.6 ns per call)
According to QueryPerformanceCounter it took 0:00:24.911 (2491.1 ns per call)

As you can see, the function takes 11 nanoseconds on the first platform and 2.49 microseconds (220 times more!) on the second one. This was the cause of my program's slowness - it calls this function many times.

I tried to fix it and somehow convince Windows to use the fast implementation. I uninstalled and reinstalled the motherboard drivers - the latest ones downloaded from the manufacturer's website. I upgraded and downgraded the BIOS to different versions. I booted the system from the Windows installation media and "repaired" it again. I restored default settings in UEFI/BIOS and tried changing the "ACPI HPET Table" option there to Disabled/Enabled/Auto. Nothing worked. Finally I restored Windows to factory settings (Settings > Update & Security > Recovery > Reset this PC). This solved my problem - the function now takes 22 ns on this machine - but unfortunately it's like reinstalling Windows from scratch, so now I have to install and configure all my apps again.

My conclusions from this "adventure" are twofold:

  1. It is valid for the QueryPerformanceCounter function to execute slowly on some platforms - like taking 2.5 microseconds per call. If you call it just once per rendering frame, it doesn't matter, but you shouldn't profile every small portion of your code with it, calling it millions of times. A cheap workaround is sketched below this list.
  2. Windows 10 still requires reinstallation when you change the motherboard. Otherwise, even if it seems to work, you may experience strange issues like this one.
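
As an example of such a workaround (a hypothetical sketch, not code from my program): query the counter once per frame and let the rest of the code read the cached value:

#include <windows.h>

// Queried once per frame; everything else reads the cached value.
static LARGE_INTEGER g_frameTimestamp;

void BeginFrame()
{
    QueryPerformanceCounter(&g_frameTimestamp); // The only QPC call in the frame.
}

LONGLONG GetCachedTimestamp()
{
    return g_frameTimestamp.QuadPart; // Cheap: just a memory read, no system call.
}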

Update 2017-12-11: A colleague told me that enabling/disabling HPET using the "bcdedit" system command could possibly help with this issue.

Comments | #winapi #optimization #hardware #windows

# Check of CommonLib using PVS-Studio

18:26, Mon 13 Nov 2017

Courtesy of the developers of PVS-Studio, I could use this static code analysis tool to check my home projects. I blogged about the tool recently. It's powerful and especially good at finding issues with 32/64-bit portability. This time I analyzed CommonLib - a library I have been developing and using for many years, which contains various facilities for C++ programming. The analysis allowed me to find and fix many bugs. Here are the most interesting ones I came across:

Read full entry > | Comments | #commonlib #c++ #pvs-studio

# Ported CommonLib to 64 Bits

22:12, Sun 12 Nov 2017

I recently updated my CommonLib library to support 64-bit code. This is a change I had planned for a very long time. I wrote this code so many years ago that 64-bit compilation on Windows was not yet popular. Today the situation is the opposite - a modern Windows application or game could well be 64-bit only. I decided to make the library work in both configurations. The way I did it was:

  1. Created 64-bit configurations in the Visual Studio project.
  2. Went over all the code and made the changes required for it to work correctly in the 64-bit configuration.
  3. Compiled the code and made sure it compiles successfully.
  4. Fixed everything the compiler reported as warnings.
  5. Made sure the tests pass.
  6. Scanned the code with the PVS-Studio static code analysis tool. I covered this point in a separate blog post.

The most important and also most frequent change I needed to make was to use the size_t type for every size (like a number of bytes), index (for indexing arrays), count (number of elements in some collection), length (e.g. number of characters in a string), etc. This is the "native" type intended for that purpose, and as such it's 32-bit or 64-bit depending on the configuration. It's also the type returned by the sizeof operator and functions like strlen. Before making this change, I had carelessly mixed size_t with uint32.

I sometimes use the maximum available value (all ones) as a special value, e.g. to communicate "not found". Before the change I used the UINT_MAX constant for that. Now I use SIZE_MAX.
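
In practice the pattern looks something like this (a trivial illustration, not actual CommonLib code):

#include <cstddef> // size_t
#include <cstdint> // uint32_t, SIZE_MAX

// Returns the index of the first occurrence of value, or SIZE_MAX if not found.
size_t FindIndex(const uint32_t* items, size_t count, uint32_t value)
{
    for(size_t i = 0; i < count; ++i) // size_t, not uint32, for the index.
        if(items[i] == value)
            return i;
    return SIZE_MAX; // All ones - the "not found" sentinel.
}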

I know there is a group of developers who advocate using signed integers for indices and counts, but I'm among the proponents of unsigned types, just as the C++ standard recommends. In those very rare situations where I do need a signed size, the correct solution is to use the ptrdiff_t type. That's the case when I apply the concept of a "stride", which I blogged about here (Polish) and here.
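
A negative stride is the typical case where the signed type is needed. Here is a sketch of the idea (my illustration, assuming a simple byte-stepped traversal):

#include <cstddef> // size_t, ptrdiff_t
#include <cstdint>

// Walks over `count` elements starting at `data`, stepping `strideInBytes`
// bytes between them. A negative stride traverses the buffer backwards,
// which is why the parameter must be signed (ptrdiff_t), not size_t.
float SumFloats(const void* data, size_t count, ptrdiff_t strideInBytes)
{
    float sum = 0.f;
    const char* ptr = static_cast<const char*>(data);
    for(size_t i = 0; i < count; ++i, ptr += strideInBytes)
        sum += *reinterpret_cast<const float*>(ptr);
    return sum;
}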

I have a hierarchy of "Stream" classes that represent a stream of bytes which can be read from or written to. Derived classes offer a unified interface over a buffer in memory, a file, and many other things. Before the change, all the offsets and sizes in my streams were 32-bit, and I had to decide how to change them. My first idea was to use size_t. It seemed a natural choice for MemoryStream, but it doesn't make sense for FileStream, as files can exceed 4 GB in size regardless of the 32/64-bit configuration of my code - such files were not supported correctly by this class. I eventually decided to always use 64-bit types for stream size and cursor position.
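
The resulting interface might look roughly like this (a hypothetical shape; CommonLib's actual declarations may differ):

#include <cstddef>
#include <cstdint>

// Offsets and sizes are always 64-bit, regardless of build configuration,
// so FileStream can address files larger than 4 GB even in 32-bit builds.
class Stream
{
public:
    virtual ~Stream() = default;
    virtual uint64_t GetSize() const = 0;
    virtual uint64_t GetPos() const = 0;
    virtual void SetPos(uint64_t pos) = 0;
    virtual size_t Read(void* outBytes, size_t byteCount) = 0;    // A single chunk fits in memory,
    virtual size_t Write(const void* bytes, size_t byteCount) = 0; // so size_t is enough here.
};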

There are cases where my use of size_t in the library interface can't carry through to the implementation, because the underlying API supports only 32-bit sizes. That's the case with zlib (a data compression library that I wrapped into stream classes in the files ZlibUtils.*), as well as BSTR (the string type used by Windows COM, which I wrapped into a class in the files BstrString.*). In those cases I just put asserts in the code, checking that the actual size doesn't exceed the 32-bit maximum before downcasting. I know it's not a safe solution - I should report an error there (throw an exception), but well... it's only my personal code anyway :)

BstrString::BstrString(const wchar_t *wcs, size_t wcsLen)
{
    (...)
    // Guard the downcast: SysAllocStringLen accepts only a 32-bit UINT length.
    assert(wcsLen < (size_t)UINT_MAX);
    Bstr = SysAllocStringLen(wcs, (UINT)wcsLen);
}

Of the other 64-bit compatibility issues, the only one was the following macro:

/// Assert that ALWAYS breaks the program when false (in debugger - hits a breakpoint, without debugger - crashes the program).
#define	ASSERT_INT3(x) if ((x) == 0) { __asm { int 3 } }

The problem is that Visual Studio offers inline assembly in 32-bit code only. I needed to use the __debugbreak intrinsic function from the intrin.h header instead.
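
After the change, the macro keeps the same behavior in both configurations:

#include <intrin.h>

/// Assert that ALWAYS breaks the program when false - now also in 64-bit code.
#define ASSERT_INT3(x) if ((x) == 0) { __debugbreak(); }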

Overall, porting this library went quite fast and smoothly, without any major issues. Now I just need to port the rest of my code to 64 bits.

Comments | #visual studio #c++ #commonlib

# New Version of PVS-Studio

21:11, Thu 09 Nov 2017

PVS-Studio is a great commercial static code analyzer. I blogged about it a few years ago. The latest version is much more powerful. It supports C, C++, and C#, works on Windows and Linux, and runs as a standalone app as well as a plugin for Visual Studio (including the latest version, 2017).

PVS-Studio is particularly good at finding issues with code portability between 32-bit and 64-bit. Of my personal projects, I have already ported CommonLib to 64 bits, and RegScript2 was written to support 64 bits from the start, but porting my main app (a music visualization program) to 64 bits is a large task that's still on my TODO list. Even though I know how to write portable code (use size_t, not int, etc. :), I made the first commits to that repository 8 years ago, when my programming knowledge was much smaller, so I'm sure there are many nasty bugs in there. Making it work as a 64-bit app will be a difficult task, and I'm sure PVS-Studio will help me with it. I will share my experiences and conclusions when I eventually do it.

In the meantime, I recommend checking out their blog, where the developers of this tool share a lot of valuable information. They also maintain a list of articles describing errors they have found in open source projects.

Comments | #tools #c++ #pvs-studio #visual studio

# Immediate Mode GUI - Theory and Example - Slides

21:06, Tue 24 Oct 2017

Today I gave a talk at the Warsaw GameDev Meetup. The topic of my presentation was "Immediate Mode GUI - Theory and Example". You can download the slides here:

Comments | #gui #teaching #events

# Lost clicks and key presses on low FPS

13:34, Sun 22 Oct 2017

There is a problem with handling mouse and keyboard input in games and other interactive applications that I have just solved, and I would like to share my code for the solution. When your app uses a loop that constantly calculates and renders frames, as games usually do, it may seem natural to just read the current state of every mouse and keyboard key (whether it's down or up) on each frame. You can then calculate derived information, like whether a button has just been pressed or released, by comparing the new state to the state from the previous frame. This is how the Dear ImGui library works. So a first solution could look like this:

void UpdateFrame()
{
    // Fill ImGui::GetIO().DeltaTime, KeyCtrl, KeyShift, KeyAlt etc.
    ImGui::GetIO().MouseDown[0] = (GetKeyState(VK_LBUTTON) & 0x8000) != 0;
    ImGui::GetIO().MouseDown[1] = (GetKeyState(VK_RBUTTON) & 0x8000) != 0;
    ImGui::GetIO().MouseDown[2] = (GetKeyState(VK_MBUTTON) & 0x8000) != 0;
    for(uint32_t i = 0; i < 512; ++i)
        ImGui::GetIO().KeysDown[i] = (GetKeyState(i) & 0x8000) != 0;
    
    ImGui::NewFrame();
    
    if(ImGui::IsKeyPressed('A'))
        // Do something...
}

There is one problem with this approach: if the user presses and releases a key in such a short time that both the press and the release happen between two frames, it will go unnoticed. This is very annoying. It happens especially when:

  • The framerate of your program is low.
  • Clicks are very fast because they are generated programmatically, e.g. an automatic double-click generated by X-Mouse Button Control or other mouse software.

The first step towards solving this is to react to the "real" events sent by the operating system:
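
The general idea (a sketch of the approach, not necessarily the exact code from the full entry, assuming the classic ImGuiIO::MouseDown fields of that era) is to latch presses in the window procedure, so that even a press-and-release happening within a single frame is still visible at the next frame:

#include <windows.h>
#include "imgui.h"

// Latched in the window procedure, consumed once per frame.
static bool g_mousePressed[3] = {};

LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    switch(msg)
    {
    case WM_LBUTTONDOWN: g_mousePressed[0] = true; break;
    case WM_RBUTTONDOWN: g_mousePressed[1] = true; break;
    case WM_MBUTTONDOWN: g_mousePressed[2] = true; break;
    }
    return DefWindowProc(hWnd, msg, wParam, lParam);
}

void UpdateFrame()
{
    const int vk[3] = { VK_LBUTTON, VK_RBUTTON, VK_MBUTTON };
    for(int i = 0; i < 3; ++i)
    {
        // Down if held right now OR pressed at any moment since the last
        // frame - so even a very short click is not lost.
        ImGui::GetIO().MouseDown[i] = g_mousePressed[i] ||
            (GetKeyState(vk[i]) & 0x8000) != 0;
        g_mousePressed[i] = false;
    }

    ImGui::NewFrame();
}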

Read full entry > | Comments | #gui #winapi #windows

# VK_KHR_dedicated_allocation unofficial manual

22:36, Mon 16 Oct 2017

I wrote a short article that explains how to use the Vulkan extension VK_KHR_dedicated_allocation. It may be interesting to you if you are a programmer and you write graphics code using Vulkan.

Go to article: VK_KHR_dedicated_allocation unofficial manual

Comments | #graphics #vulkan
