Tag: c++

Entries for tag "c++", ordered from most recent. Entry count: 139.


# How to design API of a library for Vulkan?

Fri 08 Feb 2019

In my previous blog post yesterday, I shared my thoughts on graphics APIs and libraries. Another problem that brought me to these thoughts is a question: how do you design an API for a library that implements a single algorithm, pass, or graphics effect using Vulkan or DX12? It may seem trivial at first, like a task that just needs to be designed and implemented, but if you think about it more, it turns out to be a difficult issue. There are few software libraries like this in existence. I don't mean here a complex library/framework/engine that "horizontally" wraps the entire graphics API and takes it to a higher level, like V-EZ, Nvidia Falcor, or Google Filament. I mean just a small, "vertical", plug-in library doing one thing, e.g. implementing an ambient occlusion effect, efficient texture mipmap down-sampling, rendering UI, or simulating particle physics on the GPU. Such a library needs to interact efficiently with the rest of the user's code to be part of a large program or game. Vulkan Memory Allocator is also not a good example of this, because it only manages memory, implements no render passes, involves no shaders, and interacts with a command buffer only in the part related to memory defragmentation.

I met this problem at my work. Later I also discussed it in detail with my colleague. There are multiple questions to consider.

This is a problem similar to what we have with any C++ library. There is no consensus on the implementation of various basic facilities, like strings, containers, asserts, mutexes, etc., so every major framework or game engine implements its own. Even something as simple as a min/max function is defined in multiple places. It is defined once in the <algorithm> header, but some developers don't use STL. <Windows.h> provides its own, but these are defined as macros, so they break every other definition unless you #define NOMINMAX before the include... A typical C++ nightmare. Smaller libraries are better off being configurable or defining everything on their own, like the Vulkan Memory Allocator having its own assert, vector (which can be switched to the standard STL one), and 3 versions of a read-write mutex.
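
To illustrate the min/max clash, here is a minimal sketch (not taken from any particular library): without defining NOMINMAX, the macros from <Windows.h> can break an otherwise valid call to std::min.

// Defining NOMINMAX before including <Windows.h> prevents it from defining
// min/max as macros, which would otherwise break std::min/std::max below.
#define NOMINMAX
#include <Windows.h>

#include <algorithm>
#include <cstdio>

int main()
{
    // Without NOMINMAX, "std::min(2, 3)" would expand the min macro
    // and fail to compile.
    int smaller = std::min(2, 3);
    printf("%d\n", smaller);
    return 0;
}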

All these issues make it easier for developers to just write a paper, describe their algorithm, and possibly share a piece of code, pseudo-code, or a shader, rather than provide a ready-to-use library. This is a very bad situation. I hope that over time patterns will emerge for how the API of a library implementing a single pass or effect using Vulkan/DX12 should look. Recently my colleague shared an idea with me: if there was some higher-level API that implemented all these interactions between various parts (like resource allocation and image barriers) and we all commonly agreed on using it, then authoring libraries and stitching them together on top of it would be way easier. That's another argument for the need for such a new, higher-level graphics API.

Comments | #c++ #graphics #libraries #directx #vulkan #gpu Share

# Two most obscure bugs in my life

Wed 14 Nov 2018

Fixing bugs is a significant part of software development, and a very emotional one - from frustration when you have no idea what is happening to euphoria when you find the cause and know how to fix it. There are many different kinds of bugs and reasons why they are introduced into the code. (Conducting a deep investigation of how each bug happened might be an interesting idea - I will write a separate blog post about it someday...) But the most frustrating ones are probably those that occur only under specific conditions, like only on one of many supported platforms, or only in the "Release" and not the "Debug" configuration. Below you will find a description of the two most obscure bugs I've found and fixed in my life, for your education and amusement :) Both were in large, multiplatform C++ codebases.

1. There was a bug reported causing incorrect program behavior, occurring on only one of the many platforms where the program was built and used. It was probably Mac, but I can't remember exactly. It was classified as a regression - it used to work, then stopped working after some specific code change, caught by automated tests. The strange thing was that the culprit commit to the code repository introduced only new comments and nothing else. How can a change in a code comment introduce a new bug?!

It turned out that the author of this change wanted to draw a nice "ASCII art" in his comment, so he put something like this:

// HERE STARTS AN IMPORTANT SECTION
//==================================================================\\
Function1();
Function2();
(...)

Have you noticed anything suspicious?...

A backslash '\' at the end of a line in C/C++ means that the logical line continues onto the next line. This feature is used mostly when writing complex preprocessor macros. But code starting with two slashes '//' begins a comment that spans until the end of the line. Now the question is: does the single-line comment span to the next line as well? In other words: is the first function call commented out, or not?

I don't know and I don't really care that much what "language lawyers" would read from the C++ specification and define as the proper behavior. The real situation was that compilers used on some platforms considered the single-line comment to span to the next line, thus commenting out the call to Function1(), while others didn't. That caused the bug to occur on some platforms only.

The solution was obviously to change the comment so it doesn't contain a backslash at the end of a line, even at the expense of the aesthetics and symmetry of this little piece of art :)

2. I was assigned a "stack overflow" bug occurring only in the "Debug" and not the "Release" configuration of a Visual Studio project. At first, I was pretty sure it would be easy to find. After all, "stack overflow" usually means infinite recursion, right? The stack is the piece of memory where local function variables are allocated, along with return addresses of nested function calls. These tend not to be too big, while the stack is 1 MB by default, so there must be an unreasonable call depth involved to hit that error.

It turned out not to be true. After a few debugging sessions and reading the code involved, I understood that there was no infinite recursion there. It was a traversal of a tree structure, but the depth of its hierarchy was not large enough to be of concern. It took me a while to realize that the stack was bloated beyond its capacity by a single function. It was a very long function - you know, the kind of function you can see in a corporate environment that defies all good practices, but no one feels responsible for refactoring. It must have grown over time, with more and more code added gradually, until it reached many hundreds of lines. It was just one big switch with lots of code in each case.

void Function(Option op)
{
  switch(op)
  {
  case OPTION_FIRST:
    // Lots of code, local variables, and stuff...
    break;
  case OPTION_SECOND:
    // Lots of code, local variables, and stuff...
    break;
  case OPTION_THIRD:
    // Lots of code, local variables, and stuff...
    break;
  ...
  }
}

What really caused the bug was the number and size of the local variables used in this function. Each of the cases involved many variables, some big, like fixed-size arrays or objects of some classes, defined by value on the stack. It was enough to call this function recursively just a few times to exhaust the stack capacity.

Why "stack overflow" occurred in "Debug" configuration only and not in "Release"? Apparently Visual Studio compiler can lazily allocate or alias local variables used in different cases of a switch instruction, while "Debug" configuration has all the optimizations disabled and so it allocates all these variables with every function call.

The solution was simply to refactor this long function with a switch: place the code from each case in a separate, new function.
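
A minimal sketch of that refactoring (the helper names are made up for illustration): each case body becomes its own function, so its large local variables occupy the stack only while that particular option is being handled.

// Each case gets its own function, so its big local variables live on the stack
// only for the duration of that call, not for every call to Function().
static void HandleFirst()
{
    char bigBuffer[16 * 1024]; // Example of a large, stack-allocated local.
    // Lots of code using bigBuffer...
    (void)bigBuffer;
}

static void HandleSecond()
{
    // Lots of code, local variables, and stuff...
}

void Function(Option op)
{
    switch(op)
    {
    case OPTION_FIRST:  HandleFirst();  break;
    case OPTION_SECOND: HandleSecond(); break;
    // ...
    }
}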

What are the most obscure bugs you've encountered in your coding practice? Let me know in the comments below.

Comments | #debugger #visual studio #c++ Share

# Efficient way of using std::vector

Sat 22 Sep 2018

Some people say that C++ STL is slow. I would rather say it's about the way we use it. Sure, there are many potential sources of slowness when using STL. For example, std::list or std::map tends to allocate many small objects, and dynamic allocation is a time-consuming operation. Making many copies of your objects, like with std::string, is also costly - that's why I created the str_view project. But std::vector is just a wrapper over a dynamically allocated array that also remembers its size and can reallocate when you add new elements. Without STL, you would need to implement the same functionality yourself.

When it comes to traversing the elements of a vector (e.g. to sum the numerical values contained in it), there are many ways to do it. STL is notorious for being very slow in the Debug project configuration, but as it turns out, this heavily depends on which method you choose for using it.

Here is a small experiment that I've just made. In this code, I create a vector of 100,000,000 integers, then sum its elements using 5 different methods, measuring how much time each of them takes. Results (averaged over 5 iterations for each method) are as follows. Notice the logarithmic scale on the horizontal axis.

Here is the full source code of my testing program:

#include <cstdio>
#include <cstdint>
#include <vector>
#include <chrono>
#include <numeric>

typedef std::chrono::high_resolution_clock::time_point time_point;
typedef std::chrono::high_resolution_clock::duration duration;
inline time_point now() { return std::chrono::high_resolution_clock::now(); }
inline double durationToMilliseconds(duration d) { return std::chrono::duration<double, std::milli>(d).count(); }

int main()
{
    printf("Iteration,Method,Sum,Time (ms)\n");
    
    for(uint32_t iter = 0; iter < 5; ++iter)
    {
        std::vector<int> numbers(100000000ull);
        numbers[0] = 1; numbers[1] = 2; numbers.back() = 3;

        {
            time_point timeBeg = now();

            // Method 1: Use STL algorithm std::accumulate.
            int sum = std::accumulate(numbers.begin(), numbers.end(), 0);

            printf("%u,accumulate,%i,%g\n", iter, sum, durationToMilliseconds(now() - timeBeg));
        }

        {
            time_point timeBeg = now();

            // Method 2: Use the new C++11 range-based for loop.
            int sum = 0;
            for(auto value : numbers)
                sum += value;

            printf("%u,Range-based for loop,%i,%g\n", iter, sum, durationToMilliseconds(now() - timeBeg));
        }

        {
            time_point timeBeg = now();

            // Method 3: Use traditional loop, traverse vector using its iterator.
            int sum = 0;
            for(auto it = numbers.begin(); it != numbers.end(); ++it)
                sum += *it;

            printf("%u,Loop with iterator,%i,%g\n", iter, sum, durationToMilliseconds(now() - timeBeg));
        }

        {
            time_point timeBeg = now();

            // Method 4: Use traditional loop, traverse using index.
            int sum = 0;
            for(size_t i = 0; i < numbers.size(); ++i)
                sum += numbers[i];

            printf("%u,Loop with indexing,%i,%g\n", iter, sum, durationToMilliseconds(now() - timeBeg));
        }

        {
            time_point timeBeg = now();

            // Method 5: Get pointer to raw array and its size, then use a loop to traverse it.
            int sum = 0;
            int* dataPtr = numbers.data();
            size_t count = numbers.size();
            for(size_t i = 0; i < count; ++i)
                sum += dataPtr[i];

            printf("%u,Loop with pointer,%i,%g\n", iter, sum, durationToMilliseconds(now() - timeBeg));
        }
    }
}

As you can see, some methods are slower than others in the Debug configuration by more than 3 orders of magnitude! The difference is so big that if you write your program or game like this, it may not be possible to use its Debug version with any reasonably-sized input data. Looking at the disassembly, it should be no surprise. For example, method 4 calls the vector methods size() and operator[] in every iteration of the loop. We know that in the Debug configuration functions are not inlined or optimized, so these are real function calls.

On the other hand, method 5, which operates on a raw pointer to the vector's underlying data, is not that much slower in the Debug configuration compared to Release.

So my conclusion is: using std::vector to handle memory management and reallocation, while using a raw pointer to access its data, is the best way to go.

My testing environment was:

CPU: Intel Core i7-6700K 4.00 GHz
RAM: DDR4, Dual-Channel, current memory clock 1066 MHz
OS: Windows 10 Version 1803 (OS Build 17134.285)
Compiler: Microsoft Visual Studio Community 2017 Version 15.4.8
Configuration options: x64 Debug/Release
Windows SDK Version 10.0.16299.0

Comments | #stl #c++ #optimization Share

# Macro with current function name - __func__ vs __FUNCTION__

Tue 11 Sep 2018

Today, while programming in C++, I wanted to write an assert-like macro that would throw an exception when a given condition is not satisfied. I wanted to include as much information as possible in the message string. I know that the condition expression, which is the argument of my macro, can be turned into a string using the # preprocessor operator.

Next, I searched for a way to also obtain the name of the current function. At first, I found __func__, as described here (C++11) and here (C99). Unfortunately, the following code fails to compile:

#define CHECK(cond) if(!(cond)) { \
    throw std::runtime_error("ERROR: Condition " #cond " in function " __func__); }

void ProcessData()
{
    CHECK(itemCount > 0); // Compilation error!
    // (...)
}

This is because this identifier is actually an implicit local variable, declared as static const char __func__[] = "...", so it cannot be concatenated with adjacent string literals at compile time.
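
Since __func__ is an ordinary local variable and not a string literal, one possible workaround (just a sketch - the post itself went a different way, described below) is to build the message at runtime:

#include <stdexcept>
#include <string>

// Builds the message at runtime with std::string, so __func__ (a const char
// array) works on any conforming compiler, at the cost of a few allocations.
#define CHECK(cond) if(!(cond)) { \
    throw std::runtime_error(std::string("ERROR: Condition ") + #cond + \
        " in function " + __func__); }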

Then I recalled that Visual Studio defines the __FUNCTION__ macro as a custom Microsoft extension. See the documentation here. This one works as I expected - it can be concatenated with other strings, because it expands to a string literal. The following macro definition fixes the problem:

#define CHECK(cond) if(!(cond)) \
    { throw std::runtime_error("ERROR: Condition " #cond " in function " __FUNCTION__); }

When itemCount is 0, an exception is thrown and ex.what() returns the following string:

ERROR: Condition itemCount > 0 in function ProcessData

Well... For any experienced C++ developer, it should be no surprise that the C++ standard committee comes up with solutions that are far from useful in practice :)

Comments | #c++ Share

# Operations on power of two numbers

Sun 09 Sep 2018

Numbers that are powers of two (i.e. 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, and so on...) are especially important in programming due to the way computers work - they operate on binary representation. Sometimes there is a need to ensure that a certain number is a power of two. For example, it might be important for the size and alignment of some memory blocks. This property simplifies operations on such quantities - they can be manipulated using bitwise operations instead of arithmetic ones.

In this post I'd like to present efficient algorithms for 3 common operations on power-of-2 numbers, in C++. I do it just to gather them in one place, because they can easily be found in many other places all around the Internet. These operations can be implemented using other algorithms as well. The most obvious implementation would involve a loop over bits, but that would give O(n) time complexity relative to the number of bits in the operand type. The following algorithms use clever bit tricks to be more efficient. They have constant or logarithmic time and they don't use any flow control.

1. Check if a number is a power of two. Examples:

IsPow2(0)   == true (!!)
IsPow2(1)   == true
IsPow2(2)   == true
IsPow2(3)   == false
IsPow2(4)   == true
IsPow2(123) == false
IsPow2(128) == true
IsPow2(129) == false

This one I know off the top of my head. The trick here is based on the observation that a number is a power of two when its binary representation has exactly one bit set, e.g. 128 = 0b10000000. If you decrement it, all less significant bits become set: 127 = 0b1111111. Bitwise AND then checks whether these two numbers have no bits set in common.

template <typename T> bool IsPow2(T x)
{
    return (x & (x-1)) == 0;
}

2. Find the smallest power of two greater than or equal to a given number. Examples:

NextPow2(0)   == 0
NextPow2(1)   == 1
NextPow2(2)   == 2
NextPow2(3)   == 4
NextPow2(4)   == 4
NextPow2(123) == 128
NextPow2(128) == 128
NextPow2(129) == 256

This one I had in my library for a long time.

uint32_t NextPow2(uint32_t v)
{
    v--;
    v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8;
    v |= v >> 16;
    v++;
    return v;
}
uint64_t NextPow2(uint64_t v)
{
    v--;
    v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8;
    v |= v >> 16; v |= v >> 32;
    v++;
    return v;
}

3. Find the largest power of two less than or equal to a given number. Examples:

PrevPow2(0) == 0
PrevPow2(1) == 1
PrevPow2(2) == 2
PrevPow2(3) == 2
PrevPow2(4) == 4
PrevPow2(123) == 64
PrevPow2(128) == 128
PrevPow2(129) == 128

I needed this one just recently and it took me a while to find it on Google. Finally, I found it in this post on StackOverflow.

uint32_t PrevPow2(uint32_t v)
{
    v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8;
    v |= v >> 16;
    v = v ^ (v >> 1);
    return v;
}
uint64_t PrevPow2(uint64_t v)
{
    v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8;
    v |= v >> 16; v |= v >> 32;
    v = v ^ (v >> 1);
    return v;
}

Update 2018-09-10: As I've been notified on Twitter, C++20 is also getting such functions in the standard header <bit>.
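
For reference, a rough sketch of the C++20 equivalents (assuming a C++20 compiler; note that std::has_single_bit(0) returns false, unlike the IsPow2 above):

#include <bit>
#include <cstdint>
#include <cstdio>

int main()
{
    uint32_t x = 123;
    // std::has_single_bit: true if exactly one bit is set (so false for 0).
    printf("has_single_bit(123) == %d\n", std::has_single_bit(x) ? 1 : 0);
    // std::bit_ceil: smallest power of two >= x, like NextPow2.
    printf("bit_ceil(123)       == %u\n", std::bit_ceil(x));
    // std::bit_floor: largest power of two <= x (0 for x == 0), like PrevPow2.
    printf("bit_floor(123)      == %u\n", std::bit_floor(x));
    return 0;
}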

Comments | #math #c++ #algorithms Share

# str_view - null-termination-aware string-view class for C++

Sun 19 Aug 2018

tl;dr I've written a small library, which I called "str_view - null-termination-aware string-view class for C++". You can find the code and documentation on GitHub - sawickiap/str_view. Read on for the full story behind it...

Let me disclose my controversial beliefs: I like the C++ STL. I think that any programming language needs to provide some built-in strings and containers to be called modern and suitable for developing large programs. But of course I'm aware that careless use of classes like std::list or std::map makes a program very slow due to a large number of dynamic allocations.

What I value the most is RAII - the concept that memory is automatically freed whenever the object that owns it is destroyed. That's why I use std::unique_ptr all over the place in my personal code. Whenever I create and own an array, I use std::vector, but when I just pass it to some other code for reading, I pass a raw pointer and the number of elements - myVec.data() and myVec.size(). Similarly, whenever I own and build a string, I use std::string (or rather std::wstring - I like Unicode), but when I pass it somewhere for reading, I use a raw pointer.

There are multiple ways a string can be passed. One is a pointer to the first character plus the number of characters. Another is a pointer to the first character plus a pointer one past the last character - a pair of iterators, also called a range. These two can be trivially converted to each other. Of the two, I prefer pointer + length, because I think the number of characters is needed slightly more often than the pointer past the end.

But there is another way of passing strings, common in C and C++ programs - just one pointer to a string that needs to be null-terminated. I think that null-terminated strings are one of the worst and most stupid inventions in computer science. Not only does it limit the set of characters available in string content by excluding '\0', but it also makes calculating the string length an O(n) operation. It also creates opportunities for security bugs. Still, we have to deal with it, because that's the format most libraries expect.

I came up with an idea for a class that would encapsulate a reference to an externally-owned, immutable string, or a piece thereof. Objects of such a class could be used to pass strings to library functions instead of e.g. a pointer to a null-terminated string or a pair of iterators. They can then be queried for length(), indexed to access individual characters, etc., as well as asked for a null-terminated copy using the c_str() method - similar to std::string.
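
Here is a rough sketch of how such a class can be used (the exact constructor signatures and the header name are documented in the repository - treat the details below as illustrative):

#include "str_view.hpp"

#include <cstdio>
#include <string>

// A function that only reads a string can take str_view by const reference,
// no matter whether the caller holds a literal, a std::string, or a buffer.
void PrintGreeting(const str_view& name)
{
    // c_str() returns a null-terminated string, making a copy only when needed.
    printf("Hello, %s! (length = %zu)\n", name.c_str(), name.length());
}

int main()
{
    PrintGreeting(str_view("World"));   // Created from a string literal.
    std::string s = "Adam";
    PrintGreeting(str_view(s));         // Created from a std::string.
    return 0;
}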

Code like this already exists, e.g. C++17 introduces the class std::string_view. But my implementation has a twist that I'm quite happy with, which made me call my class "null-termination-aware". My str_view class remembers not only the pointer and length of the referred string, but also the way it was created, so it can avoid unnecessary operations and lazily evaluate those that are requested.

If you consider such a class useful in your C++ code, see the GitHub - sawickiap/str_view project for the code (it's just a single header file), documentation, and an extensive set of tests. I share this code for free, under the MIT license. Feel free to contact me if you find any bugs or have any suggestions regarding this library.

Comments | #productions #libraries #c++ Share

# Check of CommonLib using PVS-Studio

Mon 13 Nov 2017

Courtesy of the developers of PVS-Studio, I could use this static code analysis tool to check my home projects. I blogged about the tool recently. It's powerful and especially good at finding issues with 32/64-bit portability. Now I have analyzed CommonLib - a library that I have developed and used for many years, which contains various facilities for C++ programming. That allowed me to find and fix many bugs. Here are the most interesting ones I've come across:

Read full entry > | Comments | #commonlib #c++ #pvs-studio Share

# Ported CommonLib to 64 Bits

Sun 12 Nov 2017

I recently updated my CommonLib library to support 64-bit code. This is a change that I had planned for a very long time. I wrote this code so many years ago that 64-bit compilation on Windows was not yet popular back then. Today we have the opposite situation - a modern Windows application or game could be 64-bit only. I decided to make the library work in both configurations. The way I did it was:

  1. Created 64-bit configurations in the Visual Studio project.
  2. Went over all the code and made the changes required for it to work correctly in a 64-bit configuration.
  3. Compiled the code and made sure it compiles successfully.
  4. Fixed everything that the compiler reported as warnings.
  5. Made sure the tests pass.
  6. Scanned the code with the PVS-Studio static code analysis tool. I covered this point in a separate blog post.

The most important and also most frequent change I needed to make in the code was to use the size_t type for every size (like a number of bytes), index (for indexing arrays), count (number of elements in some collection), length (e.g. number of characters in a string), etc. This is the "native" type intended for that purpose, and as such it's 32-bit or 64-bit depending on the configuration. It is also the type returned by the sizeof operator and functions like strlen. Before making this change, I carelessly mixed size_t with uint32.

I sometimes use the maximum available value (all ones) as a special value, e.g. to communicate "not found". Before the change I used the UINT_MAX constant for that. Now I use SIZE_MAX.
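
A minimal sketch of that convention (the function below is made up for illustration):

#include <cstddef>
#include <cstdint>

// Returns the index of the first occurrence of value, or SIZE_MAX if not found.
// SIZE_MAX works in both 32-bit and 64-bit builds, unlike a hard-coded UINT_MAX.
size_t FindIndex(const int* items, size_t count, int value)
{
    for(size_t i = 0; i < count; ++i)
        if(items[i] == value)
            return i;
    return SIZE_MAX; // "Not found" - all bits set, the maximum value of size_t.
}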

I know there is a group of developers who advocate using signed integers for indices and counts, but I'm among the proponents of unsigned types, in line with what the C++ standard library itself uses. In those very rare situations where I need a signed size, the correct solution is to use the ptrdiff_t type. This is the case when I apply the concept of "stride", which I blogged about here (Polish) and here.

I have a hierarchy of "Stream" classes that represent a stream of bytes that can be read from or written to. Derived classes offer a unified interface for a buffer in memory, a file, and many others. Before the change, all the offsets and sizes in my streams were 32-bit. I needed to decide how to change them. My first idea was to use size_t. It seems a natural choice for MemoryStream, but it doesn't make sense for FileStream, as files can exceed 4 GB in size regardless of the 32/64-bit configuration of my code. Such files were not supported correctly by this class. I eventually decided to always use 64-bit types for stream sizes and cursor positions.
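
A rough sketch of the resulting direction (the class and method names here are illustrative, not the actual CommonLib interface):

#include <cstdint>

// Stream offsets and sizes are always 64-bit, so a file stream can address
// files larger than 4 GB even in a 32-bit build; a memory stream simply
// never reaches such sizes there.
class Stream
{
public:
    virtual ~Stream() = default;
    virtual uint64_t GetSize() const = 0;
    virtual uint64_t GetPos() const = 0;
    virtual void SetPos(uint64_t pos) = 0;
};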

There are cases where my use of size_t in the library interface can't carry down to the implementation, because the underlying API supports only 32-bit sizes. That's the case with zlib (a data compression library that I wrapped into stream classes in the files ZlibUtils.*), as well as BSTR (the string type used by Windows COM, which I wrapped into a class in the files BstrString.*). In those cases I just put asserts in the code that check whether the actual size doesn't exceed the 32-bit maximum before downcasting. I know it's not a safe solution - I should report an error there (throw an exception), but well... It's only my personal code anyway :)

BstrString::BstrString(const wchar_t *wcs, size_t wcsLen)
{
    (...)
    assert(wcsLen < (size_t)UINT_MAX);
    Bstr = SysAllocStringLen(wcs, (UINT)wcsLen);
}

The only other issue with 64-bit compatibility was the following macro:

/// Assert that ALWAYS breaks the program when false (in debugger - hits a breakpoint, without debugger - crashes the program).
#define	ASSERT_INT3(x) if ((x) == 0) { __asm { int 3 } }

The problem is that Visual Studio offers an inline assembler in 32-bit code only. I needed to use the __debugbreak intrinsic function from the intrin.h header instead.
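
A minimal sketch of the replacement (assuming MSVC; adapted rather than copied from the actual CommonLib code):

#include <intrin.h>

/// Assert that ALWAYS breaks the program when false - works in both 32-bit and
/// 64-bit builds, because __debugbreak() is a compiler intrinsic, not inline assembly.
#define ASSERT_INT3(x) do { if ((x) == 0) { __debugbreak(); } } while(false)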

Overall, porting this library went quite quickly and smoothly, without any major issues. Now I just need to port the rest of my code to 64 bits.

Comments | #visual studio #c++ #commonlib Share

