Parallelizing Algorithms with Intel TBB and C++ Lambdas

Warning! Some information on this page is older than 5 years now. I keep it for reference, but it probably doesn't reflect my current knowledge and beliefs.

# Parallelizing Algorithms with Intel TBB and C++ Lambdas

Fri, 27 Aug 2010, 20:27

My demo for RiverWash is practically finished. I could still polish it or even make some big changes, because I know it's not great, but that's another story. What I want to write about today is how easily an algorithm can be changed to run in parallel on multicore processors when you use Intel Threading Building Blocks and C++ lambdas.

First, here is the algorithm. In one of my graphics effects I fill a 256x512 texture on the CPU every frame. For each pixel I calculate a color based on some input data, which is constant during this operation. So the code looks like this:

void SaveToTexture(const D3DLOCKED_RECT &lockedRect)
{
  uint x, y;
  char *rowPtr = (char*)lockedRect.pBits;
  for (y = 0; y < TEXTURE_SIZEY; ++y)
  {
    XMCOLOR *pixelPtr = (XMCOLOR*)rowPtr;
    for (x = 0; x < TEXTURE_SIZEX; ++x)
    {
      *pixelPtr = CalcColorForPixel(x, y);
      ++pixelPtr;
    }
    rowPtr += lockedRect.Pitch;
  }
}

How do you parallelize such a loop? First, some theoretical background. Intel TBB is a free C++ library for high-level parallel programming. It has a nice interface that makes extensive use of C++ language features but is very clean and simple. It provides many useful classes, from different kinds of mutexes and atomic operations, through thread-safe, scalable containers and memory allocators, to a sophisticated task scheduler. But for my problem it was sufficient to use the simple parallel_for function, which uses the task scheduler internally. To start using TBB, I just had to download and unpack the library, add the appropriate paths as Include and Library directories in Visual C++, and add this code:

#include <tbb/tbb.h>

#ifdef _DEBUG
#pragma comment(lib, "tbb_debug.lib")
#else
#pragma comment(lib, "tbb.lib")
#endif

The second topic I want to cover here is lambdas, a great new language feature from the C++0x standard, available since Visual C++ 2010. Lambdas are simply unnamed functions defined inline inside other code. What's so great about them is that they can capture variables from the enclosing scope. Selected variables can be captured by value or by reference, as can the this pointer or even "everything". That makes them an ideal replacement for the ugly functors that had to be used in C++ before.

Putting it all together, the parallelized version of my algorithm is not much more complicated than the serial version:

void SaveToTexture(const D3DLOCKED_RECT &lockedRect)
{
  tbb::parallel_for(
    tbb::blocked_range<uint>(0, TEXTURE_SIZEY),
    [this, &lockedRect](const tbb::blocked_range<uint> &range)
  {
    uint x, y;
    char *rowPtr = (char*)lockedRect.pBits + lockedRect.Pitch * range.begin();
    for (y = range.begin(); y != range.end(); ++y)
    {
      XMCOLOR *pixelPtr = (XMCOLOR*)rowPtr;
      for (x = 0; x < TEXTURE_SIZEX; ++x)
      {
        *pixelPtr = CalcColorForPixel(x, y);
        ++pixelPtr;
      }
      rowPtr += lockedRect.Pitch;
    }
  } );
}

This simple change kept all 4 of my CPU cores more than 90% busy and gave an almost 4x speedup in frame time, which is a good result. So as you can see, coding parallel applications is not necessarily difficult :)
