Entries for tag "algorithms", ordered from most recent. Entry count: 65.

Warning! Some information on this page is older than 5 years now. I keep it for reference, but it probably doesn't reflect my current knowledge and beliefs.

# Operations on power of two numbers

Sun

09

Sep 2018

Numbers that are powers of two (i.e. 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 and so on...) are especially important in programming, due to the way computers work - they operate on binary representation. Sometimes there is a need to ensure that certain number is power of two. For example, it might be important for size and alignment of some memory blocks. This property simplifies operations on such quantities - they can be manipulated using bitwise operations instead of arithmetic ones.

In this post I'd like to present efficient algorithms for 3 common operations on power-of-2 numbers, in C++. I do it just to gather them in one place, because they can be easily found in many other places all around the Internet. These operations can be implemented using other algorithms as well. Most obvious implementation would involve a loop over bits, but that would give O(n) time complexity relative to the number of bits in operand type. Following algorithms use clever bit tricks to be more efficient. They have constant or logarithmic time and they don't use any flow control.

**1. Check if a number is a power of two.** Examples:

IsPow2(0) == true (!!) IsPow2(1) == true IsPow2(2) == true IsPow2(3) == false IsPow2(4) == true IsPow2(123) == false IsPow2(128) == true IsPow2(129) == false

This one I know off the top of my head. The trick here is based on an observation that a number is power of two when its binary representation has exactly one bit set, e.g. 128 = 0b10000000. If you decrement it, all less significant bits become set: 127 = 0b1111111. Bitwise AND checks if these two numbers have no bits set in common.

template <typename T> bool IsPow2(T x) { return (x & (x-1)) == 0; }

**2. Find smallest power of two greater or equal to given number.** Examples:

NextPow2(0) == 0 NextPow2(1) == 1 NextPow2(2) == 2 NextPow2(3) == 4 NextPow2(4) == 4 NextPow2(123) == 128 NextPow2(128) == 128 NextPow2(129) == 256

This one I had in my library for a long time.

uint32_t NextPow2(uint32_t v) { v--; v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8; v |= v >> 16; v++; return v; } uint64_t NextPow2(uint64_t v) { v--; v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8; v |= v >> 16; v |= v >> 32; v++; return v; }

**3. Find largest power of two less or equal to given number.** Examples:

PrevPow2(0) == 0 PrevPow2(1) == 1 PrevPow2(2) == 2 PrevPow2(3) == 2 PrevPow2(4) == 4 PrevPow2(123) == 64 PrevPow2(128) == 128 PrevPow2(129) == 128

I needed this one just recently and it took me a while to find it on Google. Finally, I found it in this post on StackOveflow.

uint32_t PrevPow2(uint32_t v) { v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8; v |= v >> 16; v = v ^ (v >> 1); return v; } uint64_t PrevPow2(uint64_t v) { v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8; v |= v >> 16; v |= v >> 32; v = v ^ (v >> 1); return v; }

**Update 2018-09-10:** As I've been notified on Twitter, C++20 is also getting such functions as standard header <bit>.

Comments | #math #c++ #algorithms Share

# How to check if an integer number is a power of 10?

Wed

04

Oct 2017

I was recently asked a test question to write a function that would check whether a given integer number is a power of ten. I didn't know the answer immediately because I never solved this puzzle before, so I needed to think about a solution. I quickly came to a conclusion that there are many possible solutions, so I later decided to write them down and share with you.

Number is a power of 10 if it's equal to 10, 100, 1000 etc. 1 is also 0-th power of 10. Other numbers like 2, 3, 11, 12 etc. are not powers of 10. 0 and all negative numbers should also be rejected.

Below you can find 5 different solutions - all of them correct, but some of them more efficient than others. Example code is in C, but you can easily implement the same algorithms in Java or any other programming language.

First thing that came to my mind was to convert the number to a string and check whether it contains '1' followed by '0'-s.

int IsLog10_v1(int32_t x)

{

// Convert x to string.

char buf[12];

itoa(x, buf, 10);

const size_t bufLen = strlen(buf);

// Check if string contains '1' followed by '0'-s.

if(buf[0] != '1')

return 0;

for(size_t i = 1; i < bufLen; ++i)

{

if(buf[i] != '0')

return 0;

}

return 1;

}

Another option is to convert the number to floating-point format, use "real" log10 function, and check if the result is an integer. Note that double must be used here, because float has too small precision and would incorrectly return 1 for inputs like: 999999999, 1000000001.

int IsLog10_v2(int32_t x)

{

// Convert x to double. Calculate log10 of it.

double x_d = (double)x;

double x_log10 = log10(x_d);

// Check if result is integer number - has zero fractional part.

return x_log10 - floor(x_log10) == 0.0;

}

If we want to operate on integer numbers only, we may come up with a recursive algorithm that checks whether the number is greater than zero, divisible by 10 and then calls the same function for x/10.

int IsLog10_v3(int32_t x)

{

if(x <= 0)

return 0;

if(x == 1)

return 1;

if(x % 10 != 0)

return 0;

// Recursion.

return IsLog10_v3(x / 10);

}

Because it's a tail recursion, it's easy to convert it to iterative algorithm (use loop instead of recursive call).

int IsLog10_v4(int32_t x)

{

if(x <= 0)

return 0;

// Same algorithm as v3, but converted from recursion to loop.

for(;;)

{

if(x == 1)

return 1;

if(x % 10 != 0)

return 0;

x /= 10;

}

}

Finally, the most efficient solution is to notice that there are only few possible powers of 10 in the range of 32-bit integers, so we can just enumerate them all.

int IsLog10_v5(int32_t x)

{

// Just enumerate all possible results.

return

x == 1 ||

x == 10 ||

x == 100 ||

x == 1000 ||

x == 10000 ||

x == 100000 ||

x == 1000000 ||

x == 10000000 ||

x == 100000000 ||

x == 1000000000;

}

You can find full source code as a simple, complete C program together with tests on my GitHub, as file: **IsLog10.c**.

Comments | #c #algorithms Share

# How to Flip Triangles in Triangle Mesh

Sat

19

Jan 2013

Given triangle mesh, as we use it in real-time rendering of 3D graphics, we say that each triangle have two sides, depending on whether its vertices are oriented clockwise or counterclockwise from particular point of view. In Direct3D, by default, triangles oriented clockwise are considered front-facing and they are visible, while triangles oriented counterclockwise are invisible because they are discarded by the API feature called backface culling.

When we have backface culling enabled and we convert mesh between coordinate systems, we sometimes need to "flip triangles". When vertices of each triangle are separate, an algorithm for this is easy. We just need to swap first with third vertex of each triangle (or any other two vertices). So we can start implementing the flipping method like this:

class CMesh

{

...

D3D11_PRIMITIVE_TOPOLOGY m_Topology;

bool m_HasIndices;

std::vector<SVertex> m_Vertices;

std::vector<uint32_t> m_Indices;

};

void CMesh::FlipTriangles()

{

if(m_Topology == D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST)

{

if(m_HasIndices)

FlipTriangleListInArray<uint32_t>(m_Indices);

else

FlipTriangleListInArray<SVertex>(m_Vertices);

}

...

}

Where the function template for flipping triangles in a vector is:

template<typename T>

void CMesh::FlipTriangleListInArray(std::vector<T>& values)

{

for(size_t i = 0, count = values.size(); i < count - 2; i += 3)

std::swap(values[i], values[i + 2]);

}

Simple reversing all elements in the vector with std::reverse would also do the job. But things get complicated when we consider triangle strip topology. (I assume here that you know how graphics API-s generate orientation of triangles in a triangle strip.) Reversing vertices works, but only when number of vertices in the strip is odd. When it's even, triangles stay oriented in the same way.

I asked question about this on forum.warsztat.gd (in Polish). User albiero proposed following solution: just duplicate first vertex. It will generate additional degenerate (invisible) triangle, but thanks to this all following triangles will be flipped. It seems to work!

I also wanted to handle strip-cut index (a special value -1 which starts new triangle strip), so the rest of my fully-featured algorithm for triangle flipping is:

...

else if(m_Topology == D3D11_PRIMITIVE_TOPOLOGY_TRIANGLESTRIP)

{

if(m_HasIndices)

{

size_t begIndex = 0;

while(begIndex < m_Indices.size())

{

const size_t indexCount = m_Indices.size();

while(begIndex < indexCount && m_Indices[begIndex] == UINT_MAX)

++begIndex;

if(begIndex == indexCount)

break;

size_t endIndex = begIndex + 1;

while(endIndex < indexCount && m_Indices[endIndex] != UINT_MAX)

++endIndex;

// m_Indices.size() can change here!

FlipTriangleStripInArray<uint32_t>(m_Indices, begIndex, endIndex);

begIndex = endIndex + 1;

}

}

else

FlipTriangleStripInArray<SVertex>(m_Vertices, 0, m_Vertices.size());

}

}

Where function template for flipping triangles in selected part of a vector is:

template<typename T>

void CMesh::FlipTriangleStripInArray(std::vector<T>& values, size_t begIndex, size_t endIndex)

{

const size_t count = endIndex - begIndex;

if(count < 3) return;

// Number of elements (and triangles) is odd: Reverse elements.

if(count % 2)

std::reverse(values.begin() + begIndex, values.begin() + endIndex);

// Number of elements (and triangles) is even: Repeat first element.

else

values.insert(values.begin() + begIndex, values[begIndex]);

}

Comments | #directx #rendering #algorithms Share

# Iterating Backward using Unsigned Numbers

Thu

01

Mar 2012

I've had a discussion recently with my colleagues about what is the best way of writing a loop in C++ that would iterate backward: Count-1, Count-2, ..., 0, using unsigned integer numbers. I'd like to share all possible solutions I managed to gather.

Iterating forward is simple:

unsigned count = 10;

// Forward: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

for(unsigned i = 0; i < count; ++i)

printf("%d ", i);

Iterating backward is equally simple and looks similar, if you use signed integers:

// Backward: 9, 8, 7, 6, 5, 4, 3, 2, 1, 0

// Using signed integer

for(int i = count - 1; i >= 0; --i)

printf("%d ", i);

Same code cannot be used with unsigned integers, because it creates infinite loop, as unsigned number is always >= 0 and just wraps around to maximum value 0xFFFFFFFF:

// WRONG: Inifinte loop, unsigned int wraps around and is always >= 0.

for(unsigned i = count - 1; i >= 0; --i)

printf("%d ", i);

Changing condition to i > 0 is also invalid, because then loop is not executed for i == 0:

// WRONG: Not executed for i == 0.

for(unsigned i = count - 1; i > 0; --i)

printf("%d ", i);

Here comes a bunch of possible valid solutions. My favourite is following for loop:

for(unsigned i = count; i--; )

printf("%d ", i);

Following while loop does same thing. Some people say that it looks more clear. I don't think so, while I believe the for loop is better because it creates local scope for variable i, while the following code does not:

unsigned i = count;

while(i--)

printf("%d ", i);

Another way of implementing exactly same logic is to write the condition like this. It may look like C++ had the "goes to zero" arrow operator -->, but it is actually just (i-- > 0) :)

unsigned i = count;

while(i --> 0)

printf("%d ", i);

Another possible solution of loop condition (a one that does not use the postdecrementation operator) is to compare i with maximum integer value. One disadvantage of it is that you have to carefully match the value of the max constant with type of your iterator - whether it is unsigned int, unsigned short etc.

for(unsigned i = count - 1; i != UINT_MAX; --i)

printf("%d ", i);

Another way of doing this, without using maximum value, is:

for(unsigned i = count - 1; i < count; --i)

printf("%d ", i);

Finally there are solutions possible that introduce second iterator:

// Option 1 using second iterator:

// i goes 0...9, j is inverted i.

for(unsigned i = 0; i < count; ++i)

{

unsigned j = count - i - 1;

printf("%d ", j);

}

// Option 2 using second iterator:

// i goes 10...1, j is decramented i.

for(unsigned i = count; i > 0; --i)

{

unsigned j = i - 1;

printf("%d ", j);

}

Which is your favorite loop? Please leave a comment :)

Comments | #algorithms #c++ Share

# Algorithms for Managing Tree Structure

Sat

04

Feb 2012

What are the ways of storing a tree data structure in memory? I mean a tree with arbitrary number of child nodes. Probably first thing that comes to our minds - as high level programmers - is to dynamically allocate each node as object of some class that would store its data and a collection of pointers to its children. Opposite approach would be to store all nodes in some array - a continuous piece of memory, where each node would refer to other nodes by index or something. Third approach - something in between in terms of sophistication, as well as efficiency - is to have dynamically allocated nodes, but not to store a collection of all child nodes. It is a good idea to employ kind of linked list here. I'd like to describe such data structure and basic algorithms for manipulating it.

The idea is that each node stores - besides its data - 5 pointers (possibly null) to some other nodes: parent, previous sibling, next sibling, first child and last child. It could be seen as equivalent to doubly linked list. It has much more pointers that is needed to be able to traverse whole tree, but thanks to this you can traverse such tree in any order, as well as insert and remove nodes at any point, with best possible time complexity. A node with Parent == null is considered a tree root. Here is a picture of a single node and example tree:

Read full entry | Comments | #algorithms #.net Share

# Resizing Images to Generate Thumbnails in PHP

Fri

06

Jan 2012

Some time ago I came across a problem of calculating an image size for automatically generated thumbnail for gallery script coded in PHP. It required a moment's thought, so I'd like to share this algorithm. In fact there will be two algorithms.

First code scales the image down with following rules: Destination size will not exceed given thumbnail size, but the width or height can be smaller to always preserve aspect ratio. Images smaller than thumbnail size are not magnified.

Inputs:

$src_size_x, $src_size_y - Source image size.

THUMBNAIL_SIZE_X, THUMBNAIL_SIZE_Y - Constants defining thumbnail size.

Outputs:

$dst_size_x, $dst_size_y - Destination thumbnail size.

if( $src_size_x <= THUMBNAIL_SIZE_X && $src_size_y <= THUMBNAIL_SIZE_Y ) { $dst_size_x = $src_size_x; $dst_size_y = $src_size_y; } else { $dst_size_x = THUMBNAIL_SIZE_X; $dst_size_y = (int)( $src_size_y * THUMBNAIL_SIZE_X / $src_size_x ); if( $dst_size_y > THUMBNAIL_SIZE_Y ) { $dst_size_x = (int)( $src_size_x * THUMBNAIL_SIZE_Y / $src_size_y ); $dst_size_y = THUMBNAIL_SIZE_Y; } }

Second algorithm also helps with generating image thumbnails, but works differently. It assumes that destination image will always be of size (THUMBNAIL_SIZE_X, THUMBNAIL_SIZE_Y), while source iamge can be cropped to select only the center of the image if it has different aspect ratio than the thumbnail.

Inputs to this algorithm are the same, while outputs are:

$src_x, $src_y - Offset in the source image to begin copying from.

$src_w, $src_h - Size of the rectangle to select from cropped source image.

The code that uses this algorithm should then select from the source image a rectangle with left-top position ($src_x, $src_y), size ($src_w, $src_h) and copy it, with scaling and resampling, to the destination image with the size exactly (THUMBNAIL_SIZE_X, THUMBNAIL_SIZE_Y). That's what imagecopyresampled function from GD library can do. This algorithm does not handle cases where source image is smaller than thumbnail.

$src_h = $src_size_y; $src_w = (int)( $src_h * THUMBNAIL_SIZE_X / THUMBNAIL_SIZE_Y ); if( $src_w <= $src_size_x ) { $src_x = ( $src_size_x - $src_w ) / 2; $src_y = 0; } else { $src_w = $src_size_x; $src_h = (int)( $src_w * THUMBNAIL_SIZE_Y / THUMBNAIL_SIZE_X ); $src_x = 0; $src_y = ( $src_size_y - $src_h ) / 2; }

Comments | #webdev #algorithms #php Share

# Fast, Heuristic Disk Search

Wed

04

Jan 2012

In March 2007 I've written an algorithm which I called "Fast, heuristic disk search" and published it in my article Szybkie, heurystyczne przeszukiwanie dysku (in Polish) on codeguru.pl. Today I went back to it, improved it a bit and I think it is a good opportunity to present it once again, this time in English :)

The problem is about searching hard disk, looking for a file or directory with particular name. Naive approach, which is used by file searching functions in programs like Total Commander, uses depth-first search, so it enters C:\, then C:\Program Files, then C:\Program Files\Adobe etc., spending lots of time in places where it probably won't find the file we are looking for.

Game developers know that searching algorithm can be significantly improved by using some heuristics that will drive algorithm towards (probably) right direction. We do it in A* pathfinding. Does a similar principle could be used also when searching for a file?

My particular problem was configuring a program with path to a console tool, let's say fxc.exe shader compiler from DirectX SDK. We could present a textbox where user has to enter path to this tool. We could also make "Browse" button with an open file dialog to simplify locating it. But wouldn't be great if program tried to locate it automatically on first run? Some installed programs store path in a registry key, but if it doesn't, we have no idea where could it be if we don't look for it among files and directories on hard disk.

My algorithm tries to locate given file or directory in given time limit by searching all given directories and their subdirectories. The improvement I propose here is inteligent management of the order in which we process these directories.

- I start from root directories like C:\, D:\ as well as directories that are particularly interesting in our case, like Program Files.
- I perform breadth-first search instead of depth-first search. This way I don't waste time in some deeply nested, uninteresting subdirectories.
- I first process directories that look interesting because they contain some strings in their names that indicate we move in the right direction.

The algorithm uses two collections: a queue of directories to process and a set of already visisted directories. Here is the code in C#:

Read full entry | Comments | #algorithms #.net Share

# Circular Buffer of Raw Binary Data in C++

Fri

21

Oct 2011

**Circular Buffer**, Cyclic Buffer or Ring Buffer is a data structure that effectively manages a queue of some items. Items can be added at the back and removed from the front. It has limited capacity because it is based on preallocated array. Functionality is implemented using two pointers or indices - pointing to the first and past the last valid element. The Begin pointer is incremented whenever an item is popped from the front so that it "chases" the End pointer, which is incremented whenever a new item is pushed to the back. They can both wrap around the size of the array. Both operations are done very effectively - in constant time O(1) and no reallocations are needed. This makes circular buffers perfect solution for queues of some data streams, like video or audio.

It's not very sophisticated data structure, but there is one problem. Sample codes of circular buffers you can find on the Internet, just like for many other data structures, operate usually on a single object of some user-defined type. What if we need a buffer for raw binary data, stored as array of bytes? We can treat single bytes as data items, but enqueueing and dequeueing single bytes with separate function calls would not be efficient. We can, on the other hand, define some block of data (like 4096 bytes) as the type of item, but this limits us to operating on on such block at a time.

Best solution would be to write an implementation that operates on binary data in form of (const char *bytes, size_t byte_count) and allows writing and reading arbitrary amount of data in a single call, just like functions for writing and reading files do. The only problem that arises in such code is that sometimes the block of data you want to write to or read from the buffer is not in a continuous region of memory, but wraps around to the beginning of the array so we have to process it on two parts - first at the end of the array and the second at the beginning.

Here is my C++ implementation of a circular buffer for raw binary data:

#include <algorithm> // for std::min class CircularBuffer { public: CircularBuffer(size_t capacity); ~CircularBuffer(); size_t size() const { return size_; } size_t capacity() const { return capacity_; } // Return number of bytes written. size_t write(const char *data, size_t bytes); // Return number of bytes read. size_t read(char *data, size_t bytes); private: size_t beg_index_, end_index_, size_, capacity_; char *data_; }; CircularBuffer::CircularBuffer(size_t capacity) : beg_index_(0) , end_index_(0) , size_(0) , capacity_(capacity) { data_ = new char[capacity]; } CircularBuffer::~CircularBuffer() { delete [] data_; } size_t CircularBuffer::write(const char *data, size_t bytes) { if (bytes == 0) return 0; size_t capacity = capacity_; size_t bytes_to_write = std::min(bytes, capacity - size_); // Write in a single step if (bytes_to_write <= capacity - end_index_) { memcpy(data_ + end_index_, data, bytes_to_write); end_index_ += bytes_to_write; if (end_index_ == capacity) end_index_ = 0; } // Write in two steps else { size_t size_1 = capacity - end_index_; memcpy(data_ + end_index_, data, size_1); size_t size_2 = bytes_to_write - size_1; memcpy(data_, data + size_1, size_2); end_index_ = size_2; } size_ += bytes_to_write; return bytes_to_write; } size_t CircularBuffer::read(char *data, size_t bytes) { if (bytes == 0) return 0; size_t capacity = capacity_; size_t bytes_to_read = std::min(bytes, size_); // Read in a single step if (bytes_to_read <= capacity - beg_index_) { memcpy(data, data_ + beg_index_, bytes_to_read); beg_index_ += bytes_to_read; if (beg_index_ == capacity) beg_index_ = 0; } // Read in two steps else { size_t size_1 = capacity - beg_index_; memcpy(data, data_ + beg_index_, size_1); size_t size_2 = bytes_to_read - size_1; memcpy(data + size_1, data_, size_2); beg_index_ = size_2; } size_ -= bytes_to_read; return bytes_to_read; }

Similar phenomenon can be observed in API of the FMOD sound library. Just like graphical textures in DirectX, sound samples in FMOD can also be "locked" to get pointer to a raw memory we can read or fill. But DirectX textures lie in the continuous memory region, so we get a single pointer. The only difficult thing in understanding locking textures is the concept of "stride", which can be greater than the width of a single row. Here in FMOD the Sound::lock() method returns two pointers and two lengths, probably because the locked region can wrap over end of internally used circular buffer like the one shown above.