Tag: algorithms

Entries for tag "algorithms", ordered from most recent. Entry count: 62.

Warning! Some information on this page is now more than 3 years old. I keep it for reference, but it probably doesn't reflect my current knowledge and beliefs.


22:12
Tue
29
Jun 2010

Timespan from/to String Conversion

Today I'd like to show a piece of code I've written: functions for converting between a timespan, stored in an int64 as a number of milliseconds, and a string in the form "HH:MM:SS.MMM" (hours, minutes, seconds, milliseconds). That's IMHO a convenient way of presenting a timespan to the user. I'll include these functions in the next version of my CommonLib library, where they will work with a raw millisecond number as well as the TIMESPAN and/or GameTime types.

The code is in C++, uses the STL string and my own StrToUint and UintToStr functions. All suggestions on how these algorithms could be optimized are welcome.

Read full entry > | Comments (3) | Tags: algorithms c++ | Author: Adam Sawicki | Share

19:37
Sat
08
May 2010

LZMA SDK - How to Use

What do you think about when I say the word "compression"? If you currently study computer science, you probably think about the details of algorithms like RLE, Huffman coding or the Burrows-Wheeler transform. If not, then you surely associate compression with archive file formats such as ZIP and RAR. But there is something in between - a kind of library that lets you compress some data - that implements a compression algorithm but does not provide a ready file format to pack multiple files and directories into one archive. Such a library is useful in gamedev for creating a VFS (Virtual File System). Probably the most popular one is zlib - a free C library that implements the Deflate algorithm. I've recently discovered another one - LZMA. Its SDK is also free (public domain) and the basic version is a small C library (C++, C# and Java APIs are also available, as well as some additional tools). The library uses the LZMA algorithm (Lempel-Ziv-Markov chain algorithm, the same as in the 7z archive format), which AFAIK has a better compression ratio than Deflate. So I've decided to start using it. Here is what I've learned:

If you decide to use only the C API, it's enough to add some C and H files to your project - the ones from the LZMASDK\C directory (without subdirectories). Alternatively, you can compile them as a static library.

There is a little bit of theory behind the LZMA SDK API. First, the term props means a 5-byte header where the library stores some settings. It must be saved along with the compressed data and given back to the library before decompression.

Next, the dictionary size. It is the size of a temporary memory block used during compression and decompression. The dictionary size can be set during compression and is then saved inside props. The library uses a dictionary of the same size during decompression. The default dictionary size is 16 MB, so IMHO it's worth changing, especially as I haven't noticed any meaningful drop in compression ratio when I set it to 64 KB.

And finally, an end mark can be saved at the end of the compressed data. You can use it so the decompression code can determine where the data ends. Alternatively, you can decide not to use the end mark, but then you must remember the exact size of the uncompressed data somewhere. I prefer the second method, because remembering the data size takes only 4 bytes (for the 4 GB limit) and can be useful anyway, while compressed data finished with an end mark is about 6 bytes longer than without it.

Compressing a full block of data with a single call is simple. You can find the appropriate functions in the LzmaLib.h header. Here is how you can compress a vector of bytes using the LzmaCompress function:

Read full entry > | Comments (9) | Tags: commonlib libraries algorithms | Author: Adam Sawicki | Share

16:47
Sun
21
Mar 2010

My Algorithms for Sorted Arrays

Yesterday I wrote about The Beauty of Sorted Arrays. I believe the C++ STL is not perfect, so I've written some additional code that I needed for handling such a data structure. It can be found in my CommonLib library.

I once wanted to have a sorting algorithm just like std::sort, but using insertion sort. The std::sort is of course great - it has average O(n log n) complexity and AFAIK internally uses the introsort algorithm (an improved version of quicksort). But if we know that the collection is already almost sorted, insertion sort will work faster. So here is my code:

// Basic version that uses operator <
template<class Iterator>
void InsertionSort(Iterator b, Iterator e)
{
  if (b == e) return;
  for (Iterator j = b + 1; j < e; j++)
  {
    typename std::iterator_traits<Iterator>::value_type key;
    key = *j;
    Iterator i = j;
    while (i > b && key < *(i-1))
    {
      *i = *(i-1);
      i--;
    } 
    *i = key;
  }
}

// Overloaded version with explicit comparator
template<class Iterator, typename Compare>
void InsertionSort(Iterator b, Iterator e, Compare Comp)
{
  if (b == e) return;
  for (Iterator j = b + 1; j < e; j++)
  {
    typename std::iterator_traits<Iterator>::value_type key;
    key = *j;
    Iterator i = j;
    while (i > b && Comp(key, *(i-1)))
    {
      *i = *(i-1);
      i--;
    } 
    *i = key;
  }
}

Another problem is finding an item in an ordered array by some key. I can see two solutions in C++, both unsatisfactory. The first one is to write your own comparator or overload operator < for your item structure to compare items by key, like this:

struct Item
{
  int Id;
  DWORD Color;
  std::string Name;

  bool operator < (const Item &rhs) const { return Id < rhs.Id; }
};

The problem is that a C++ comparator always compares two full items, so to use this comparison for finding an item in the array, we must construct and pass a full Item object, even though only the Id field is used.

Item item;
item.Id = 1;
bool contains = std::binary_search(vec.begin(), vec.end(), item);

Second solution is to use std::map container. It has separate key and value type so you can search by key:

std::map<int, Item>::iterator foundIt = map.find(1);
if (foundIt != map.end())
  std::cout << "Found ID=" << foundIt->first
    << ", Name=" << foundIt->second.Name << std::endl;

But the problem is that the key is separate from the value type, so the Id field should not be included in the Item structure, because it would be redundant. The type of the items actually contained in the map is std::pair<const int, Item>, which is aliased as std::map<int, Item>::value_type.

I wanted a more convenient way to quickly search an ordered array by key, so I've coded these two function templates:

template <typename IterT, typename KeyT, typename CmpT>
IterT FirstNotLessIndex(IterT beg, IterT end, const KeyT &key, CmpT cmp)
{
  unsigned down = 0, up = (end - beg);
  while(down < up)
  {
    unsigned mid = (down + up) / 2;
    int res = cmp(key, *(beg + mid));
    if(res > 0)
      down = mid + 1;
    else
      up = mid;
  }
  return beg + down;
}

template <typename IterT, typename KeyT, typename CmpT>
IterT BinarySearch(IterT beg, IterT end, const KeyT &key, CmpT cmp)
{
  IterT it = FirstNotLessIndex<IterT, KeyT, CmpT>(beg, end, key, cmp);
  if (it == end || cmp(key, *it) != 0)
    return end;
  return it;
}

The first one performs a binary search to find the iterator to the first element not less than (i.e. greater than or equal to) the specified key. It's perfect for determining the place to insert a new item into the array. The second function uses the first one to check whether an item with an exactly matching key exists in the array and returns an iterator pointing to it, or the end iterator if it is not found.

As you can see, both functions take the type of the key as a template parameter, which may be different from the full item type pointed to by the iterator. They also take a comparator functor, which is always called with parameters (key, item) and should return int. The comparator should return a number less than, equal to, or greater than zero according to the result of the comparison - just like strcmp, memcmp and other functions from old C.

Now my example can be written like this:

struct Item
{
  int Id;
  DWORD Color;
  std::string Name;

  static int Cmp(int id, const Item &rhs) { return id - rhs.Id; }
};

bool contains =
  common::BinarySearch(vec.begin(), vec.end(), 1, &Item::Cmp) != vec.end();

Having such a universal comparison function, it's easy to derive all the comparison operators from it:

bool operator <  (const Item &rhs) const { return Cmp(Id, rhs) <  0; }
bool operator >  (const Item &rhs) const { return Cmp(Id, rhs) >  0; }
bool operator <= (const Item &rhs) const { return Cmp(Id, rhs) <= 0; }
bool operator >= (const Item &rhs) const { return Cmp(Id, rhs) >= 0; }
bool operator == (const Item &rhs) const { return Cmp(Id, rhs) == 0; }
bool operator != (const Item &rhs) const { return Cmp(Id, rhs) != 0; }

The opposite conversion is also possible. If you have a type with operator < accessible (like float, or your own type with an overloaded operator), you can turn it into a comparator returning int with this function:

template <typename T>
inline int UniversalCmp(const T &a, const T &b)
{
  if (a < b) return -1;
  if (b < a) return  1;
  return 0;
}

Comments (0) | Tags: commonlib stl c++ algorithms | Author: Adam Sawicki | Share

16:48
Sat
20
Mar 2010

The Beauty of Sorted Arrays

A one-dimensional array (also called a vector) is a very simple data structure. We were taught that inserting or deleting elements in the middle of the array is slow, because we have to move (by rewriting) all the following elements. That's why books about C++ teach that for sorted collections - those that provide a fast search operation - it's better to use containers like std::set or std::map.

But those who code games in C++ and care about performance know that today's processors (CPUs) are very fast at doing calculations and processing lots of data, while access to RAM is becoming more and more of a bottleneck. According to the great presentation Pitfalls of Object Oriented Programming by Tony Albrecht from Sony, a cache miss (access to data in memory that is not currently in the CPU cache) costs about 400 cycles! Dynamic memory allocation (like operator new in C++) is also a slow operation. That's why data structures that use separate nodes linked with pointers (like std::list, std::set, std::map) are slower than std::vector, and it's often better to keep a sorted collection of elements in a vector, even if you have hundreds or thousands of elements and must sometimes insert or delete them from the middle.

Unfortunately, the C++ STL doesn't provide a "std::ordered_vector" container that would keep items sorted while allocating a single chunk of contiguous memory. We have to do it ourselves, like this:

First, we have to choose a way to compare elements. There are multiple solutions (read about comparators in C++), but the simplest one is to just overload operator < in your element structure.

#include <vector>
#include <string>
#include <algorithm>

struct Item
{
  int Id;
  DWORD Color;
  std::string Name;

  Item() { }
  Item(int id, DWORD color, const std::string &name)
    : Id(id), Color(color), Name(name) { }

  bool operator < (const Item &rhs) const { return Id < rhs.Id; }
};

typedef std::vector<Item> ItemVector;

ItemVector vec;
Item item;

Here is how we insert an item into sorted vector:

item = Item(1, 0xFF0000, "Red");
ItemVector::iterator itToInsert = std::lower_bound(
  vec.begin(), vec.end(), item);
vec.insert(itToInsert, item);

Unfortunately we have to deal with iterators instead of simple indices.

To check if the vector contains element with particular Id:

item = Item(1, 0, std::string());
bool contains = std::binary_search(vec.begin(), vec.end(), item);

Here comes another nuisance. We have to create a full Item object just to use its Id field for comparison purposes. I'll show tomorrow how to overcome this problem.

It's also annoying that the binary_search algorithm returns a bool instead of an iterator that would show us where the found item is. To find an item and determine its iterator (and from that its index), we have to use the lower_bound algorithm. This algorithm is very universal, but also hard to understand. It quickly searches a sorted collection and returns the position of the first element whose value is greater than or equivalent to the specified value. It was perfect for determining the place to insert a new item, but to find an existing element, we have to use it with this if:

item = Item(1, 0, std::string());
ItemVector::iterator findIt = std::lower_bound(vec.begin(), vec.end(), item);
if (findIt != vec.end() && findIt->Id == item.Id)
{
  size_t foundIndex = std::distance(vec.begin(), findIt);
}
else
{
  // Not found.
}

And finally, to remove an item from the vector when we have its index, we do this trick:

size_t indexToDelete = 0;
vec.erase(vec.begin() + indexToDelete);

Well, the C++ STL is very general, universal, powerful and... very complicated, while the standard libraries of other programming languages also have some curiosities in their array classes.

For example, ActionScript 3 has a single method to both insert and remove items:

function splice(startIndex:int, deleteCount:uint, ...  values):Array

To delete some items, you pass non-zero deleteCount. To insert some items, you pass them as values parameter.

In .NET, the System.Collections.Generic.List<T> class (which is actually an array - the name is a misnomer) has a very smart BinarySearch method. It returns the "zero-based index of item (...), if item is found; otherwise, a negative number that is the bitwise complement of the index of the next element that is larger than item or, if there is no larger element, the bitwise complement of Count." It can be used both to find an existing item in the array and to determine the index for a new one, like:

int index = myList->BinarySearch("abc");
if (index < 0)
  myList->Insert(~index, "abc");

See also: My Algorithms for Sorted Arrays

Comments (7) | Tags: algorithms c++ stl .net flash | Author: Adam Sawicki | Share

19:11
Sun
24
Jan 2010

Doing Something Every X Time

Games are one of those applications where we have a loop spinning infinitely, as fast as it can (or at least 30 times per second), calling some DoFrame function that does all the computations and rendering of a game frame. Dealing with time is an important task in game programming. One of the things game frameworks usually do is retrieve the precise current time from the system and measure FPS. Xion wrote an interesting article about all this stuff some time ago, entitled Pętla czasu rzeczywistego [pl].

By the way, a fact not so well known by beginners is that FPS (frames per second) is not a good measurement of game performance, because it is expressed in Hz, which is the reciprocal of the average frame time and thus nonlinear. The real metric for how much more optimization you have to put into your game should be the average frame time in milliseconds. But that's another topic. I've blogged about this some time ago here [pl]. Others did too, like Humus or the Real-Time Rendering Blog.

The subject I want to write about today is inspired by a recent question from forum.gamedev.pl - Koniec z timerami [pl]. It's about implementing timers in games. The question is how to execute some code - let's say, call a DoSomething() function - every EventTime time units, e.g. every 200 ms. Two solutions have been proposed there by Kos and cybek. The difference between them is subtle but important.

Let's assume we have access to the variables DeltaTime (the time elapsed since the previous frame) and EventTime (the desired interval between calls).

Time may be expressed in seconds, milliseconds or any other unit - it doesn't matter here. The first solution looks like this:

// Defining helper variable
float EventCounter;
// Initialization
EventCounter = 0.f;

// Inside DoFrame, called every frame
EventCounter += DeltaTime;
while (EventCounter >= EventTime)
{
  DoSomething();
  EventCounter -= EventTime; // !
}

And here is the second example:

// Inside DoFrame, called every frame
EventCounter += DeltaTime;
if (EventCounter >= EventTime)
{
  DoSomething();
  EventCounter = 0.f; // !
}

So what's the difference? Method 1 is precise: DoSomething will always be called every EventTime, plus/minus one frame, and exactly every EventTime on average. On the other hand, method 2 calls DoSomething() no more often than every EventTime, but usually more rarely - I suppose every EventTime + average_frame_time/2 on average. I hope you can see from the code why that is.

Why does it make a difference? The first case is when you have a stall of, let's say, 500 ms between frames, because you did some very expensive computation like a failed A* pathfinding query, or a background process like an antivirus slowed down the system. In the next executed frame, the first code above will "catch up" by executing DoSomething two or three times, but the second code will just execute DoSomething() once and continue as if nothing happened. The second difference shows up when you measure how many times DoSomething() was executed over some longer period. The first code will execute it almost exactly period/EventTime times, plus/minus one, while the second code will result in no more, but probably fewer, calls.

There is another question: how to express the EventCounter variable? This problem is orthogonal to the one described above and makes no difference to the code logic, so it's just your choice. Here are the options: as the time passed since the last DoSomething call...

As a side note, you don't have to tell me that float is not a good type for expressing absolute, precise time in games. I know that and I will write more about it another time. The code above was just an example.

Comments (2) | Tags: algorithms | Author: Adam Sawicki | Share

22:00
Tue
12
Jan 2010

Processing File Paths in C++

Paths to files and directories are a special kind of strings. Sometimes they have to be processed, e.g. merged, or split into parts like drive letter, directory, file name and extension. The standard C library provides two functions for this: makepath and splitpath. They use char* strings and always operate on all possible parts (drive, dir, fname, ext). In my opinion they are not very convenient. On the other hand, modern programming languages have more functionality in this field. For example, .NET has the System.IO.Path static class with methods like GetExtension, GetFileName, GetDirectoryName.

To make operations on file paths convenient in C++, some time ago I developed my own functions for this, which operate on std::string strings. I've included them in my CommonLib library. They are a bit similar to the ones from Delphi :) Here is how I approach this topic: first, a common task is to extract part of a path (e.g. "C:\Windows\notepad.exe"). My functions for this are:

I also have some functions for processing paths:

A path can be absolute or relative. In Windows, absolute paths start with a drive letter like "C:\" (which we can recognize by looking for the colon) or a network share (paths that begin with "\\"). Relative paths (the most basic example being a single file name) make sense only in some context. When we use relative paths to open real files, they are addressed relative to the "Current Directory" (which is a separate topic - I will write about it another time). To deal with absolute and relative paths, I've coded functions such as:

The NormalizePath function preprocesses all the "." and ".." directories in the path. A single dot used as a directory name means "my own directory" (it just does nothing) and two dots mean going up one level to the parent directory. So the path we consider here is logically equal to "C:\Windows\.\system32\..\notepad.exe". Paths like this still work, but they can look a bit unclear, so you can use the NormalizePath function to clean them up.

Paths in Linux are quite different. My functions also support them, although I haven't tested this case in real situations yet. First of all, there are no drive letters in Linux, so absolute paths just start with '/', like "/etc/passwd". Second, a slash is used as the separator instead of a backslash (interestingly, a slash also works in Windows). It's also more common in Linux than in Windows to meet files that have no extension or have a double extension (like "MyFile.tar.gz"). A question arises here whether what we call the "extension" starts from the first or from the second dot :)

Comments (1) | Tags: algorithms c++ commonlib | Author: Adam Sawicki | Share

19:56
Sun
27
Dec 2009

Writing Parsers is Easy

Using binary file formats is easier, as data can be read and written almost directly as it is laid out in memory, but sometimes we have to use a text format. Whenever it is neither a standard format like XML or YAML, nor simple enough to load with std::cin or scanf, we have to write a parser to read it. Writing a parser - code that reads and interprets some text format (e.g. a file format or network protocol) - looks like a very complex task, but here I want to convince you that's not true - it is actually quite easy.

Why do I think so? It may look like a very long way from reading characters from a file to interpreting them as data records containing numbers, strings and other meaningful information. My answer to that is to split the problem into three layers.

The lowest layer is called CharReader. All it can do is read single characters. It provides a layer of abstraction over reading data from a real source, so you can forget whether you read from a file (hopefully with some buffering - never do file reads byte-after-byte!) or just iterate over a string fully loaded into memory. You can also forget about encoding, so you see your data as characters, not bytes. It's a good idea to keep the last read character somewhere so it can be "peeked" (read without consuming it) many times before going to the next one. The interface of a simple CharReader may look like this:

class CharReader {
public:
  CharReader(const std::string &filePath);
  // Returns last read character as out.
  // If cursor at end, returns false.
  bool Peek(char &out);
  // Moves cursor one character forward.
  void Next();
};

The next layer is called Tokenizer. We need it because it's still complicated to interpret characters directly as structured information. Data in almost every text format is made of multi-character pieces of information like numbers, strings etc., so it's convenient to provide another layer of abstraction and see the data as a sequence of tokens, not single characters. Here are some typical kinds of tokens, which can also be found in programming languages:

The Tokenizer uses CharReader to read chars and interprets them as tokens. Just like CharReader, it buffers and provides access to the single most recently parsed token, so you can read it many times before advancing to the next one. It's also the Tokenizer's responsibility to skip comments and whitespace between tokens. An example interface may look like this:

class Tokenizer {
public:
  Tokenizer(CharReader &cr);
  
  enum TOKEN_TYPE {
    TOKEN_END,
    TOKEN_IDENTIFIER,
    TOKEN_STRING,
    TOKEN_SYMBOL,
  };
  // Returns type of the last read token.
  TOKEN_TYPE GetType();
  // Returns content of the last read token.
  const std::string & GetContent();
  // Moves cursor one token forward.
  void Next();
};

And finally, once we are able to see the file as made of tokens, it's easy to code the third layer - the Parser - on top of that. The Parser reads subsequent tokens and interprets them as meaningful data. For example, you can parse a Key="Value" pair by expecting a token of type identifier and saving its content as the key, then expecting the "=" symbol, and finally expecting a string token and saving its content as the value. Of course you also have to find a way to report errors at all these stages, including information about the current row and column.

Comments (1) | Tags: algorithms | Author: Adam Sawicki | Share

20:54
Fri
11
Dec 2009

Random Choice with Given Probabilities

When we want to randomly choose one of several possible items, we usually do something as simple as (rand() % numberOfItems). But what if we want to vary the probabilities of these items? Here is an algorithm for that:

As input data, we have the number of items and an array of their probabilities. Each probability is a nonnegative real number. I like to allow these numbers to be as big as I want, knowing that an item with probability 2 (or 200) is just two times more probable than an item with probability 1 (or 100). I will normalize them later.

// == Preprocessing step ==

// Input:
std::vector<float> ItemProbabilities;
// Output:
std::vector<float> ProcessedProbabilities;

// Algorithm:

size_t count = ItemProbabilities.size();
assert(count > 0);

// First pass: summation
float sum = 0.f;
for (size_t i = count; i--; ) // count-1, count-2, ..., 0
{
  float p = ItemProbabilities[i];
  assert(p >= 0.f);
  // Optional: Delete items with zero probability
  if (p == 0.f)
  {
    Items.RemoveByIndex(i);
    // That's how we delete an element from std::vector by index :P
    ItemProbabilities.erase(ItemProbabilities.begin()+i);
  }
  sum += p;
}
assert(sum > 0.f);
float sumInv = 1.f / sum;

// Second pass: building the preprocessed array
count = ItemProbabilities.size(); // refresh: zero-probability items may have been removed
ProcessedProbabilities.resize(count);
ProcessedProbabilities[0] = ItemProbabilities[0] * sumInv;
for (size_t i = 1; i < count; i++)
  ProcessedProbabilities[i] =
    ProcessedProbabilities[i-1] + ItemProbabilities[i] * sumInv;

The algorithm above does two things: it prepares the ProcessedProbabilities array so that each entry is the item's probability plus the sum of the probabilities of all previous items, and it normalizes them to the range 0..1. As a result, the output array forms an ascending sequence, where the difference from the previous element is proportional to the item's probability relative to the full 0..1 range. For example:

Input: 100, 100, 200
Temporary data: count=3, sum=400, sumInv=0.0025
Output: 0.25, 0.5, 1

Now that we have this preprocessed data, generating random choices with the given probabilities is fast and simple. We just generate a random real number in the range 0..1 and choose the first item for which ProcessedProbabilities[i] is greater than or equal to it. An additional optimization would be to use binary search here.

// == Random choice ==

// Input:
std::vector<float> ProcessedProbabilities;
// Output:
size_t choice;

// Algorithm:
choice = 0;
float randNum = RandFloat(); // 0..1
while (choice < ProcessedProbabilities.size()-1
  && ProcessedProbabilities[choice] < randNum)
{
  choice++;
}

Comments (4) | Tags: algorithms math | Author: Adam Sawicki | Share


Copyright © 2004-2017 Adam Sawicki