Entries for tag "algorithms", ordered from most recent. Entry count: 63.

Uwaga! Informacje na tej stronie mają ponad 5 lat. Nadal je udostępniam, ale prawdopodobnie nie odzwierciedlają one mojej aktualnej wiedzy ani przekonań.

# Random Choice with Given Probabilities

20:54

Fri

11

Dec 2009

When we want to randomly choose one option from some possible items, we usually do something as simple as (rand() % numberOfItems). But what if we want to vary probabilities of these items? Here is an algorithm for this:

As input data, we have number of items and an array of their probabilities. Each probability is a nonnegative real number. I like to allow these numbers to be as big as I want, knowing that an item with probability 2 (or 200) is just two times more probable than the item with probability 1 (or 100). I will normalize them later.

// == Preprocessing step == // Input: std::vector<float> ItemProbabilities; // Output: std::vector<float> ProcessedProbabilities; // Algorithm: size_t count = ItemProbabilities.size(); assert(count > 0); // First pass: summation float sum = 0.f; for (size_t i = count; i--; ) // count-1, count-2, ..., 0 { float p = ItemProbabilities[i]; assert(p >= 0.f); // Optional: Delete items with zero probability if (p == 0.f) { Items.RemoveByIndex(i); // That's how we delete an element from std::vector by index :P ItemProbabilities.erase(ItemProbabilities.begin()+i); } sum += p; } assert(sum > 0.f); float sumInv = 1.f / sum; // Second pass: building preprocessed array ProcessedProbabilities.resize(count); ProcessedProbabilities[0] = ItemProbabilities[0] * sumInv; for (size_t i = 1; i < count; i++) ProcessedProbabilities[i] = ProcessedProbabilities[i-1] + ItemProbabilities[i] * sumInv;

The algorithm above does two things: It prepares ProcessedProbabilities array so that each probability is the item's probability plus the sum of probabilities of all previous items. It also normalizes them to range 0..1. As the result, the output array forms an ascending sequence, where the difference to the previous element is proportional to the item's probability in respect to full 0..1 range. For example:

Input: 100, 100, 200 Temporary data: count=3, sum=400, sumInv=0.0025 Output: 0.25, 0.5, 1

Now as we have these preprocessed data, generating random choices with certain probabilities is fast and simple. We just generate a random real number in range 0..1 and choose the first item for which ProcessedProbabilities[i] is greater than this. Additional optimization would be to use binary search here.

// == Random choice == // Input: std::vector<float> ProcessedProbabilities; // Output: size_t choice; // Algorithm: choice = 0; float randNum = RandFloat(); // 0..1 while (choice < ProcessedProbabilities.size()-1 && ProcessedProbabilities[choice] < randNum) { choice++; }

Comments | #algorithms #math Share

# Calculating Linear and Quadratic Equation Coefficients

21:35

Sun

30

Aug 2009

Some time ago I've written about formulas to calculate coefficients of linear and quadratic equation [pl] when having 2 or 3 given points (x,y). Yesterday I suddenly noticed that my code needs to do it every frame so I need functions to calculate these coefficients. Below you can find the code of my functions. They are templates, so they work with many types including float, double, as well as vectors of any dimmension and other types which have addition, subtraction and assignment operators, as well as multiplication and divistion by float scalar.

Read full entry | Comments | #rendering #math #algorithms Share

# Source Code and Ray Tracer

21:11

Wed

19

Aug 2009

As some of you are interested in seeing source code of my 2D Software Renderer, I'll fullfill this request, although I don't think there is anything special about this code. Here is the file:
**SoftRender01.zip**
and here is a brief description of this project:

- It's definitely not finished.
- It uses WinAPI and my CommonLib library.
- AsyncConsole is a code snipped I've published before in my paper on Asynchronous Windows Console [pl].
- In the FontLoad files you can find simple example of using FreeType library.
- In the ImgLoad files you can find simple example of using DevIL library.
- In StdAfx and TokenWriter files you can find some of my code that is going to be included in upcoming version of my CommonLib.
- Files VariableManager, VariableTypes and MathVariableTypes contain quite smart "property" system (or maybe even reflection?) with text and binary serialization that I'm very satisfied with.
- And finally, Renderer itself just contains some of elementary CG algorithms implemented, so you better study Wikipedia and other sources for algorithms such as Line clipping (I've used Liang-Barsky method), Line drawing algorithm (I've used Bresenham method) etc.

BTW, I've started coding a raytracer. Of course the first primitive I support is a sphere and first effects I've coded are shadows and reflections. Ray tracing is not magic. It's just like "foreach pixel, foreach object, render", while rasterization is the opposite: "foreach object, foreach pixel, render". But It's quite fascinating that raytracer suppors directly some of effects that require special preprocessing in standard rasterization (like preparing shadow map and enviromental map in case of my effects).

Comments | #productions #rendering #algorithms Share

# Three Ways to Calculate Mean and Variance

23:05

Mon

20

Jul 2009

There is a free e-book on DSP (Digital Signal Processing) called The Scientist and Engineer's Guide to Digital Signal Processing (by Steven W. Smith, Ph.D.). As I have a holiday now, I've started reading it and I've found some interesting information even in introductory chapters. For example, there are three algorithms to calculate mean and variance. Let's say we have some data in a vector of bytes and we want to calculate its statistics.

std::vector<unsigned char> Bytes; // Fill Bytes... uint N = Bytes.size();

Traditional algorithm looks like this:

float Mean = 0.f, Variance = 0.f; for (uint i = 0; i < N; i++) Mean += (float)Bytes[i]; Mean /= (float)N; for (uint i = 0; i < N; i++) Variance += ((float)Bytes[i] - Mean) * ((float)Bytes[i] - Mean); Variance /= (float)(N-1);

Now the incremental one, which can return current mean and variance at any time while you can also post next piece of data. All we need to implement it is to keep track of current sample number, sum of samples and sum of squared samples. I've created template class for that, which can be parametrized by float, double, int or other numeric types.

template <typename T> class IncrementalMeanAndVarianceCalc { public: IncrementalMeanAndVarianceCalc() : m_Sum(T()), m_SqSum(T()), m_Count(0) { } void Clear() { m_Sum = m_SqSum = T(); m_Count = 0; } void Add(const T &v) { m_Sum += v; m_SqSum += v*v; m_Count++; } void Add(const T *values, uint valCount) { for (uint i = 0; i < valCount; i++) { m_Sum += values[i]; m_SqSum += values[i]*values[i]; } m_Count += valCount; } bool IsEmpty() { return m_Count == 0; } uint GetCount() { return m_Count; } T GetSum() { return m_Sum; } T GetSqSum() { return m_SqSum; } T GetMean() { assert(!IsEmpty()); return m_Sum / m_Count; } T GetVariance(bool varianceBiased = true) { if (varianceBiased) return (m_SqSum - m_Sum*m_Sum/m_Count) / (m_Count-1); else return (m_SqSum - m_Sum*m_Sum/m_Count) / m_Count; } void GetMeanAndVariance(T &outMean, T &outVariance, bool varianceBiased = true) { assert(!IsEmpty()); outMean = m_Sum / m_Count; if (varianceBiased) outVariance = (m_SqSum - m_Sum*m_Sum/m_Count) / (m_Count - 1); else outVariance = (m_SqSum - m_Sum*m_Sum/m_Count) / m_Count; } private: T m_Sum, m_SqSum; uint m_Count; };

Finally, there is an algorithm to calculate mean and variance from histogram. It is very efficient method. If we have only 256 possible values for each sample, we don't have to do binning and calculating histogram is very simple:

uint Histogram[256] = { 0 }; for (uint i = 0; i < N; i++) Histogram[Bytes[i]]++;

Now, the mean and variance can be calculated this way:

float Mean = 0.f, Variance = 0.f; for (uint i = 0; i < 256; i++) Mean += (float)i * Histogram[i]; Mean /= (float)N; for (uint i = 0; i < 256; i++) Variance += (i - Mean)*(i - Mean) * Histogram[i]; Variance /= N - 1;

Comments | #algorithms #math Share

# What is GUID and how to use it

20:57

Thu

16

Jul 2009

There are many ways to identify objects in a collection. One of them is to keep direct pointers to objects. Another way is to simply index their positions in an array. We can make user see only some strange indirect "handles" without any obvious meaning, like HWND or FILE*. We can also give objects string names or some numeric identifiers. SQL databases often use integer number to identify record in a table and new identifiers are generated as subsequent numbers. Another solution is to use GUIDs.

GUID means Globally Unique Identifier and it is a kind of general purpose, 128-bit numeric identifier. GUIDs are generated from some random data. We can be sure about their uniqueness in any context because of their size, as 2^128 possible values is more than stars in the whole observable universe :) That's why they are called "globally unique".

We could just generate 128 bits of data from any good pseudorandom number generator and call it good identifiers, but to standardize this idea, they have created standard - RFC4122 (it's actually UUID, but let's ignore the difference). This standard says how to express GUIDs as strings, made of hexadecimal numbers separated by dashes (for example "f81d4fae-7dec-11d0-a765-00a0c91e6bf6"). You can find such GUIDs everywhere. For example, run "regedit.exe" and look into key HKEY_CLASSES_ROOT\CLSID or run your Visual C++ and run command Tools / Create GUID from the main menu.

The standard also defines bit structure of the UUID and algorithms to generate it. There are several possible algorithms. Some of them use system time or even network card MAC address. The simplest one is version 4 algorithm, which uses only random number generator. Some of bits have special meaning and code algorithm which have been used to generate particular GUID.

Here is my code in C++ defining GUID structure and the algorithm to generate it.

Read full entry | Comments | #c++ #algorithms Share

# Efficient Finding of Duplicated Files

21:06

Tue

14

Jul 2009

I've recently wrote a tool for finding duplicated files in a given directory. I wanted to do it efficiently and here is my algoritm.

My basic idea is to first recursively enumerate all files in the directory and its subdirectories and sort their descriptions by file size. Making it this way, we can then iterate through this list and for each file look ahead to see how many other files have same size. If two files have different sizes, they are obviously not equal in terms of their content. So if there is only one file with particular size, it can just be skipped. If there are two files with same size, we must just compare them by content.

If there are many files with same size, things become more complicated, as any possible combinations of them can turn out to be identical. My solution is to compare two files by a "header" (lets say - first 1024 bytes) and if they are equal - by MD5 checksum of their whole content. Checksum of each file is lazy evaluated, so it's calculated only once, the first time its needed.

As I iterate through the sorted file list, I mark reported files not to process them again. I can do it because I ensure that if a file is reported, all other identical files are also already reported. In my real code I do the same with files I encounter errors with, but here I wanted to simplify the code.

Read full entry | Comments | #algorithms #winapi Share

# KD-Tree

23:14

Thu

09

Jul 2009

People at forums are usually advised to learn BSP, quadtree or octree as space partitioning data structure, but there are many more interesting structures than these three. This time my choice for home project is KD-tree. It's actually something between BSP and octree. Just like BSP it's a binary tree (each non-leaf node has two child nodes) and optimal splitting plane is estimated each time by special algorithm, but splitting planes are always aligned to one of three main axes and thus each node can be described by an AABB (axis-aligned bounding box), just like in octree.

As my tree is designed to manage objects and not geometry, nothing can be split and some of objects may intersect splitting planes. How to deal with them? I simply assign them to the parent node, so not only leaves are allowed to contain list of objects. To avoid too many small objects intersecting splitting planes to degrade performance by falling into top level nodes (it's called "sticky planes" or something like that :) I adopted "loose octree" idea to my KD-tree. It simply means I extend each node’s bounding box so that each node's children slightly overlap each other and small objects intersecting splitting plane fall into one of the children.

My KD-tree is also dynamic, which means it reorganizes itself as objects get added, removed and moved in the tree. It's actually quite simple. Each time an object is added to a node, that node can be split into two children if it's object number exceeds constant limit. Similarly node can be merged by deleting it's children each time an object is removed from one of its children, if the number of objects in that node and its children drops under constant minimum.

For additional performance, I allocate tree nodes from my own "Free List" memory pool and I keep objects connected to each node as doubly-linked list.

I also came up with an idea how to easily visualize quality of my space partitioning technique. I keep track of the number of tree nodes and objects on each depth level. This way I can tell from these several numbers whether tree is more "tall" or "wide" and whether most of objects stay in leaves instead of some top-level nodes.

Here are some screenshots and a video from my recent code:

Comments | #rendering #video #gallery #algorithms #math Share

# Manager zasobów #2 - Wczytywanie w tle

12:23

Sat

23

May 2009

Sam manager zasobów napisałem wg podobnych założeń, jakie miałem w TFQ7. Jest jeden globalny manager zasobów g_ResMngr przechowujący kolekcję wszystkich zasobów. Zasób jest obiektem klasy pochodnej od Resource implementującej odpowiednie metody wirtualne. Zasób może mieć nazwę i można wyszukiwać zasoby wg nazwy, ale może też mieć nazwę pustą. Zasoby można swobodnie tworzyć i usuwać. Obiekt klasy Resource istnieje przez cały czas życia zasobu, a wewnątrz pamięta stan - niezaładowany, w tracie ładowania, załadowany itd.

class Resource { public: enum STATE { STATE_UNLOADED, STATE_LOADING, STATE_LOADED, STATE_LOADED_ERROR, }; Resource(const string &Name); virtual ~Resource(); STATE GetState() { return m_State; } bool IsLoaded() { return m_State == STATE_LOADED; } void Load(); // Żąda załadowania już teraz void BeginLoading(); // Rozpoczyna ładowanie w tle void Unload(); // Odładowuje //... protected: virtual void OnLoad() = 0; virtual void GetLoadType(bool *OutUseBkg, BkgJob::TYPE *OutBkgJobType) = 0; virtual void OnLoadBkg() { } // Wykonywana na osobnym wątku virtual void OnLoadAfterBkg() { } virtual void OnUnload() = 0; private: string m_Name; STATE m_State; //... }; class ResourceManager { public: //... Resource * Find(const tstring &Name); Resource * MustFind(const tstring &Name); template <typename T> T * FindEx(const tstring &Name) { /*...*/ } template <typename T> T * MustFindEx(const tstring &Name) { /*...*/ } }; extern ResourceManager * g_ResMngr;