# July 2009

Uwaga! Informacje na tej stronie mają ponad 5 lat. Nadal je udostępniam, ale prawdopodobnie nie odzwierciedlają one mojej aktualnej wiedzy ani przekonań.

# Sunrise 2009 in Kołobrzeg

Tue
28
Jul 2009

I didn't post anything on my blog in recent days because I've been offline and away from keyboard on my holiday in Kołobrzeg. It's a nice city at the seaside, but I haven't go there to spend time on the beach taking sunbath :) I've been on Sunrise Festival - a big, 3-day party with trance/house music. I've also made about 800 photos with my digital camera and cellphone. You can see some of them in my new gallery: Sunrise 2009.

# Three Ways to Calculate Mean and Variance

Mon
20
Jul 2009

There is a free e-book on DSP (Digital Signal Processing) called The Scientist and Engineer's Guide to Digital Signal Processing (by Steven W. Smith, Ph.D.). As I have a holiday now, I've started reading it and I've found some interesting information even in introductory chapters. For example, there are three algorithms to calculate mean and variance. Let's say we have some data in a vector of bytes and we want to calculate its statistics.

```std::vector<unsigned char> Bytes;
// Fill Bytes...
uint N = Bytes.size();```

```float Mean = 0.f, Variance = 0.f;
for (uint i = 0; i < N; i++)
Mean += (float)Bytes[i];
Mean /= (float)N;
for (uint i = 0; i < N; i++)
Variance += ((float)Bytes[i] - Mean) * ((float)Bytes[i] - Mean);
Variance /= (float)(N-1);```

Now the incremental one, which can return current mean and variance at any time while you can also post next piece of data. All we need to implement it is to keep track of current sample number, sum of samples and sum of squared samples. I've created template class for that, which can be parametrized by float, double, int or other numeric types.

```template <typename T>
class IncrementalMeanAndVarianceCalc
{
public:
IncrementalMeanAndVarianceCalc() : m_Sum(T()), m_SqSum(T()), m_Count(0) { }
void Clear() { m_Sum = m_SqSum = T(); m_Count = 0; }

{
m_Sum += v;
m_SqSum += v*v;
m_Count++;
}
void Add(const T *values, uint valCount)
{
for (uint i = 0; i < valCount; i++)
{
m_Sum += values[i];
m_SqSum += values[i]*values[i];
}
m_Count += valCount;
}

bool IsEmpty() { return m_Count == 0; }
uint GetCount() { return m_Count; }
T GetSum() { return m_Sum; }
T GetSqSum() { return m_SqSum; }

T GetMean()
{
assert(!IsEmpty());
return m_Sum / m_Count;
}
T GetVariance(bool varianceBiased = true)
{
if (varianceBiased)
return (m_SqSum - m_Sum*m_Sum/m_Count) / (m_Count-1);
else
return (m_SqSum - m_Sum*m_Sum/m_Count) / m_Count;
}
void GetMeanAndVariance(T &outMean, T &outVariance, bool varianceBiased = true)
{
assert(!IsEmpty());
outMean = m_Sum / m_Count;
if (varianceBiased)
outVariance = (m_SqSum - m_Sum*m_Sum/m_Count) / (m_Count - 1);
else
outVariance = (m_SqSum - m_Sum*m_Sum/m_Count) / m_Count;
}

private:
T m_Sum, m_SqSum;
uint m_Count;
};```

Finally, there is an algorithm to calculate mean and variance from histogram. It is very efficient method. If we have only 256 possible values for each sample, we don't have to do binning and calculating histogram is very simple:

```uint Histogram[256] = { 0 };
for (uint i = 0; i < N; i++)
Histogram[Bytes[i]]++;```

Now, the mean and variance can be calculated this way:

```float Mean = 0.f, Variance = 0.f;
for (uint i = 0; i < 256; i++)
Mean += (float)i * Histogram[i];
Mean /= (float)N;
for (uint i = 0; i < 256; i++)
Variance += (i - Mean)*(i - Mean) * Histogram[i];
Variance /= N - 1;```

# What is GUID and how to use it

Thu
16
Jul 2009

There are many ways to identify objects in a collection. One of them is to keep direct pointers to objects. Another way is to simply index their positions in an array. We can make user see only some strange indirect "handles" without any obvious meaning, like HWND or FILE*. We can also give objects string names or some numeric identifiers. SQL databases often use integer number to identify record in a table and new identifiers are generated as subsequent numbers. Another solution is to use GUIDs.

GUID means Globally Unique Identifier and it is a kind of general purpose, 128-bit numeric identifier. GUIDs are generated from some random data. We can be sure about their uniqueness in any context because of their size, as 2^128 possible values is more than stars in the whole observable universe :) That's why they are called "globally unique".

We could just generate 128 bits of data from any good pseudorandom number generator and call it good identifiers, but to standardize this idea, they have created standard - RFC4122 (it's actually UUID, but let's ignore the difference). This standard says how to express GUIDs as strings, made of hexadecimal numbers separated by dashes (for example "f81d4fae-7dec-11d0-a765-00a0c91e6bf6"). You can find such GUIDs everywhere. For example, run "regedit.exe" and look into key HKEY_CLASSES_ROOT\CLSID or run your Visual C++ and run command Tools / Create GUID from the main menu.

The standard also defines bit structure of the UUID and algorithms to generate it. There are several possible algorithms. Some of them use system time or even network card MAC address. The simplest one is version 4 algorithm, which uses only random number generator. Some of bits have special meaning and code algorithm which have been used to generate particular GUID.

Here is my code in C++ defining GUID structure and the algorithm to generate it.

# Efficient Finding of Duplicated Files

Tue
14
Jul 2009

I've recently wrote a tool for finding duplicated files in a given directory. I wanted to do it efficiently and here is my algoritm.

My basic idea is to first recursively enumerate all files in the directory and its subdirectories and sort their descriptions by file size. Making it this way, we can then iterate through this list and for each file look ahead to see how many other files have same size. If two files have different sizes, they are obviously not equal in terms of their content. So if there is only one file with particular size, it can just be skipped. If there are two files with same size, we must just compare them by content.

If there are many files with same size, things become more complicated, as any possible combinations of them can turn out to be identical. My solution is to compare two files by a "header" (lets say - first 1024 bytes) and if they are equal - by MD5 checksum of their whole content. Checksum of each file is lazy evaluated, so it's calculated only once, the first time its needed.

As I iterate through the sorted file list, I mark reported files not to process them again. I can do it because I ensure that if a file is reported, all other identical files are also already reported. In my real code I do the same with files I encounter errors with, but here I wanted to simplify the code.

Sun
12
Jul 2009

When I was writing The Final Quest engine for my master thesis, I didn't manage to implement any technique to ensure good quality shadows in outdoor scenes. I've read about all these kinds of perspective reparametrization like PSM (Perspective Shadow Maps), LiSPSM (Light-Space Perspective Shadow Maps), TSM (Trapezoidal Shadow Maps) or XPSM (Extended Perspective Shadow Maps) and all that math behind them seemed very scary to me. Today I know that complexity of these techniques also causes some artifacts in particular cases and commercial games more often use simpler technique called CSM (Cascaded Shadow Maps).

Yesterday I've implemented CSM and I'm very glad with the results. Of course there are always some artifacts, aliasing problems and z-acne on some distant objects under particular surface angles, performance degradation (additional objects rendering to 3 x 1024 x 1024 textures must take some time) etc., but still its the first time I have not so bad outdoor shadows in my code. On the screenshot below you can see transitions between cascades marked with red arrows.

# KD-Tree

Thu
09
Jul 2009

People at forums are usually advised to learn BSP, quadtree or octree as space partitioning data structure, but there are many more interesting structures than these three. This time my choice for home project is KD-tree. It's actually something between BSP and octree. Just like BSP it's a binary tree (each non-leaf node has two child nodes) and optimal splitting plane is estimated each time by special algorithm, but splitting planes are always aligned to one of three main axes and thus each node can be described by an AABB (axis-aligned bounding box), just like in octree.

As my tree is designed to manage objects and not geometry, nothing can be split and some of objects may intersect splitting planes. How to deal with them? I simply assign them to the parent node, so not only leaves are allowed to contain list of objects. To avoid too many small objects intersecting splitting planes to degrade performance by falling into top level nodes (it's called "sticky planes" or something like that :) I adopted "loose octree" idea to my KD-tree. It simply means I extend each node’s bounding box so that each node's children slightly overlap each other and small objects intersecting splitting plane fall into one of the children.

My KD-tree is also dynamic, which means it reorganizes itself as objects get added, removed and moved in the tree. It's actually quite simple. Each time an object is added to a node, that node can be split into two children if it's object number exceeds constant limit. Similarly node can be merged by deleting it's children each time an object is removed from one of its children, if the number of objects in that node and its children drops under constant minimum.

For additional performance, I allocate tree nodes from my own "Free List" memory pool and I keep objects connected to each node as doubly-linked list.

I also came up with an idea how to easily visualize quality of my space partitioning technique. I keep track of the number of tree nodes and objects on each depth level. This way I can tell from these several numbers whether tree is more "tall" or "wide" and whether most of objects stay in leaves instead of some top-level nodes.

Here are some screenshots and a video from my recent code:

Comments | #rendering #video #gallery #algorithms #math

# Computer Self-Defense for Beginners

Sat
04
Jul 2009

Inspired by recent virus-infestation of one of my family member's computer I've decided to write a small guide to computer self-defense for beginners - that is, to write down things that are obvious for every computer scientist, but not for every user. I emphasize I'm not a security specialist so for some of you my thoughts may look lame :)

Historical background: Many years ago people used to copy programs and transfer them between computers using floppy disks. Viruses were infecting EXE files by adding their code to programs and once someone ran such file, the virus spread to other programs in his computer. Viruses were sometimes written for fun. Some of them were harmless, while other did "pranks" such as formatting whole hard drive :) Now we live in the Internet era and programs downloaded from the Web or from CD/DVD disks require installation, so viruses like these are no longer effective.

Today's viruses are no longer infecting particular EXE files. They are rather kind of worms or trojan horses (however you call or classify them) - they just install themselves in Windows and run in the background as separate programs. They are neither destroying whole computer nor totally harmless. Modern malware software is part or organized crime, so it can, for example, steal user's passwords to online bank account, MMORPG game or use his computer to perform DDOS attacks and sending spam.

So how (not) to catch a virus? Viruses are not usually able to infect a computer without user's initiative. It's quite safe to use Windows, be connected to the Internet, browse any websites, read any emails, open any files such as music (like MP3), images (like JPG), archives (like ZIP) or documents (like PDF). The exception is when one of the programs have a bug which can be exploited to execute code embedded into document's data. Such critical security bugs are fixed quickly so the security procedure for them is to update your software regularly - use Windows Update / Microsoft Update and install new versions of programs you use, especially these connecting to the Internet.

The only possibility when a virus can run without user's approval is Autoplay technology for flash drives and CD/DVD disks. Each time you enter such media, Windows looks for autorun.inf file and can automatically run prepared program. As pendrives are very popular nowadays, I consider critical for computer security to turn off the Autoplay functionality. You can do it using free Tweak UI tool, just like that:

It's amazing to me how some people constantly suffer from viruses on their computers while other almost never catch any. Do the second never visit porn sites or use cracks and pirated software? Maybe, but I think it's rather the matter of obeying some simple security rules. Websites, emails or image/music/document files are not able to run arbitrary code on your computer. You must explicitly agree for that. This danger comes in two forms. First one is when a website wants to run/install something on your computer, usually using ActiveX technology. You can see warning about that asking if you really want to allow the website to run such program. Virus installs itself on your computer only if you agree to that.

The second danger is when you just run new EXE file. How to distinguish between safe and dangerous executables? First of all, never run unknown EXE files just for fun. Never trust it's a new brilliant porn screensaver, good and free antivirus software or a document in executable form (like image gallery, video, ebook, archive), even if you have it from your best friend. Pirated software and cracks/keygens from P2P networks also very often contain viruses. On the other hand, you can be almost sure it's safe when you download a well-known application from its author website or a website such as SourceForge.net. If you are not sure about an executable file and you really need to run it, scan it with an antivirus first.

I believe these simple rules are worth much more than not using Windows, Internet Explorer, Outlook Express or cracks/pirated software and using any firewall or resident antivirus protection.