Tag: commonlib

Entries for tag "commonlib", ordered from most recent. Entry count: 3.

Warning! Some information on this page is older than 3 years now. I keep it for reference, but it probably doesn't reflect my current knowledge and beliefs.

Pages: 1

19:37
Sat
08
May 2010

LZMA SDK - How to Use

What do you think about when I tell a word "compression"? If you currently study computer science, you probably think about some details of algorithms like RLE, Huffman coding or Burrows-Wheeler transform. If not, then you surely associate compression with archive file formats such as ZIP and RAR. But there is something in between - a kind of libraries that let you compress some data - implement a compression algorithm but do not provide ready file format to pack multiple files and directories into one archive. Such library is useful in gamedev for creating VFS (Virtual File System). Probably the most popular one is zlib - a free C library that implements Deflate algorithm. I've recently discovered another one - LZMA. Its SDK is also free (public domain) and the basic version is a small C library (C++, C# and Java API-s are also available, as well as some additional tools). The library uses LZMA algorithm (Lempel–Ziv–Markov chain algorithm, same as in 7z archive format), which has better compression ratio than Deflate AFAIK. So I've decided to start using it. Here is what I've learned:

If you decide to use only the C API, it's enough to add some C and H files to your project - the ones from LZMASDK\C directory (without subdirectories). Alternatively you can compile them as a static library.

There is a little bit of theory behind the LZMA SDK API. First, the term props means a 5-byte header where the library stores some settings. It must be saved with compressed data to be given to the library before decompression.

Next, the dictionary size. It is the size of a temporary memory block used during compression and decompression. Dictionary size can be set during compression and is then saved inside props. Library uses dictionary of same size during decompression. Default dictionary size is 16 MB so IMHO it's worth changing, especially as I haven't noticed any meaninful drop in compression rate when set it to 64 KB.

And finally, end mark can be saved at the end of compressed data. You can use it so decompression code can determine the end of data. Alternatively you can decide not to use the end mark, but you must remember the exact size of uncompressed data somewhere. I prefer the second method, because remembering data size takes only 4 bytes (for the 4 GB limit) and can be useful anyway, while compressed data finished with end mark are about 6 bytes longer than without it.

Compressing full block of data with single call is simple. You can find appropriate functions in LzmaLib.h header. Here is how you can compress a vector of bytes using LzmaCompress function:

Read full entry > | Comments (9) | Tags: commonlib libraries algorithms | Author: Adam Sawicki | Share

16:47
Sun
21
Mar 2010

My Algorithms for Sorted Arrays

Yesterday I wrote about The Beauty of Sorted Arrays. I believe the C++ STL is not perfect and so I've written some additional code that I needed for the purpose of handling such data structure. It can be found in my CommonLib library.

I once wanted to have a sorting algorithm, just like std::sort, but using insertion sort. The std::sort is of course great, has average O(n log n) complexity and AFAIK internally uses introsort algorithm (an improved version of quicksort). But if we know that the collection is already almost sorted, insertion sort will work faster. So here is my code:

// Basic version that uses operator <
template<class Iterator>
void InsertionSort(Iterator b, Iterator e)
{
  if (b == e) return;
  for (Iterator j = b + 1; j < e; j++)
  {
    typename std::iterator_traits<Iterator>::value_type key;
    key = *j;
    Iterator i = j;
    while (i > b && key < *(i-1))
    {
      *i = *(i-1);
      i--;
    } 
    *i = key;
  }
}

// Overloaded version with explicit comparator
template<class Iterator, typename Compare>
void InsertionSort(Iterator b, Iterator e, Compare Comp)
{
  if (b == e) return;
  for (Iterator j = b + 1; j < e; j++)
  {
    typename std::iterator_traits<Iterator>::value_type key;
    key = *j;
    Iterator i = j;
    while (i > b && Comp(key, *(i-1)))
    {
      *i = *(i-1);
      i--;
    } 
    *i = key;
  }
}

Another problem is finding an item in the ordered array by some key. I can see two solutions in C++, both unsatisfactory. First one is to write your own comparator or overload operator < for your item structure to compare items by key, like this:

struct Item
{
  int Id;
  DWORD Color;
  std::string Name;

  bool operator < (const Item &rhs) const { return Id < rhs.Id; }
};

The problem is that C++ comparator always compares two full items, so to use this comparing feature for finding item in the array, we must construct and pass full Item object, despite we use only the Id field.

Item item;
item.Id = 1;
bool contains = std::binary_search(vec.begin(), vec.end(), item);

Second solution is to use std::map container. It has separate key and value type so you can search by key:

std::map<int, Item>::iterator foundIt = map.find(1);
if (foundIt != map.end())
  std::cout << "Found ID=" << foundIt->first
    << ", Name=" << foundIt->second.Name << std::endl;

But the problem is that the key field is separate from value type so the Id field should not be included in the Item structure because it's redundant. We should instead use std::pair<int, Item> as the type of items contained in the map, which is alised to std::map<int, Item>::value_type.

I wanted to have more convenient way to quickly search an ordered array by key, so I've coded these two function templates:

template <typename IterT, typename KeyT, typename CmpT>
IterT FirstNotLessIndex(IterT beg, IterT end, const KeyT &key, CmpT cmp)
{
  unsigned down = 0, up = (end - beg);
  while(down < up)
  {
    unsigned mid = (down + up) / 2;
    int res = cmp(key, *(beg + mid));
    if(res > 0)
      down = mid + 1;
    else
      up = mid;
  }
  return beg + down;
}

template <typename IterT, typename KeyT, typename CmpT>
IterT BinarySearch(IterT beg, IterT end, const KeyT &key, CmpT cmp)
{
  IterT it = FirstNotLessIndex<IterT, KeyT, CmpT>(beg, end, key, cmp);
  if (it == end || cmp(key, *it) != 0)
    return end;
  return it;
}

First one performs binary search to determine the iterator for a first element not less (greater or equal) than the specified key. It's perfect for determining the place for inserting new item into the array. Second function uses first one to check whether an item with exact matching key exists in the array and returns iterator pointing to it, or end iterator if not found.

As you can see, both functions take the type of the key as a template parameter, which may be different than full item type pointed by iterator. They also take a comparator functor which is always called with parameters (key, item) and should return int. The comparator should return a number less, equal or greater than zero according to the result of the comparison - just like strcmp, memcmp and other functions from old C.

Now my example can be written like this:

struct Item
{
  int Id;
  DWORD Color;
  std::string Name;

  static int Cmp(int id, const Item &rhs) { return id - rhs.Id; }
};

bool contains =
  common::BinarySearch(vec.begin(), vec.end(), 1, &Item::Cmp) != vec.end();

Having such universal comparison function, it's easy to convert it to all comparison operators:

bool operator <  (const Item &rhs) const { return Cmp(Id, rhs) <  0; }
bool operator >  (const Item &rhs) const { return Cmp(Id, rhs) >  0; }
bool operator <= (const Item &rhs) const { return Cmp(Id, rhs) <= 0; }
bool operator >= (const Item &rhs) const { return Cmp(Id, rhs) <= 0; }
bool operator == (const Item &rhs) const { return Cmp(Id, rhs) == 0; }
bool operator != (const Item &rhs) const { return Cmp(Id, rhs) != 0; }

Opposite conversion is also possible. If you have a type with operator < accessible (like float or your own type with overloaded operator), you can convert it to a comparator returning int with this function:

template <typename T>
inline int UniversalCmp(const T &a, const T &b)
{
  if (a < b) return -1;
  if (b < a) return  1;
  return 0;
}

Comments (0) | Tags: commonlib stl c++ algorithms | Author: Adam Sawicki | Share

22:00
Tue
12
Jan 2010

Processing File Paths in C++

Paths to files and directories are special kind of strings. Sometimes they have to be processed, e.g. merged or splitted into some parts like drive letter, directory, file name and extension. Standard C library provides two function for this: makepath and splitpath. They use char* strings and operate always on all possible parts (drive, dir, fname, ext). In my opinion they are not very convenient. On the other hand, modern programming languages have more functionality in this field. For example, .NET has System.IO.Path static class with methods like GetExtension, GetFileName, GetDirectoryName.

To make operations on file paths convenient in C++, I've developed some time ago my own functions for that which operate on std::string strings. I've included them in my CommonLib library. They are similar a bit to these from Delphi :) Here is how I approach this topic: First, a common task is to extract part of a path (e.g. "C:\Windows\notepad.exe"). My functions for this are:

I also have some functions for processing paths:

Path can be absolute or relative. In Windows, absolute paths start with drive letter like "C:\" (which we can recognized by looking for colon) or network share (paths that begin with "\\"). Relative paths (where most basic example is a single file name) make sense only in some context. When we use relative paths to open real files, we address them relative to the "Current Directory" (which is a separate topic, I will write about it another time). To deal with absolute and relative paths, I've coded functions such as:

NormalizePath function preprocessess all the . and .. drectories in the path. A single dot used as directory name means "my own directory" (just does nothing) and two dots mean going up one level to the parent directory. So the path we consider here is logically equal to "C:\Windows\.\system32\..\notepad.exe". Paths like this still work, but they can look a bit unclear, so you can use NormalizePath function to clean them.

Paths in Linux are quite different. My functions also support them, although I didn't test this case in real situations yet. First of all, there are no drive letters in Linux, so absolute paths just start with '/', like "/etc/passwd". Second, a slash sign is used as a separator instead of backslash (it's interesting that slash also works in Windows). It's also more common in Linux than in Windows to meet files that don't have extension or have double extension (like "MyFile.tar.gz"). A question arises here whether what we call "extension" starts from the first or from the second dot :)

Comments (1) | Tags: algorithms c++ commonlib | Author: Adam Sawicki | Share

Pages: 1

STAT NO AD [Stat] [Admin] [STAT NO AD] [pub] [Mirror] Copyright © 2004-2017 Adam Sawicki
Copyright © 2004-2017 Adam Sawicki