The Beauty of Sorted Arrays

Warning! Some information on this page is older than 6 years now. I keep it for reference, but it probably doesn't reflect my current knowledge and beliefs.

Mar 2010

A one-dimmensional array (also called vector) is very simple data structure. We were taught that inserting or deleting elements in the middle of the array is slow because we have to move (by rewriting) all following elements. That's why books about C++ teach that for sorted collections - those who provide fast search operation - it's better to use containers like std::set or std::map.

But those who code games in C++ and care about performance know that today's processors (CPU) are very fast in terms of doing calculations and processing lots of data, while access to the RAM memory becomes more and more kind of a bottleneck. According to great presentation Pitfalls of Object Oriented Programming by Tony Albrecht from Sony, a cachce miss (access to some data from the memory which are not currently in CPU cache) costs about 400 cycles! Dynamic memory allocation (like operator new in C++) is also a slow operation. That's why data structures that use separate nodes linked with pointers (like std::list, std::set, std::map) are slower than std::vector and it's often better to keep a sorted collection of elements in a vector, even if you have hundreds or thousands of elements and must sometimes insert or delete them from the middle.

Unfortunately, C++ STL doesn't provide a "std::ordered_vector" container that whould keep items sorted while allocating single chunk of continuous memory. We have to do it by ourselves, like this:

First, we have to choose a way to compare elements. There are multiple solutions (read about comparators in C++), but the simplest one is to just overload operator < in your element structure.

#include <vector>
#include <string>
#include <algorithm>

struct Item
  int Id;
  DWORD Color;
  std::string Name;

  Item() { }
  Item(int id, DWORD color, const std::string name)
    : Id(id), Color(color), Name(name) { }

  bool operator < (const Item &rhs) const { return Id < rhs.Id; }

typedef std::vector<Item> ItemVector;

ItemVector vec;
Item item;

Here is how we insert an item into sorted vector:

item = Item(1, 0xFF0000, "Red");
ItemVector::iterator itToInsert = std::lower_bound(
  vec.begin(), vec.end(), item);
vec.insert(itToInsert, item);

Unfortunately we have to deal with iterators instead of simple indices.

To check if the vector contains element with particular Id:

item = Item(1, 0, std::string());
bool contains = std::binary_search(vec.begin(), vec.end(), item);

Here come another nuisances. We have to create full Item object just to use its Id field for comparison purpose. I'll show tomorrow how to overcome this problem.

It's also annoying that binary_search algorithm returns bool instead of iterator that would show us where the found item is. To find an item and determine its iterator (and from this its index) we have to use lower_bound algorithm. This algorithm is very universal but also hard to understand. It quickly searches a sorted collection and returns the position of the first element that has a value greater than or equivalent to a specified value. It was perfect for determining a place to insert new item into, but to find existing element, we have to use it with this if:

item = Item(1, 0, std::string());
ItemVector::iterator findIt = std::lower_bound(vec.begin(), vec.end(), item);
if (findIt != vec.end() && findIt->Id == item.Id)
  size_t foundIndex = std::distance(vec.begin(), findIt);
  // Not found.

And finally, to remove an item from the vector while having index, we do this trick:

size_t indexToDelete = 0;
vec.erase(vec.begin() + indexToDelete);

Well, C++ STL is very general, universal, powerful and... very complicated, while standard libraries of other programming languages also have some curiosities in their array classes.

For example, ActionScript 3 have single method to insert and remove items:

function splice(startIndex:int, deleteCount:uint, ...  values):Array

To delete some items, you pass non-zero deleteCount. To insert some items, you pass them as values parameter.

In .NET, the System.Collections.Generic.List<T> class (which is actually an array, the name is misnomer) has very smart BinarySearch method. It returns the "zero-based index of item (...), if item is found; otherwise, a negative number that is the bitwise complement of the index of the next element that is larger than item or, if there is no larger element, the bitwise complement of Count.". It can be used to both find existing item in the array or determine an index for a new item, like:

int index = myList->BinarySearch("abc");
if (index < 0)
  myList->Insert(~index, "abc");

See also: My Algorithms for Sorted Arrays

Comments | #algorithms #c++ #stl #flash Share


[Download] [Dropbox] [pub] [Mirror] [Privacy policy]
Copyright © 2004-2024