Warning! Some information on this page is older than 5 years now. I keep it for reference, but it probably doesn't reflect my current knowledge and beliefs.

# Random Choice with Given Probabilities

Fri

11

Dec 2009

When we want to randomly choose one option from some possible items, we usually do something as simple as (rand() % numberOfItems). But what if we want to vary probabilities of these items? Here is an algorithm for this:

As input data, we have number of items and an array of their probabilities. Each probability is a nonnegative real number. I like to allow these numbers to be as big as I want, knowing that an item with probability 2 (or 200) is just two times more probable than the item with probability 1 (or 100). I will normalize them later.

// == Preprocessing step == // Input: std::vector<float> ItemProbabilities; // Output: std::vector<float> ProcessedProbabilities; // Algorithm: size_t count = ItemProbabilities.size(); assert(count > 0); // First pass: summation float sum = 0.f; for (size_t i = count; i--; ) // count-1, count-2, ..., 0 { float p = ItemProbabilities[i]; assert(p >= 0.f); // Optional: Delete items with zero probability if (p == 0.f) { Items.RemoveByIndex(i); // That's how we delete an element from std::vector by index :P ItemProbabilities.erase(ItemProbabilities.begin()+i); } sum += p; } assert(sum > 0.f); float sumInv = 1.f / sum; // Second pass: building preprocessed array ProcessedProbabilities.resize(count); ProcessedProbabilities[0] = ItemProbabilities[0] * sumInv; for (size_t i = 1; i < count; i++) ProcessedProbabilities[i] = ProcessedProbabilities[i-1] + ItemProbabilities[i] * sumInv;

The algorithm above does two things: It prepares ProcessedProbabilities array so that each probability is the item's probability plus the sum of probabilities of all previous items. It also normalizes them to range 0..1. As the result, the output array forms an ascending sequence, where the difference to the previous element is proportional to the item's probability in respect to full 0..1 range. For example:

Input: 100, 100, 200 Temporary data: count=3, sum=400, sumInv=0.0025 Output: 0.25, 0.5, 1

Now as we have these preprocessed data, generating random choices with certain probabilities is fast and simple. We just generate a random real number in range 0..1 and choose the first item for which ProcessedProbabilities[i] is greater than this. Additional optimization would be to use binary search here.

// == Random choice == // Input: std::vector<float> ProcessedProbabilities; // Output: size_t choice; // Algorithm: choice = 0; float randNum = RandFloat(); // 0..1 while (choice < ProcessedProbabilities.size()-1 && ProcessedProbabilities[choice] < randNum) { choice++; }

Comments | #algorithms #math Share