# Binary Search Revisited

posted by Stephan Brumme

## Same same but different ?!

Some time ago I had to write code for a binary search of sorted data. Obviously there are libraries which can do the heavy lifting for me, such as C++'s STL.

When I looked up the source code of STL's `binary_search` I was surprised to find that it slightly differs from what I expected: it always performs log2(N) less-than comparisons and only after that checks for equality exactly once:
hide binary_search / pseudo code (STL) ``` while at least one element { find element in the middle if it's smaller than x strip all elements before and including middle element else strip all elements after middle element } return true if remaining element matches x, else false ```
That means if it finds a match (something that equals what we are searching for) it still keeps going until subdivision finishes - and forfeits the chance for an early exit.
The STL guys usually prefer fast algorithms - why did they do it ?

This clearly calls for some serious googling ... and the vast majority of web sites, including Wikipedia, combine the less-than comparison with an check for equality and indeed allow for an early exit if they find a match:
hide binary_search / pseudo code (most common algorithm) ``` while at least one element { find element in the middle if it matches x, return true <== NEW if it's smaller than x strip all elements before and including middle element else strip all elements after middle element } return false <== DIFFERENT ```
In Big-O notation: if we have a container with N elements, the following number of comparisons are expected on average:
check if less than check if equal
STL ≈ log2(N) 1
"rest of world" ≈ log2(N) - 1 ≈ log2(N) - 1
Is STL right ? Does saving all these checks for equality at the cost of one more less-than give a real-world performance boost ?
There is only one way to find the best `binary_search`: WE NEED BENCHMARKS !
Git users: scroll down to the repository link Latest release: April 24, 2013, size: 10.2 kBytes, 381 lines

CRC32: `8ccb4820`
MD5: `3e13d18e5c402b03dd670e18d11c8510`
SHA1: `1dce5af123bf2f76c97bd1fd21c9c35d444dd32c`
SHA256:`47efec0b9b5facca2a8ccf29d0b7a36972140ee61478f16de48054c57455e36b`

Stay up-to-date:git clone http://create.stephan-brumme.com/binary-search/.git

If you encounter any bugs/problems or have ideas for improving future versions, please write me an email: create@stephan-brumme.com

## Simplifying the Code

Visual C++'s and GCC's STL implementations of `binary_search` rely on `lower_bound` (I'm pretty sure all STL libraries do). However, like most STL stuff the code is hardly readable. After inlining `lower_bound`, stripping off all the debugging stuff and replacing the weird variables and type names by something more understandable, `binary_search` looks like this: I highlighted lines 16-19 to show where the check for equality could be inserted.

Note: you need a C++11 compiler because of the super-convenient `auto` keyword.

## Benchmark

The main benchmark parameters are the container type (`std::vector`, `std::deque` or `std::list`) and: My sorted data is generated by a simple loop: `step` (and `notFound=step/2`) is responsible for creating a certain gap between neighboring numbers. These gaps are helpful to get a better estimate of the algorithm's runtime because they allow for looking for numbers that are NOT in the container in an evenly distributed manner.
index 0 1 2 3 4 5 ...
sorted[index] 0 10 20 30 40 50 ...
The benchmark probes `0` (succeeds), `5` (fails), `10` (succeeds), `15` (fails), `20` (succeeds), `25` (fails), `30` (succeeds), `35` (fails), etc.:
hide test routine ``` for (size_t i = 0; i < numElements; i++) { // hit Number succeed = Number(i*step); if (!std::binary_search(sorted.begin(), sorted.end(), succeed)) std::cout << "error @" << succeed << std::endl; // miss Number fail = Number(i*step+notFound); if ( std::binary_search(sorted.begin(), sorted.end(), fail)) std::cout << "error @" << fail << std::endl; } ```
Searching 1000000 sorted numbers requires to repeat the inner `while`-loop about 20 times. All compiler optimizations I could think of were enabled.
That's what I get:
1 million `int` Visual C++ 2010 GCC 4.7
std::binary_search 131ms 122ms
my binary search 107ms 96ms
early check for equality 160ms 97ms
recursive 242ms 110ms
hybrid (8) 85ms 133ms
The hybrid algorithm (discussed below) is the overall winner for `int`egers (compiled with Visual C++).

My implementation of `std::binary_search` isn't as nested as the STL and lacks some iterator checking. In the end, both compilers generate code which is about 25% faster code than the STL.
For GCC it doesn't really matter which variant of my binary search implementations is chosen but Visual C++ seriously loses ground when enabling the early check for equality.
I added `#define _SECURE_SCL 0` to disable Visual C++'s checked iterators in release mode, too.

Let's have a look at `float` (very similar results for `double`):
1 million `float` Visual C++ 2010 GCC 4.7
std::binary_search 251ms 119ms
my binary search 217ms 96ms
early check for equality 200ms 126ms
recursive 290ms 129ms
hybrid (16) 158ms 132ms
GCC remains mostly unaffected but Visual C++ becomes about two times slower ! It's worth noting that Visual C++ in this scenario becomes faster when adding the early check for equality. The difference is small but consistent.

## Even More Simplifying the Code

A recursive implementation of the binary search is quite short:
hide binary_search_recursive ``` /// recursive implementation template <typename Iterator, typename Value> bool binary_search_recursive(Iterator begin, Iterator end, Value value) { // empty ? if (begin == end) return false; // stop recursion ? auto numElements = std::distance(begin, end); if (numElements == 1) return value == *begin; // pivot element auto middlePos = numElements >> 1; auto middle = begin; std::advance(middle, middlePos); // keep looking on the right or left side of pivot element ? if (value < *middle) return binary_search_recursive(begin, middle, value); else return binary_search_recursive(middle, end, value); } ```
As we have seen in the benchmark, it runs pretty fast on GCC 4.7 - and sucks with Visual C++ 2010.

## Hybrid Approach

When the container holds small element types which are laid out next to each other in the memory (such as `int`), the L1/L2 caches' influence plays a major role.
My hybrid algorithm switches from binary subdivision to linear search after a given `thresholdLinear` was reached:
hide binary_search_hybrid ``` /// linear search in inner loop template <size_t thresholdLinear, typename Iterator, typename Value> bool binary_search_hybrid(Iterator begin, Iterator end, Value value) { // narrow down range of candidates auto numElements = std::distance(begin, end); // repeat until only a few elements are left while (numElements > int(thresholdLinear)) { // pivot element auto middlePos = numElements >> 1; auto middle = begin; std::advance(middle, middlePos); if (*middle < value) { // value must be on the right side of pivot element begin = middle; ++begin; numElements = numElements - middlePos - 1; } else { // value must be on the left side of pivot element (or it is the pivot element) numElements = middlePos; } } // skip all smaller values while (begin != end && *begin < value) ++begin; // hit ? return begin != end && *begin == value; } ```
In contrast to the original algorithm, here the `while` compares against `thresholdLinear`, not `0`. Lines 30-35 are new, everything else should be familiar.

The best threshold value heavily depends on the data type. For 32-bit `int`s I found `thresholdLinear` = `8` to be the best parameter.

## The World Is Not A Vector

`std::binary_search` accepts arbitrary forward iterators and makes it easy to replace the container (benchmarks above: `std::vector`).
Here are the results of STL's double-ended queue (`std::deque`):
`std::deque`, 1 million `int` Visual C++ 2010 GCC 4.7
std::binary_search 352ms 206ms
my binary search 335ms 204ms
early check for equality 482ms 215ms
recursive 510ms 540ms
hybrid (8) 309ms 198ms
GCC wins all benchmarks by a huge margin. The hybrid algorithm fits well to the paged structure of a `std::deque` and utilizes the CPU caches best.

When looking at `std::list` we have to keep in mind that even though only a forward iterator is required, a lot of iterating is done in `std::advance` and `std::distance`.
Therefore instead of 1000000 only 10000 `int`s are pushed into an `std::list`:
`std::list`, ten thousand `int` Visual C++ 2010 GCC 4.7
std::binary_search 692ms 903ms
my binary search 683ms 878ms
early check for equality 687ms 882ms
recursive 982ms 1261ms
hybrid (8) 686ms 884ms
This time, Visual C++ wins all benchmarks over GCC. Maybe VC++ comes with a better memory allocator.
Unlike all tables before, I didn't highlight the fastest algorithm - because it's not shown in this table !
A simple `std::find` is faster than all presented algorithm when working on `std::list`. It can be improved for sorted data, too: abort when the iterator points at values larger than the value we are looking for. And here's the code of `find` and `find_sorted`:
hide find ``` /// same algorithm as STL's find template <typename Iterator, typename Value> Iterator find(Iterator begin, Iterator end, Value value) { // compare all values until found while (begin != end) { if (*begin == value) return begin; // found ! ++begin; } // not found return end; } ```
hide find_sorted ``` /// find, optimized for sorted data // requires: iff !(x<y) && !(x>y) then x == y template <typename Iterator, typename Value> Iterator find_sorted(Iterator begin, Iterator end, Value value) { // skip all smaller values while (begin != end && *begin < value) ++begin; // hit ? if (begin != end && *begin == value) return begin; else return end; } ```
Note: these two templates return an `Iterator` instead of `bool` (just like the original `std::find`).
`std::list`, 10'000 `int` Visual C++ 2010 GCC 4.7
find 253ms 292ms
find_sorted 163ms 184ms
That's 3 to 4 times faster than before. However, a small `std::vector` with only 10000 `int`s is still the clear winner:
`std::vector`, ten thousand `int` Visual C++ 2010 GCC 4.7
hybrid binary search 1.2ms 0.8ms
find 192ms 160ms
find_sorted 125ms 62ms
The same test with 1 million `int` in a `std::list` takes several minutes for `find` and `find_sorted` to complete.

## The World Is Not A Simple Number

My benchmarks were completely memory-based and the objects only simple numbers (`int` or `float`). Performance was mainly influenced by instruction branching and caching.

If data is stored on high-latency storage, especially storage with slow seeking times like a hard disk or some network-based solution, then the advantage of an early equality test truly kicks in. The hybrid approach computes the most comparisons and suffers severely if comparisons are expensive (e.g. large objects instead of primitive data types such as `int`).

If your container is very small then `find` can be surprisingly efficient because it gets support from modern CPU's intelligent prefetching. For less than approx. 50 `int`s, `find_sorted` is as fast as a binary search and for less than approx. 25 `int`s, even `find` becomes competitive. The simple `std::find` has a few advantages, too: no need for the less-than operator and - more importantly - the input data doesn't have to be sorted.