Binary Search Revisited

posted by Stephan Brumme

Same same but different ?!

Some time ago I had to write code for a binary search of sorted data. Obviously there are libraries which can do the heavy lifting for me, such as C++'s STL.

When I looked up the source code of STL's binary_search I was surprised to find that it slightly differs from what I expected: it always performs log2(N) less-than comparisons and only after that checks for equality exactly once:
hide binary_search / pseudo code (STL) while at least one element { find element in the middle if it's smaller than x strip all elements before and including middle element else strip all elements after middle element } return true if remaining element matches x, else false
That means if it finds a match (something that equals what we are searching for) it still keeps going until subdivision finishes - and forfeits the chance for an early exit.
The STL guys usually prefer fast algorithms - why did they do it ?

This clearly calls for some serious googling ... and the vast majority of web sites, including Wikipedia, combine the less-than comparison with an check for equality and indeed allow for an early exit if they find a match:
hide binary_search / pseudo code (most common algorithm) while at least one element { find element in the middle if it matches x, return true <== NEW if it's smaller than x strip all elements before and including middle element else strip all elements after middle element } return false <== DIFFERENT
In Big-O notation: if we have a container with N elements, the following number of comparisons are expected on average:
check if less than check if equal
STL ≈ log2(N) 1
"rest of world" ≈ log2(N) - 1 ≈ log2(N) - 1
Is STL right ? Does saving all these checks for equality at the cost of one more less-than give a real-world performance boost ?
There is only one way to find the best binary_search: WE NEED BENCHMARKS !
Git users: scroll down for the repository link
Download  BinarySearch.cpp
Latest release: April 24, 2013, size: 10.2 kBytes, 381 lines

CRC32:8ccb4820
MD5:3e13d18e5c402b03dd670e18d11c8510
SHA1:1dce5af123bf2f76c97bd1fd21c9c35d444dd32c
SHA256:47efec0b9b5facca2a8ccf29d0b7a36972140ee61478f16de48054c57455e36b

Stay up-to-date:git clone http://create.stephan-brumme.com/binary-search/.git

If you encounter any bugs/problems or have ideas for improving future versions, please write me an email: create@stephan-brumme.com

Simplifying the Code

Visual C++'s and GCC's STL implementations of binary_search rely on lower_bound (I'm pretty sure all STL libraries do). However, like most STL stuff the code is hardly readable. After inlining lower_bound, stripping off all the debugging stuff and replacing the weird variables and type names by something more understandable, binary_search looks like this: I highlighted lines 16-19 to show where the check for equality could be inserted.

Note: you need a C++11 compiler because of the super-convenient auto keyword.

Benchmark

The main benchmark parameters are the container type (std::vector, std::deque or std::list) and: My sorted data is generated by a simple loop: step (and notFound=step/2) is responsible for creating a certain gap between neighboring numbers. These gaps are helpful to get a better estimate of the algorithm's runtime because they allow for looking for numbers that are NOT in the container in an evenly distributed manner.
index 0 1 2 3 4 5 ...
sorted[index] 0 10 20 30 40 50 ...
The benchmark probes 0 (succeeds), 5 (fails), 10 (succeeds), 15 (fails), 20 (succeeds), 25 (fails), 30 (succeeds), 35 (fails), etc.:
hide test routine for (size_t i = 0; i < numElements; i++) { // hit Number succeed = Number(i*step); if (!std::binary_search(sorted.begin(), sorted.end(), succeed)) std::cout << "error @" << succeed << std::endl; // miss Number fail = Number(i*step+notFound); if ( std::binary_search(sorted.begin(), sorted.end(), fail)) std::cout << "error @" << fail << std::endl; }
Searching 1000000 sorted numbers requires to repeat the inner while-loop about 20 times. All compiler optimizations I could think of were enabled.
That's what I get:
1 million int Visual C++ 2010 GCC 4.7
std::binary_search 131ms 122ms
my binary search 107ms 96ms
early check for equality 160ms 97ms
recursive 242ms 110ms
hybrid (8) 85ms 133ms
The hybrid algorithm (discussed below) is the overall winner for integers (compiled with Visual C++).

My implementation of std::binary_search isn't as nested as the STL and lacks some iterator checking. In the end, both compilers generate code which is about 25% faster code than the STL.
For GCC it doesn't really matter which variant of my binary search implementations is chosen but Visual C++ seriously loses ground when enabling the early check for equality.
I added #define _SECURE_SCL 0 to disable Visual C++'s checked iterators in release mode, too.

Let's have a look at float (very similar results for double):
1 million float Visual C++ 2010 GCC 4.7
std::binary_search 251ms 119ms
my binary search 217ms 96ms
early check for equality 200ms 126ms
recursive 290ms 129ms
hybrid (16) 158ms 132ms
GCC remains mostly unaffected but Visual C++ becomes about two times slower ! It's worth noting that Visual C++ in this scenario becomes faster when adding the early check for equality. The difference is small but consistent.

Even More Simplifying the Code

A recursive implementation of the binary search is quite short:
hide binary_search_recursive /// recursive implementation template <typename Iterator, typename Value> bool binary_search_recursive(Iterator begin, Iterator end, Value value) { // empty ? if (begin == end) return false; // stop recursion ? auto numElements = std::distance(begin, end); if (numElements == 1) return value == *begin; // pivot element auto middlePos = numElements >> 1; auto middle = begin; std::advance(middle, middlePos); // keep looking on the right or left side of pivot element ? if (value < *middle) return binary_search_recursive(begin, middle, value); else return binary_search_recursive(middle, end, value); }
As we have seen in the benchmark, it runs pretty fast on GCC 4.7 - and sucks with Visual C++ 2010.

Hybrid Approach

When the container holds small element types which are laid out next to each other in the memory (such as int), the L1/L2 caches' influence plays a major role.
My hybrid algorithm switches from binary subdivision to linear search after a given thresholdLinear was reached:
hide binary_search_hybrid /// linear search in inner loop template <size_t thresholdLinear, typename Iterator, typename Value> bool binary_search_hybrid(Iterator begin, Iterator end, Value value) { // narrow down range of candidates auto numElements = std::distance(begin, end); // repeat until only a few elements are left while (numElements > int(thresholdLinear)) { // pivot element auto middlePos = numElements >> 1; auto middle = begin; std::advance(middle, middlePos); if (*middle < value) { // value must be on the right side of pivot element begin = middle; ++begin; numElements = numElements - middlePos - 1; } else { // value must be on the left side of pivot element (or it is the pivot element) numElements = middlePos; } } // skip all smaller values while (begin != end && *begin < value) ++begin; // hit ? return begin != end && *begin == value; }
In contrast to the original algorithm, here the while compares against thresholdLinear, not 0. Lines 30-35 are new, everything else should be familiar.

The best threshold value heavily depends on the data type. For 32-bit ints I found thresholdLinear = 8 to be the best parameter.
Your mileage may vary.

The World Is Not A Vector

std::binary_search accepts arbitrary forward iterators and makes it easy to replace the container (benchmarks above: std::vector).
Here are the results of STL's double-ended queue (std::deque):
std::deque, 1 million int Visual C++ 2010 GCC 4.7
std::binary_search 352ms 206ms
my binary search 335ms 204ms
early check for equality 482ms 215ms
recursive 510ms 540ms
hybrid (8) 309ms 198ms
GCC wins all benchmarks by a huge margin. The hybrid algorithm fits well to the paged structure of a std::deque and utilizes the CPU caches best.

When looking at std::list we have to keep in mind that even though only a forward iterator is required, a lot of iterating is done in std::advance and std::distance.
Therefore instead of 1000000 only 10000 ints are pushed into an std::list:
std::list, ten thousand int Visual C++ 2010 GCC 4.7
std::binary_search 692ms 903ms
my binary search 683ms 878ms
early check for equality 687ms 882ms
recursive 982ms 1261ms
hybrid (8) 686ms 884ms
This time, Visual C++ wins all benchmarks over GCC. Maybe VC++ comes with a better memory allocator.
Unlike all tables before, I didn't highlight the fastest algorithm - because it's not shown in this table !
A simple std::find is faster than all presented algorithm when working on std::list. It can be improved for sorted data, too: abort when the iterator points at values larger than the value we are looking for. And here's the code of find and find_sorted:
hide find /// same algorithm as STL's find template <typename Iterator, typename Value> Iterator find(Iterator begin, Iterator end, Value value) { // compare all values until found while (begin != end) { if (*begin == value) return begin; // found ! ++begin; } // not found return end; }
hide find_sorted /// find, optimized for sorted data // requires: iff !(x<y) && !(x>y) then x == y template <typename Iterator, typename Value> Iterator find_sorted(Iterator begin, Iterator end, Value value) { // skip all smaller values while (begin != end && *begin < value) ++begin; // hit ? if (begin != end && *begin == value) return begin; else return end; }
Note: these two templates return an Iterator instead of bool (just like the original std::find).
std::list, 10'000 int Visual C++ 2010 GCC 4.7
find 253ms 292ms
find_sorted 163ms 184ms
That's 3 to 4 times faster than before. However, a small std::vector with only 10000 ints is still the clear winner:
std::vector, ten thousand int Visual C++ 2010 GCC 4.7
hybrid binary search 1.2ms 0.8ms
find 192ms 160ms
find_sorted 125ms 62ms
The same test with 1 million int in a std::list takes several minutes for find and find_sorted to complete.

The World Is Not A Simple Number

My benchmarks were completely memory-based and the objects only simple numbers (int or float). Performance was mainly influenced by instruction branching and caching.

If data is stored on high-latency storage, especially storage with slow seeking times like a hard disk or some network-based solution, then the advantage of an early equality test truly kicks in. The hybrid approach computes the most comparisons and suffers severely if comparisons are expensive (e.g. large objects instead of primitive data types such as int).

If your container is very small then find can be surprisingly efficient because it gets support from modern CPU's intelligent prefetching. For less than approx. 50 ints, find_sorted is as fast as a binary search and for less than approx. 25 ints, even find becomes competitive. The simple std::find has a few advantages, too: no need for the less-than operator and - more importantly - the input data doesn't have to be sorted.

Conclusion

Download my code and test with your own data. Then decide on your own what fits best to your needs !
homepage