# Binary Search Revisited

posted by Stephan Brumme

## Same same but different ?!

Some time ago I had to write code for a binary search of sorted data. Obviously there are libraries which can do the heavy lifting for me, such as C++'s STL.When I looked up the source code of STL's

`binary_search`

I was surprised to find that it
slightly differs from what I expected: it always performs log2(N) less-than comparisons and
only after that checks for equality exactly once:
hide

That means if it finds a match (something that equals what we are searching for)
it still keeps going until subdivision finishes - and forfeits the chance for an early exit.**binary_search / pseudo code (STL)**```
```**while** at least one element
{
find element in the middle
**if** it's smaller than x
strip all elements before and including middle element
**else**
strip all elements after middle element
}
**return** true **if** remaining element matches x, **else** false

The STL guys usually prefer fast algorithms - why did they do it ?

This clearly calls for some serious googling ... and the vast majority of web sites, including Wikipedia, combine the less-than comparison with an check for equality and indeed allow for an early exit if they find a match:

hide

In Big-O notation: if we have a container with N elements,
the following number of comparisons are expected on average:
**binary_search / pseudo code (most common algorithm)**```
```**while** at least one element
{
find element in the middle
**if** it matches x, **return** true <== NEW
**if** it's smaller than x
strip all elements before and including middle element
**else**
strip all elements after middle element
}
**return** false <== DIFFERENT

check if less than | check if equal | |
---|---|---|

STL | ≈ log_{2}(N) |
1 |

"rest of world" | ≈ log_{2}(N) - 1 |
≈ log_{2}(N) - 1 |

There is only one way to find the best

`binary_search`

: **WE NEED BENCHMARKS !**

## Download

Git users: scroll down for the repository link Latest release:**April 24, 2013**, size:

**10.2**kBytes,

**381**lines

CRC32:

`8ccb4820`

MD5:

`3e13d18e5c402b03dd670e18d11c8510`

SHA1:

`1dce5af123bf2f76c97bd1fd21c9c35d444dd32c`

SHA256:

`47efec0b9b5facca2a8ccf29d0b7a36972140ee61478f16de48054c57455e36b`

Stay up-to-date:git clone http://create.stephan-brumme.com/binary-search/.git

If you encounter any bugs/problems or have ideas for improving future versions, please write me an email: create@stephan-brumme.com

## Simplifying the Code

Visual C++'s and GCC's STL implementations of`binary_search`

rely on `lower_bound`

(I'm pretty sure all STL libraries do).
However, like most STL stuff the code is hardly readable.
After inlining `lower_bound`

, stripping off all the debugging stuff and
replacing the weird variables and type names by something more understandable,
`binary_search`

looks like this:
hide

I highlighted lines 16-19 to show where the check for equality could be inserted.**binary_search**```
```*/// same algorithm as STL*
**template** <**typename** Iterator, **typename** Value>
**bool** binary_search(Iterator begin, Iterator end, Value value)
{
*// narrow down range of candidates*
**auto** numElements = std::distance(begin, end);
*// repeat until a single element is left*
**while** (numElements > 0)
{
*// pivot element*
**auto** middlePos = numElements >> 1;
**auto** middle = begin;
std::advance(middle, middlePos);
*// >>> UPDATE check for equality*
*//if (value == *middle)*
*// return true;*
*// <<< UPDATE*
**if** (*middle < value)
{
*// value must be on the right side of pivot element*
begin = middle;
++begin;
numElements = numElements - middlePos - 1;
}
**else**
{
*// value must be on the left side of pivot element (or it is the pivot element)*
numElements = middlePos;
}
}
*// out of bounds ?*
**if** (begin == end)
**return** false;
*// >>> UPDATE return always false if checking for equality in while-loop*
*// return false;*
*// <<< UPDATE*
*// hit ?*
**return** value == *begin;
}

**Note:**you need a C++11 compiler because of the super-convenient

`auto`

keyword.
## Benchmark

The main benchmark parameters are the container type (`std::vector`

,
`std::deque`

or `std::list`

) and:
```
```*/// data type*
**typedef** **int** Number;
*/// number of elements*
**const** **size_t** numElements = 1000*1000;

```
```**for** (**size_t** i = 0; i < numElements; i++)
sorted.push_back(Number(i*step));

`step`

(and `notFound=step/2`

)
is responsible for creating a certain gap between neighboring numbers.
These gaps are helpful to get a better estimate of the algorithm's runtime because they allow for
looking for numbers that are NOT in the container in an evenly distributed manner.
index | 0 | 1 | 2 | 3 | 4 | 5 | ... |
---|---|---|---|---|---|---|---|

sorted[index] | 0 | 10 | 20 | 30 | 40 | 50 | ... |

`0`

(succeeds), `5`

(fails),
`10`

(succeeds), `15`

(fails),
`20`

(succeeds), `25`

(fails),
`30`

(succeeds), `35`

(fails), etc.:
hide

Searching 1000000 sorted numbers requires to repeat the inner **test routine**```
```**for** (**size_t** i = 0; i < numElements; i++)
{
*// hit*
Number succeed = Number(i*step);
**if** (!std::binary_search(sorted.begin(), sorted.end(), succeed))
std::cout << "error @" << succeed << std::endl;
*// miss*
Number fail = Number(i*step+notFound);
**if** ( std::binary_search(sorted.begin(), sorted.end(), fail))
std::cout << "error @" << fail << std::endl;
}

`while`

-loop about 20 times.
All compiler optimizations I could think of were enabled.That's what I get:

1 million `int` |
Visual C++ 2010 | GCC 4.7 |
---|---|---|

std::binary_search | 131ms | 122ms |

my binary search | 107ms | 96ms |

early check for equality | 160ms | 97ms |

recursive | 242ms | 110ms |

hybrid (8) | 85ms | 133ms |

`int`

egers (compiled with Visual C++).My implementation of

`std::binary_search`

isn't as nested as the STL and
lacks some iterator checking.For GCC it doesn't really matter which variant of my binary search implementations is chosen but Visual C++ seriously loses ground when enabling the early check for equality.

I added

`#define _SECURE_SCL 0`

to disable Visual C++'s checked iterators
in release mode, too.Let's have a look at

`float`

(very similar results for `double`

):
1 million `float` |
Visual C++ 2010 | GCC 4.7 |
---|---|---|

std::binary_search | 251ms | 119ms |

my binary search | 217ms | 96ms |

early check for equality | 200ms | 126ms |

recursive | 290ms | 129ms |

hybrid (16) | 158ms | 132ms |

## Even More Simplifying the Code

A recursive implementation of the binary search is quite short:
hide

As we have seen in the benchmark, it runs pretty fast on GCC 4.7 - and sucks with Visual C++ 2010.
**binary_search_recursive**```
```*/// recursive implementation*
**template** <**typename** Iterator, **typename** Value>
**bool** binary_search_recursive(Iterator begin, Iterator end, Value value)
{
*// empty ?*
**if** (begin == end)
**return** false;
*// stop recursion ?*
**auto** numElements = std::distance(begin, end);
**if** (numElements == 1)
**return** value == *begin;
*// pivot element*
**auto** middlePos = numElements >> 1;
**auto** middle = begin;
std::advance(middle, middlePos);
*// keep looking on the right or left side of pivot element ?*
**if** (value < *middle)
**return** binary_search_recursive(begin, middle, value);
**else**
**return** binary_search_recursive(middle, end, value);
}

## Hybrid Approach

When the container holds small element types which are laid out next to each other in the memory (such as`int`

), the L1/L2 caches' influence plays a major role.My hybrid algorithm switches from binary subdivision to linear search after a given

`thresholdLinear`

was reached:
hide

In contrast to the original algorithm, here the
**binary_search_hybrid**```
```*/// linear search in inner loop*
**template** <**size_t** thresholdLinear, **typename** Iterator, **typename** Value>
**bool** binary_search_hybrid(Iterator begin, Iterator end, Value value)
{
*// narrow down range of candidates*
**auto** numElements = std::distance(begin, end);
*// repeat until only a few elements are left*
**while** (numElements > **int**(thresholdLinear))
{
*// pivot element*
**auto** middlePos = numElements >> 1;
**auto** middle = begin;
std::advance(middle, middlePos);
**if** (*middle < value)
{
*// value must be on the right side of pivot element*
begin = middle;
++begin;
numElements = numElements - middlePos - 1;
}
**else**
{
*// value must be on the left side of pivot element (or it is the pivot element)*
numElements = middlePos;
}
}
*// skip all smaller values*
**while** (begin != end && *begin < value)
++begin;
*// hit ?*
**return** begin != end && *begin == value;
}

`while`

compares against `thresholdLinear`

,
not `0`

. Lines 30-35 are new, everything else should be familiar.The best threshold value heavily depends on the data type. For 32-bit

`int`

s I found `thresholdLinear`

=
`8`

to be the best parameter.Your mileage may vary.

```
my::binary_search_hybrid<8>(sorted.begin(), sorted.end(), lookFor));
```

## The World Is Not A Vector

`std::binary_search`

accepts arbitrary forward iterators and makes it
easy to replace the container (benchmarks above: `std::vector`

).Here are the results of STL's double-ended queue (

`std::deque`

):
`std::deque` , 1 million `int` |
Visual C++ 2010 | GCC 4.7 |
---|---|---|

std::binary_search | 352ms | 206ms |

my binary search | 335ms | 204ms |

early check for equality | 482ms | 215ms |

recursive | 510ms | 540ms |

hybrid (8) | 309ms | 198ms |

`std::deque`

and utilizes the CPU caches best.When looking at

`std::list`

we have to keep in mind that
even though only a forward iterator is required, a lot of iterating is done
in `std::advance`

and `std::distance`

.Therefore instead of 1000000 only 10000

`int`

s are pushed into an `std::list`

:
`std::list` , ten thousand `int` |
Visual C++ 2010 | GCC 4.7 |
---|---|---|

std::binary_search | 692ms | 903ms |

my binary search | 683ms | 878ms |

early check for equality | 687ms | 882ms |

recursive | 982ms | 1261ms |

hybrid (8) | 686ms | 884ms |

Unlike all tables before, I didn't highlight the fastest algorithm - because it's not shown in this table !

A simple

`std::find`

is faster than all presented algorithm when working
on `std::list`

.
It can be improved for sorted data, too: abort when the iterator points at values larger than
the value we are looking for.
And here's the code of `find`

and `find_sorted`

:
hide

**find**```
```*/// same algorithm as STL's find*
**template** <**typename** Iterator, **typename** Value>
Iterator find(Iterator begin, Iterator end, Value value)
{
*// compare all values until found*
**while** (begin != end)
{
**if** (*begin == value)
**return** begin; *// found !*
++begin;
}
*// not found*
**return** end;
}

hide

**find_sorted**```
```*/// find, optimized for sorted data*
*// requires: iff !(x<y) && !(x>y) then x == y*
**template** <**typename** Iterator, **typename** Value>
Iterator find_sorted(Iterator begin, Iterator end, Value value)
{
*// skip all smaller values*
**while** (begin != end && *begin < value)
++begin;
*// hit ?*
**if** (begin != end && *begin == value)
**return** begin;
**else**
**return** end;
}

**Note:**these two templates return an

`Iterator`

instead of
`bool`

(just like the original `std::find`

).
`std::list` , 10'000 `int` |
Visual C++ 2010 | GCC 4.7 |
---|---|---|

find | 253ms | 292ms |

find_sorted | 163ms | 184ms |

`std::vector`

with only 10000 `int`

s
is still the clear winner:
`std::vector` , ten thousand `int` |
Visual C++ 2010 | GCC 4.7 |
---|---|---|

hybrid binary search | 1.2ms | 0.8ms |

find | 192ms | 160ms |

find_sorted | 125ms | 62ms |

`int`

in a `std::list`

takes several minutes for `find`

and `find_sorted`

to complete.
## The World Is Not A Simple Number

My benchmarks were completely memory-based and the objects only simple numbers (`int`

or `float`

).
Performance was mainly influenced by instruction branching and caching.If data is stored on high-latency storage, especially storage with slow seeking times like a hard disk or some network-based solution, then the advantage of an early equality test truly kicks in. The hybrid approach computes the most comparisons and suffers severely if comparisons are expensive (e.g. large objects instead of primitive data types such as

`int`

).If your container is very small then

`find`

can be surprisingly efficient
because it gets support from modern CPU's intelligent prefetching.
For less than approx. 50 `int`

s, `find_sorted`

is as fast as a binary search and for less than approx. 25 `int`

s,
even `find`

becomes competitive.
The simple `std::find`

has a few advantages, too:
no need for the less-than operator and - more importantly - the input data doesn't have to be sorted.