posted by Stephan Brumme
The old-fashioned wayMost traditional sorting algorithms (I already blogged about their STL-compatible versions) work in a serial manner and are perfectly suited for single-core machines.
My current computer has a Intel Core i7 860 CPU which comes with 4 cores and hyperthreading, giving me a total of 8 logical cores. That means, CPU utilization with traditional sorting algorithms (i.e. only able to run on one 1 out of 8 cores) cannot exceed 12.5%. That sucks.
OpenMPLuckily, the GCC team made parallelized versions of pretty all important STL algorithms available. They still call their implementation experimental but I haven't had any problems so far.
Everything is based on OpenMP which has been supported since GCC 4.2. Visual C++ and Intel C++ support OpenMP as well but I haven't tested GCC's library with these commercial compilers yet.
How to use itThere are two ways to switch to GCC's parallel mode:
- link against
- use the
// g++ -O3 -fopenmp -march=native parallelSort.cpp -o parallelSort #include <algorithm> std::sort(data.begin(), data.end()); #include <parallel/algorithm> __gnu_parallel::sort(data.begin(), data.end());
How NOT to use itIf you are sorting only a small set of elements, the overhead for running the parallel code might outweight the time savings.
In my experience, if
std::sorttakes less than 5ms, then don't expect any significant gains from
And if you have lots of threads/processes already pushing your CPU to its limits, then you might prefer to stick to your old, trusted code.
Remember, everything is still labeled as experimental by GCC (early September 2011).
DownloadGit users: scroll down to the repository link
Stay up-to-date:git clone http://create.stephan-brumme.com/parallelSort/.git
If you encounter any bugs/problems or have ideas for improving future versions, please write me an email: firstname.lastname@example.org
- April 24, 2013
- zlib-style license, applies to all past and future versions
- version 3
- November 9, 2011
- fixed spelling
- version 2
- September 22, 2011
- code formatting
- version 1
- September 11, 2011
- initial release
BenchmarkFor reasons unknown, OpenMP was removed from my web server's GCC. It's shared hosting so I can't do anything about it.
Therefore, instead of a live demo I just show you the numbers of my desktop computer (Intel Core i7 860, GCC 4.6.0):
|100 million integers||sorted||inverted||random|
#define USE_WINDOWS_TIMERto get more precise timings.
std::sortuses core 1 and 7 with all other cores idling. Once
__gnu_parallel::sortstarts, all cores are under full load.
There is a noteworthy side-effect: my CPU's default speed is 2.8GHz but it spends 99% of the time in power-saving mode at 1.2GHz. Turbo-Boost overclocks the CPU to 3.3GHz with
std::sortbut falls back to only 2.9GHz with
__gnu_parallel::sort. I'm still impressed with Turbo Boost because it's about 28 Celsius (82 Fahrenheit) outside and, unfortunately, inside my appartment.