Sorting STL Containers

posted by Stephan Brumme, updated


Yes, there is std::sort. Yes, it's pretty much the most efficient way to sort generic STL containers. No, it's not perfect.

Improving std::sort

What if the data is already almost sorted ? std::sort doesn't benefit a lot from processing already sorted data. However, it's quite common to have (partially) sorted data.

std::sort requires RandomAccessIterator which are sometimes not available for custom containers.

What about std::list ? The syntax is completely different:
hide std::list<int> mylist; // ... fill mylist with data ... // compiler error: std::sort(mylist.begin(), mylist.end()); // must use this instead: mylist.sort();
That's why I wrote some sorting routines on my own to overcome all those limitations: The sorting algorithms are ordered more or less by overall performance (slowest first). As you can see faster algorithms have stricter requirements:
Algorithm Stable ? Iterator Notes
Bubble Sort yes forward iterator code shown here requires bidirectional iterator, just remove both instances of --last; (will be slower then)
Selection Sort no forward iterator
Insertion Sort yes bidirectional iterator
Shell Sort no bidirectional iterator std::distance and std::advance benefit heavily from random access iterators
Heap Sort no random-access iterator my n-ary Heap Sort implementation works with bidirectional iterators, too
Merge Sort yes random-access iterator requires additional memory
Merge Sort (in-place) yes forward iterator no additional memory but slower than normal merge sort
Quick Sort no bidirectional iterator std::distance and std::advance benefit heavily from random access iterators
std::sort no random-access iterator
std::stable_sort yes random-access iterator
The algorithms are nothing new, they can be found in every decent book on algorithms and, of course, online (e.g. Wikipedia).

(read more ...)

Portable Memory Mapping C++ Class

posted by Stephan Brumme

Parsing Files The Easy Way

Recently I had to do a lot with file loading and parsing. It can be especially tricky to come up with a fast solution if you have to jump around within these files. Seeking and proper buffering were my biggest problems.

Memory mapping is one of the nicest features of modern operating systems: after opening a file in memory-mapped mode you can treat the file as a large chunk of memory and use plain pointers. The operating system takes care of loading the data on demand (!) into memory - utilizing caches, of course. When using my C++ class MemoryMapped it's really easy:
hide How to use // open file MemoryMapped data("myfile.txt"); // read byte at file offset 2345 unsigned char a = data[2345]; // or if you prefer pointers const short* raw = (const short*) data.getData(); short b = raw[300];
Windows is completely different from Linux when it comes to opening a file in memory-mapped mode. The class MemoryMapped hides all the OS specific stuff in only two files: MemoryMapped.h and MemoryMapped.cpp.
They compile without any warning with GCC 4.7 and Visual C++ 2010. I haven't tried other compilers but they should be able to handle it, too, even when they are a bit older.
(read more ...)

Testing Endian Aware Code

posted by Stephan Brumme

Turning Around

CPUs store numbers in two different ways: little or big endian. The majority of CPUs are designed to access memory in Little Endian way which seems to have the bytes mixed the wrong way. The memory layout of the hexadecimal number 0x12345678 looks like this:
byte 0 byte 1 byte 2 byte 3 Typical CPUs
Little Endian 0x78 0x56 0x34 0x12 Intel/AMD x86, x64
Big Endian 0x12 0x34 0x56 0x78 PowerPC, ARM*
Note: ARM chips often support little endian, too.

Virtual Machines / QEMU

I don't have a big endian computer at home (my ARM chips are little endian by default) but some of my source code required testing that everything works as expected on big endian machines, too. That's when I learnt about QEMU, a great open-source processor emulator. It's available on Unix and Windows.

This posting explains how to setup QEMU and run Debian Wheezy on an emulated big endian PowerPC.
(read more ...)

Software License

posted by Stephan Brumme

I received many kind requests from companies willing to use code from in commercial products.
I publish my code because I believe that sharing knowledge makes this world a better place. Therefore all code will be under a zlib-style license now.
This applies to all past and future postings.

The full license can be found on In short, you can do with it whatever you want, even build commercial or military software, but please keep my name in the source code (however, no need to display it in the shipped product to the end user).
Each file's header contains a reference to the license, too:
// ////////////////////////////////////////////////////////// // Copyright (c) 2013 Stephan Brumme. All rights reserved. // see //
It would be great if you publish your code, too, but that's not a requirement.

I appreciate if you send me an email with a short description of your project / product.

Read-Only Git Access Available

posted by Stephan Brumme

What Is Git ?!

If you don't know git grab one of tutorials on the web.

Using My Git Repositories

I added a git repository to these projects on that already came with a big yellow download button, e.g.:

git clone

You have read-only access via http and any push will produce an error message.
Whenever I update my projects (bugfixes, enhancements, ...), you can get the latest version:
# update local repository git pull
Unfortunately, not all repositories start with "version 1" because I lost some of the early source revisions.

How To Add Git Repositories To Your Website

The workflow is actually pretty simple.
# go to the project directory cd crc32 # create repository, will be stored in .git sub-directory git init # prepare files for the repository git add Crc32.cpp git add Crc32.ino git add Crc32Best.ino # of course you can use wild-cards, too # commit these added files git commit -m "version 1" # add web server's URL git remote add origin # compress repository git gc --aggressive
Now you can use your good old FTP program to copy the full .git sub-directory to your web server - just like you would copy HTML files. Done !

Note: A more sophisticated approach is to clone the repository from your local machine to your web server. However, if you are on a shared hoster then chances are good that git is disabled. Therefore I suggest you stick with FTP.

Fowler-Noll-Vo Hash (FNV1a)

posted by Stephan Brumme

At The Speed Of Light

Four weeks ago I posted an article about the popular CRC32 hashing algorithm. CRC32 works best on large data blocks because short sequences might lead to an increased number of collisions.

The FNV hash created by Fowler, Noll and Vo (see their website) comes with zero memory overhead (no precomputed look-up tables), is an incremental (rolling) hash and has a good avalanche behavior. It needs at about 4.5 CPU cycles per hashed byte on my computer, that's a little bit more than 700 MByte/second. GCC and Visual C++ produce identical code.

The heart of the hash function can be expressed in a single line (FNV1a):
newHash = (oneByte ^ oldHash) * Prime;
Reversing the order of XOR and multiplication leads to the original FNV1 hash which is considered slightly inferior:
newHash = (oneByte * Prime) ^ oldHash;
The 32-bit FNV1a hash of an unsigned char looks as follows in C/C++:
hide fnv1a, one byte // default values recommended by const uint32_t Prime = 0x01000193; // 16777619 const uint32_t Seed = 0x811C9DC5; // 2166136261 /// hash a single byte inline uint32_t fnv1a(unsigned char oneByte, uint32_t hash = Seed) { return (oneByte ^ hash) * Prime; }
And an array of bytes just calls fnv1a for each element:
hide fnv1a, multiple bytes /// hash a block of memory uint32_t fnv1a(const void* data, size_t numBytes, uint32_t hash = Seed) { assert(data); const unsigned char* ptr = (const unsigned char*)data; while (numBytes--) hash = fnv1a(*ptr++, hash); return hash; }

(read more ...)

Drawing Antialiased Circles and Ellipses

posted by Stephan Brumme

Fast But Ugly

PHP's GD image library can draw all kinds of circles and ellipses. But they look rather ugly because they lack proper antialiasing:


Aside from GD image handling stuff, it's a single line of code:
imageellipse($img, $width/2,$height/2, $width,$height, $color);
The image above was created by the following PHP file (click show):
show PHP's built-in imageellipse <? $start = microtime(true); // default size $width = 150; $height = 100; // ... or user defined if (is_numeric($_REQUEST["width" ])) $width = $_REQUEST["width" ]; if (is_numeric($_REQUEST["height"])) $height = $_REQUEST["height"]; // create new image $img = imagecreatetruecolor($width, $height); // transparent background $background = imagecolorallocate($img, 0xFF, 0xFF, 0xFF); imagefilledrectangle ($img, 0,0, $width,$height, $background); imagecolortransparent($img, $background); // draw red ellipse, 2*10px border $red = imagecolorallocate($img, 0xFF, 0x00, 0x00); imageellipse($img, $width/2,$height/2, $width-20,$height-20, $red); // measure speed $end = microtime(true); $duration = number_format(($end - $start)*1000, 3)."ms"; imagestring($img, 1, $width-50,$height-10, $duration, 0); // send PNG to browser header("Content-type: image/png"); imagepng($img, NULL, 9, PNG_ALL_FILTERS); ?>
This article will show five ways how to add antialiasing to circles/ellipses:
Algorithm Quality Speed Code Size
imageellipse bad very fast one line
Zooming Trick okay very slow about 10 lines
Polygon Approximation depends fast about 10 lines
Implicit Equation good very slow about 20 lines
Wu's Algorithm good slow about 30 lines
True-Type Font depends fast about 10 lines
All images on this page are drawn on-the-fly. Rendering time (in milliseconds) is displayed in the lower right corner and will be different if you reload.
(read more ...)

Uncompressing DEFLATE Data

posted by Stephan Brumme

The Most Common Compression Algorithm

Everyone knows the widespread ZIP file format. Some Windows users may have never heard of GZIP, but all Unix users did. 99% percent of all ZIPs are encoded with the so-called DEFLATE algorithm, specified as RFC 1951. The same holds true for PNG images, some PDFs, compressed HTTP data transfer and many more.

I wanted to have a C++ implementation without strange dependencies and weird pointer arithmentic. I wanted something like this:
#include "Inflate.h" std::vector<unsigned char> uncompressed = Inflate("filename.gz");
So I wrote my own DEFLATE decoder in pure C++/STL. It's reasonably fast - somewhere between 50 and 100 MByte/s (uncompressed data) - and compiles easily with Visual C++ and GCC (older version should be fine, too).
Latest release: November 27, 2011, size: 25.2 kBytes

CRC32: 7caba0b2
MD5: 079bbc381743a0f64e29908548bfeed5
SHA1: 71a7be05c51bda6026184980580a775a9d5e8ef6

If you encounter any bugs/problems or have ideas for improving future versions, please write me an email:
(read more ...)

Streaming MPEG-4 Movies From Your Own Server

posted by Stephan Brumme

No Need For YouTube

YouTube is great. Really ! You can find everything you want to watch - and probably everything you don't want to.

On my website are the best photos and videos I took while travelling all over world. Everybody from all over the world can watch them - that's intended and obviously the way the internet should work. Nevertheless, these are MY photos and MY videos and I want to be able to remove them if necessary. That's why I don't upload them to or

There is plenty of free space on my (virtual) web server and my contract includes unlimited traffic. Even more important, my server has a better internet connection than I have at home: my 25 MBit/s VDSL connection is 100% saturated when downloading from my server. I'm not sure but I assume my server can easily achieve 100 MBit/s (it's Germany's biggest hoster, by the way). YouTube is much slower and sometimes doesn't even exceed 3-4 MBit/s, especially after 6pm.

HTML5's Video Capabilities

The upcoming HTML5 standard introduced a new <video> tag. All modern browsers (Chrome, Opera, Firefox, Internet Explorer 9+ and even Android/iPhones) can play videos embedded in a website natively without resorting to Flash. Well, almost ...
hide HMTL's 5 video tag <video width="400" height="300" controls preload="metadata" poster=""> <source src="" type="video/mp4" /> </video>
One of the most efficient video codecs is MPEG-4, also known as AVC or H.264. Most high-definition (HD) stuff is based on MPEG-4, e.g. Blu-Rays. YouTube's HD videos are encoded in the MPEG-4 file format, too.
(read more ...)

Binary Search Revisited

posted by Stephan Brumme

Same same but different ?!

Some time ago I had to write code for a binary search of sorted data. Obviously there are libraries which can do the heavy lifting for me, such as C++'s STL.

When I looked up the source code of STL's binary_search I was surprised to find that it slightly differs from what I expected: it always performs log2(N) less-than comparisons and only after that checks for equality exactly once:
hide binary_search / pseudo code (STL) while at least one element { find element in the middle if it's smaller than x strip all elements before and including middle element else strip all elements after middle element } return true if remaining element matches x, else false
That means if it finds a match (something that equals what we are searching for) it still keeps going until subdivision finishes - and forfeits the chance for an early exit.
The STL guys usually prefer fast algorithms - why did they do it ?

This clearly calls for some serious googling ... and the vast majority of web sites, including Wikipedia, combine the less-than comparison with an check for equality and indeed allow for an early exit if they find a match:
hide binary_search / pseudo code (most common algorithm) while at least one element { find element in the middle if it matches x, return true <== NEW if it's smaller than x strip all elements before and including middle element else strip all elements after middle element } return false <== DIFFERENT
In Big-O notation: if we have a container with N elements, the following number of comparisons are expected on average:
check if less than check if equal
STL ≈ log2(N) 1
"rest of world" ≈ log2(N) - 1 ≈ log2(N) - 1
Is STL right ? Does saving all these checks for equality at the cost of one more less-than give a real-world performance boost ?
There is only one way to find the best binary_search: WE NEED BENCHMARKS !

(read more ...)

browse through all postings:
newer stuff:

2016/Aug/17   smalLZ4 - optimal LZ4 compression
2016/Aug/04   xxHash - a hash algorithm as fast as memcpy
2015/Aug/11   Tiny Regular Expression Matcher
2015/Feb/04   Fast CRC32
2014/Aug/25   Practical String Searching
2014/Aug/19   The Mersenne Twister Pseudo Random Number Generator
2014/Jul/11   Parallel Sieve of Eratosthenes
2014/Jun/23   Windows and Linux Code Timing
2014/Jun/14   Portable C++ Hashing Library

older stuff:

2011/Oct/20   Reverse MD5 Hash
2011/Oct/17   Windows 7 Screenshots
2011/Oct/16   Hide Strings In Your Executable
2011/Oct/11   Generate QR Codes With PHP and Google Charts API
2011/Oct/02   Feed.rss and Sitemap.xml
2011/Sep/28   My Experiences with AdSense
2011/Sep/27   Emulating MemCache in Pure PHP
2011/Sep/14   Simple Ajax
2011/Sep/13   Sparklines
2011/Sep/12   smallpt - Global Illumination in Javascript
2011/Sep/11   Multi-Core Sorting
2011/Sep/06   CLein - The tiny, portable OpenCL info tool