Parallel Sieve of Eratosthenes [updated]
posted by Stephan Brumme,
updated
Processing 1 Billion Numbers Per Second
There are 50,847,534 prime numbers between 2 and 1,000,000,000. For reasons unknown, I wanted to know whether a C++ implementation can find all these numbers on my desktop computer (Intel Core i7 860, quad-core, hyperthreading, 2.8GHz) in less than 1 second with a simple algorithm such as the Sieve of Eratosthenes.By only slightly modifying the basic algorithm and unleashing the raw power of OpenMP, I achieved a
UPDATE October 6, 2011: After adding a simple test for divisibility by 3 (and 5, 7, 9, 11, 13), the NEW block-wise sieve gets much faster and swiftly breaks through the 1-second-barrier.
UPDATE July 7, 2014: Many browsers support asm.js to boost Javascript performance. Scroll down to find my asm.js implementation or click here.
Algorithm | OpenMP | from 2 to 1 billion | Speed-Up | CPU ticks (approx.) | Notes | |
---|---|---|---|---|---|---|
Basic Algorithm | no | 50847534 primes | 13.5 seconds | (1.0x) | 40 billion | shortest code (22 lines) |
Odd Numbers Only | no | 50847534 primes | 6.7 seconds | 2.0x | 20 billion | only 10% more code than basic algorithm |
Odd Numbers Only | yes, 8 threads | 50847534 primes | 5.4 seconds | 2.5x | 130 billion | |
Block-Wise NEW | no | 50847534 primes | 2.4 seconds | 5.6x | 7 billion | most efficient |
Block-Wise NEW | yes, 8 threads | 50847534 primes | 0.65 seconds | 20.7x | 16 billion | fastest |
Live Demo
I ported the short Odd Numbers Only algorithm to Javascript and PHP; the code can be seen at the bottom of this page. The table below shows the expected number of primes for some reference inputs and the performance numbers measured on my computer.In this benchmark, C++ is approx. 4x faster than the best Javascript engine (Mozilla Firefox) and approx. 100x faster than PHP 5.3. Only the PHP times marked with (*) are measured on my local computer while all other PHP timings were measured on the create.stephan-brumme.com server (which is about 25% bit slower).
All browsers are still 32 bit programs - while my operating system is already 64 bit - and run out of memory when attempting to find all prime numbers from 1 to 1 billion. The more complex blockwise sieve does not suffer from this problem but I was too lazy to port it, too.
Multithreading is currently not supported by neither Javascript nor PHP.
from 2 to ... | primes | C++ | Internet Explorer 10 | Firefox 18 | Opera 12 | Chrome 24 | PHP 5.3 |
---|---|---|---|---|---|---|---|
10 | 4 | <0.001s | <0.001s | <0.001s | <0.001s | <0.001s | <0.001s |
100 | 25 | <0.001s | <0.001s | <0.001s | <0.001s | <0.001s | <0.001s |
1,000 | 168 | <0.001s | <0.001s | <0.001s | <0.001s | <0.001s | <0.001s |
10,000 | 1,229 | <0.001s | <0.001s | <0.001s | <0.001s | <0.001s | 0.003s |
100,000 | 9,592 | <0.001s | 0.001s | 0.001s | 0.002s | 0.001s | 0.027s |
1,000,000 | 78,498 | 0.002s | 0.010s | 0.007s | 0.022s | 0.015s | 0.393s |
10,000,000 | 664,579 | 0.024s | 0.139s | 0.103s | 0.248s | 0.190s | 3.717s |
100,000,000 | 5,761,455 | 0.555s | about 1.7s | about 1.2s | about 2.7s | about 2.0s | about 29s (*) |
1,000,000,000 | 50,847,534 | 6.7s | n/a | n/a | n/a | n/a | n/a |
UPDATE February 14, 2013: More than a year after my initial tests, Firefox became the fastest browser due to its vastly improved Javascript engine.
(read more ...)
Windows and Linux Code Timing
posted by Stephan Brumme
Whenever I have to do some performance testing, I try the most simple way: write a short program and run it within my Linux bash shell which comes with a built-in
time
command:
$ time ./testcode
real 0m0.788s
user 0m0.696s
sys 0m0.080s
The timing resolution isn't that great, either, and often lies above 10ms.
That's why I wrote a simple C routine to measure performance inside the code. My goal was to have numbers that are accurate down to about 1 microsecond (10-6).
I didn't opt for extremely high timing precision (count CPU ticks) because I rarely need that degree of precision and really want to avoid the tricky details of CPU frequency scaling, cache stalls etc.
Another feature is portability; simply recompile the code and run it on Windows and Linux. The code works fine with GCC as well as Visual C++.
All my timing code contains is a single function
seconds()
returning a double
.
You can invoke it in your programs like this:
double start = seconds();
// ... your code that should be measured ...
double duration = seconds() - start;
(read more ...)
Portable C++ Hashing Library
posted by Stephan Brumme,
updated
Introduction
Let's start with the core features of my C++ hashing library:- computes CRC32, MD5, SHA1 and SHA256 (most common member of the SHA2 functions), Keccak and its SHA3 sibling
- optional HMAC (keyed-hash message authentication code)
- no external dependencies, small code size
- can work chunk-wise (for example when reading streams block-by-block)
- portable: supports Windows and Linux, tested on Little Endian and Big Endian CPUs
- roughly as fast as Linux core hashing functions
- open source, zlib license
Usage
Since there are no external dependencies, not even dependencies within the library itself, all you need is to include the header of the hashing algorithm of your choice and add the identically named.cpp
to your project.
That means if you want to add SHA256 hashing to your C++ program, you only have to include
sha256.h
and sha256.cpp
in your project.If you prefer SHA256, then your code will look like this:
hide
// SHA2 test program
#include "sha256.h"
#include <iostream> // for std::cout only, not needed for hashing library
int main(int, char**)
{
// create a new hashing object
SHA256 sha256;
// hashing an std::string
std::cout << sha256("Hello World") << std::endl;
// => a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e
// hashing a buffer of bytes
const char* buffer = "How are you";
std::cout << sha256(buffer, 11) << std::endl;
// => 9c7d5b046878838da72e40ceb3179580958df544b240869b80d0275cc07209cc
// or in a streaming fashion (re-use "How are you")
SHA256 sha256stream;
const char* url = "create.stephan-brumme.com"; // 25 bytes
int step = 5;
for (int i = 0; i < 25; i += step)
sha256stream.add(url + i, step); // add five bytes at a time
std::cout << sha256stream.getHash() << std::endl;
// => 82aa771f1183c52f973c798c9243a1c73833ea40961c73e55e12430ec77b69f6
return 0;
}
(read more ...)
Sorting STL Containers
posted by Stephan Brumme,
updated
std::sort
Yes, there isstd::sort
.
Yes, it's pretty much the most efficient way to sort generic STL containers.
No, it's not perfect.
Improving std::sort
What if the data is already almost sorted ?std::sort
doesn't benefit a lot from processing already sorted data.
However, it's quite common to have (partially) sorted data.std::sort
requires RandomAccessIterator
which are sometimes not available for custom containers.What about
std::list
? The syntax is completely different:
hide
That's why I wrote some sorting routines on my own to overcome all those limitations:
std::list<int> mylist;
// ... fill mylist with data ...
// compiler error:
std::sort(mylist.begin(), mylist.end());
// must use this instead:
mylist.sort();
- can sort containers that have only basic
ForwardIterator
(Bubble Sort and Selection Sort) - most routines work with
BidirectionalIterator
- therefore seamless integration of
std::list
(and all other STL containers)
- data types need a less-than comparison (
operator<
); which can be user-defined, too
Algorithm | Stable ? | Iterator | Notes |
---|---|---|---|
Bubble Sort | yes | forward iterator | code shown here requires bidirectional iterator, just remove both instances of --last; (will be slower then) |
Selection Sort | no | forward iterator | |
Insertion Sort | yes | bidirectional iterator | |
Shell Sort | no | bidirectional iterator | std::distance and std::advance benefit heavily from random access iterators |
Heap Sort | no | random-access iterator | my n-ary Heap Sort implementation works with bidirectional iterators, too |
Merge Sort | yes | random-access iterator | requires additional memory |
Merge Sort (in-place) | yes | forward iterator | no additional memory but slower than normal merge sort |
Quick Sort | no | bidirectional iterator | std::distance and std::advance benefit heavily from random access iterators |
Intro Sort | no | bidirectional iterator | std::distance and std::advance benefit heavily from random access iterators |
std::sort | no | random-access iterator | |
std::stable_sort | yes | random-access iterator |
(read more ...)
Portable Memory Mapping C++ Class
posted by Stephan Brumme
Parsing Files The Easy Way
Recently I had to do a lot with file loading and parsing. It can be especially tricky to come up with a fast solution if you have to jump around within these files. Seeking and proper buffering were my biggest problems.Memory mapping is one of the nicest features of modern operating systems: after opening a file in memory-mapped mode you can treat the file as a large chunk of memory and use plain pointers. The operating system takes care of loading the data on demand (!) into memory - utilizing caches, of course. When using my C++ class
MemoryMapped
it's really easy:
hide
How to use
Windows is completely different from Linux when it comes to opening a file in memory-mapped mode.
The class
// open file
MemoryMapped data("myfile.txt");
// read byte at file offset 2345
unsigned char a = data[2345];
// or if you prefer pointers
const short* raw = (const short*) data.getData();
short b = raw[300];
MemoryMapped
hides all the OS specific stuff in only two files:
MemoryMapped.h
and MemoryMapped.cpp
.They compile without any warning with GCC 4.7 and Visual C++ 2010. I haven't tried other compilers but they should be able to handle it, too, even when they are a bit older.
(read more ...)
Testing Endian Aware Code
posted by Stephan Brumme
Turning Around
CPUs store numbers in two different ways: little or big endian. The majority of CPUs are designed to access memory in Little Endian way which seems to have the bytes mixed the wrong way. The memory layout of the hexadecimal number 0x12345678 looks like this:byte 0 | byte 1 | byte 2 | byte 3 | Typical CPUs | |
---|---|---|---|---|---|
Little Endian | 0x78 | 0x56 | 0x34 | 0x12 | Intel/AMD x86, x64 |
Big Endian | 0x12 | 0x34 | 0x56 | 0x78 | PowerPC, ARM* |
Virtual Machines / QEMU
I don't have a big endian computer at home (my ARM chips are little endian by default) but some of my source code required testing that everything works as expected on big endian machines, too. That's when I learnt about QEMU, a great open-source processor emulator. It's available on Unix and Windows.This posting explains how to setup QEMU and run Debian Wheezy on an emulated big endian PowerPC.
(read more ...)
Software License
posted by Stephan Brumme
I received many kind requests from companies willing to use code from create.stephan-brumme.com in commercial products.
I publish my code because I believe that sharing knowledge makes this world a better place. Therefore all code will be under a zlib-style license now.
This applies to all past and future postings.
The full license can be found on http://create.stephan-brumme.com/disclaimer.html. In short, you can do with it whatever you want, even build commercial or military software, but please keep my name in the source code (however, no need to display it in the shipped product to the end user).
Each file's header contains a reference to the license, too:
// //////////////////////////////////////////////////////////
// Copyright (c) 2013 Stephan Brumme. All rights reserved.
// see http://create.stephan-brumme.com/disclaimer.html
//
I appreciate if you send me an email with a short description of your project / product.
Read-Only Git Access Available
posted by Stephan Brumme
What Is Git ?!
If you don't knowgit
grab one of tutorials on the web.
Using My Git Repositories
I added a git repository to these projects on create.stephan-brumme.com that already came with a big yellow download button, e.g.:git clone http://create.stephan-brumme.com/crc32/.git
You have read-only access via http and any push will produce an error message.
Whenever I update my projects (bugfixes, enhancements, ...), you can get the latest version:
# update local repository
git pull
How To Add Git Repositories To Your Website
The workflow is actually pretty simple.
# go to the project directory
cd crc32
# create repository, will be stored in .git sub-directory
git init
# prepare files for the repository
git add Crc32.cpp
git add Crc32.ino
git add Crc32Best.ino
# of course you can use wild-cards, too
# commit these added files
git commit -m "version 1"
# add web server's URL
git remote add origin http://create.stephan-brumme.com/crc32/.git
# compress repository
git gc --aggressive
.git
sub-directory to your web server -
just like you would copy HTML files. Done !Note: A more sophisticated approach is to clone the repository from your local machine to your web server. However, if you are on a shared hoster then chances are good that
git
is disabled. Therefore I suggest you stick with FTP.
Fowler-Noll-Vo Hash (FNV1a)
posted by Stephan Brumme
At The Speed Of Light
Four weeks ago I posted an article about the popular CRC32 hashing algorithm. CRC32 works best on large data blocks because short sequences might lead to an increased number of collisions.The FNV hash created by Fowler, Noll and Vo (see their website) comes with zero memory overhead (no precomputed look-up tables), is an incremental (rolling) hash and has a good avalanche behavior. It needs at about 4.5 CPU cycles per hashed byte on my computer, that's a little bit more than 700 MByte/second. GCC and Visual C++ produce identical code.
The heart of the hash function can be expressed in a single line (FNV1a):
newHash = (oneByte ^ oldHash) * Prime;
XOR
and multiplication leads to the original FNV1 hash which is considered slightly inferior:
newHash = (oneByte * Prime) ^ oldHash;
unsigned char
looks as follows in C/C++:
hide
fnv1a, one byte
And an array of bytes just calls
// default values recommended by http://isthe.com/chongo/tech/comp/fnv/
const uint32_t Prime = 0x01000193; // 16777619
const uint32_t Seed = 0x811C9DC5; // 2166136261
/// hash a single byte
inline uint32_t fnv1a(unsigned char oneByte, uint32_t hash = Seed)
{
return (oneByte ^ hash) * Prime;
}
fnv1a
for each element:
hide
fnv1a, multiple bytes
/// hash a block of memory
uint32_t fnv1a(const void* data, size_t numBytes, uint32_t hash = Seed)
{
assert(data);
const unsigned char* ptr = (const unsigned char*)data;
while (numBytes--)
hash = fnv1a(*ptr++, hash);
return hash;
}
(read more ...)
Drawing Antialiased Circles and Ellipses
posted by Stephan Brumme
Fast But Ugly
PHP's GD image library can draw all kinds of circles and ellipses. But they look rather ugly because they lack proper antialiasing:Aside from GD image handling stuff, it's a single line of code:
imageellipse($img, $width/2,$height/2, $width,$height, $color);
show
PHP's built-in imageellipse
This article will show five ways how to add antialiasing to circles/ellipses:
<?php
$start = microtime(true);
// default size
$width = 150;
$height = 100;
// ... or user defined
if (is_numeric($_REQUEST["width" ])) $width = $_REQUEST["width" ];
if (is_numeric($_REQUEST["height"])) $height = $_REQUEST["height"];
// create new image
$img = imagecreatetruecolor($width, $height);
// transparent background
$background = imagecolorallocate($img, 0xFF, 0xFF, 0xFF);
imagefilledrectangle ($img, 0,0, $width,$height, $background);
imagecolortransparent($img, $background);
// draw red ellipse, 2*10px border
$red = imagecolorallocate($img, 0xFF, 0x00, 0x00);
imageellipse($img, $width/2,$height/2, $width-20,$height-20, $red);
// measure speed
$end = microtime(true);
$duration = number_format(($end - $start)*1000, 3)."ms";
imagestring($img, 1, $width-50,$height-10, $duration, 0);
// send PNG to browser
header("Content-type: image/png");
imagepng($img, NULL, 9, PNG_ALL_FILTERS);
?>
Algorithm | Quality | Speed | Code Size |
---|---|---|---|
imageellipse |
bad | very fast | one line |
Zooming Trick | okay | very slow | about 10 lines |
Polygon Approximation | depends | fast | about 10 lines |
Implicit Equation | good | very slow | about 20 lines |
Wu's Algorithm | good | slow | about 30 lines |
True-Type Font | depends | fast | about 10 lines |
(read more ...)
browse through all postings:
newer stuff:2021/Jun/30 Length-Limited Prefix Codes
2020/Apr/13 smalLZ4 - optimal LZ4 compression
2019/Jun/17 toojpeg - a JPEG encoder in a single C++ file
2018/Oct/15 flexiGIF - lossless GIF/LZW optimization
2016/Aug/04 xxHash - a hash algorithm as fast as memcpy
2015/Aug/11 Tiny Regular Expression Matcher
2015/Feb/04 Fast CRC32
2014/Aug/25 Practical String Searching
2014/Aug/19 The Mersenne Twister Pseudo Random Number Generator
older stuff:
2011/Nov/27 Uncompressing DEFLATE Data
2011/Nov/23 Streaming MPEG-4 Movies From Your Own Server
2011/Nov/08 Binary Search Revisited
2011/Oct/20 Reverse MD5 Hash
2011/Oct/17 Windows 7 Screenshots
2011/Oct/16 Hide Strings In Your Executable
2011/Oct/11 Generate QR Codes With PHP and Google Charts API
2011/Oct/02 Feed.rss and Sitemap.xml
2011/Sep/28 My Experiences with AdSense
2011/Sep/27 Emulating MemCache in Pure PHP
2011/Sep/14 Simple Ajax
2011/Sep/13 Sparklines
2011/Sep/12 smallpt - Global Illumination in Javascript
2011/Sep/11 Multi-Core Sorting
2011/Sep/06 CLein - The tiny, portable OpenCL info tool