
Portable C++ Hashing Library

posted by Stephan Brumme, updated

Introduction

Let's start with the core features of my C++ hashing library:

Usage

Since there are no external dependencies, not even dependencies within the library itself, all you need is to include the header of the hashing algorithm of your choice and add the identically named .cpp to your project. That means if you want to add SHA256 hashing to your C++ program, you only have to include sha256.h and sha256.cpp in your project.

If you prefer SHA256, then your code will look like this:

You can download the latest version of each source file on its own but I strongly recommend fetching them from my Git repository. It can be found below the last download link.
Download  hash-library.zip
Latest release: February 2, 2021, size: 60.9 kBytes

CRC32: 59f5a159
MD5: 52c0178ea4869f87e07496b8666e5acc
SHA1: 6ef06febecf46cd59477189fe69de73c5ace89e2
SHA256: 4f7722da48efc432fd15bf436d312805f5335498a3087153a8d229bb7fa9406f

If you encounter any bugs/problems or have ideas for improving future versions, please write me an email: create@stephan-brumme.com

License

This code is licensed under the zlib License:
This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software.

Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions:

1. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required.
2. Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software.
3. This notice may not be removed or altered from any source distribution.

Changelog

Interface

All implemented hashing algorithms (CRC32, MD5, SHA1, SHA256 and SHA3/Keccak) share the same public interface.

The base class Hash was introduced in version 3 and is optional - in fact it's disabled by default.
To enable it, open crc32.h, md5.h, sha1.h, sha256.h, keccak.h and sha3.h, remove the slashes in front of #include "hash.h" (line 9) and derive from public Hash (about line 37).
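
To illustrate what a shared interface with an optional abstract base class can look like, here is a minimal, self-contained sketch. It is not an excerpt from the library: the names HashSketch and Crc32Bitwise are hypothetical, and the method set (add, getHash, reset, operator()) is an assumption about the general shape, so please check the real headers before relying on it. The concrete class is a deliberately slow bit-by-bit CRC32, chosen only because it fits in a few lines.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical abstract interface (an assumption, not the library's hash.h)
class HashSketch
{
public:
  virtual ~HashSketch() {}
  // add an arbitrary number of bytes, can be called repeatedly (streaming)
  virtual void add(const void* data, std::size_t numBytes) = 0;
  // hash of everything added so far, as lowercase hex characters
  virtual std::string getHash() = 0;
  // forget all data added so far
  virtual void reset() = 0;
  // convenience: hash a complete string in a single call
  std::string operator()(const std::string& text)
  {
    reset();
    add(text.data(), text.size());
    return getHash();
  }
};

// Bit-by-bit CRC32 (reflected polynomial 0xEDB88320): tiny but slow
class Crc32Bitwise : public HashSketch
{
public:
  Crc32Bitwise() { reset(); }

  virtual void reset() { m_crc = 0xFFFFFFFFu; }

  virtual void add(const void* data, std::size_t numBytes)
  {
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    for (std::size_t i = 0; i < numBytes; i++)
    {
      m_crc ^= bytes[i];
      // process one bit at a time; the mask is all ones if the lowest bit is set
      for (int bit = 0; bit < 8; bit++)
        m_crc = (m_crc >> 1) ^ (0xEDB88320u & (0u - (m_crc & 1u)));
    }
  }

  virtual std::string getHash()
  {
    static const char digits[] = "0123456789abcdef";
    std::uint32_t crc = m_crc ^ 0xFFFFFFFFu; // final inversion, state stays untouched
    std::string hex(8, '0');
    for (int i = 7; i >= 0; i--)
    {
      hex[i] = digits[crc & 15];
      crc >>= 4;
    }
    return hex;
  }

private:
  std::uint32_t m_crc;
};
```

With such a base class, code that only needs "some hash" can take a HashSketch& and call add() in chunks followed by getHash(), or use operator() for a one-shot hash of a string.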

Containing just 110 lines of code, the file digest.cpp computes all presented hashes.

Algorithms

Descriptions of the algorithms can be found on Wikipedia (CRC32, MD5, SHA1, SHA256, Keccak, SHA3 and HMAC). My CRC implementation is based on the Slicing-by-8 technique which I describe in more detail in my blog, too. I recently added the faster Slicing-by-16 algorithm there but not in this library to keep the code size low. The rest is pretty much a straightforward implementation of the standard algorithms.

Several years ago I wrote an online hash calculator in PHP using the built-in hashing functions. It might be useful if you want to compare your output against an "independent" implementation.

And please don't send me comments on the design of that website - I deliberately chose awkward colors ;-)

Performance

The great OpenSSL team is working hard to provide the best hashing implementations, not only in terms of reliability but also in terms of throughput. They often convert the core routines to assembler code, so it's no surprise that their library outperforms mine.

However, when compared to the Linux coreutils, my C++ code gives about the same performance numbers. My main system runs on a Core i7 2600K CPU (3.4 GHz). In my tests I compared these libraries (64 bit binaries):

| Library | Version / Settings |
|---|---|
| mine | GCC 4.7 (g++ -O3 -march=native) |
| OpenSSL | 1.0.1e-fips |
| CoreUtils | 8.4 |

Yes, I'm aware that newer versions are available - especially CoreUtils - but I used the default versions of my CentOS 6.5 distribution.

My test file is enwik9, a snapshot of the first 1 billion bytes (1,000,000,000 bytes) of the English Wikipedia from March 3, 2006.
A compressed copy can be downloaded from my server, too: GZip (296 MByte) or XZ (220 MByte).

Before running the tests I ensured that the file is completely loaded into the cache.

| Algorithm | Library | Command line | User time | Throughput | Comparison |
|---|---|---|---|---|---|
| CRC32 | my code | time ./digest enwik9 --crc | 0.45 seconds | 2,222 MByte/sec | |
| MD5 | OpenSSL | time openssl md5 enwik9 | 1.47 seconds | 680 MByte/sec | fastest |
| MD5 | my code | time ./digest enwik9 --md5 | 1.65 seconds | 606 MByte/sec | 12% slower |
| MD5 | CoreUtils | time md5sum enwik9 | 1.65 seconds | 606 MByte/sec | 12% slower |
| SHA1 | OpenSSL | time openssl sha1 enwik9 | 1.43 seconds | 699 MByte/sec | fastest |
| SHA1 | my code | time ./digest enwik9 --sha1 | 2.58 seconds | 388 MByte/sec | 80% slower |
| SHA1 | CoreUtils | time sha1sum enwik9 | 3.01 seconds | 332 MByte/sec | 110% slower |
| SHA256 | OpenSSL | time openssl sha256 enwik9 | 4.70 seconds | 213 MByte/sec | fastest |
| SHA256 | my code | time ./digest enwik9 --sha256 | 5.98 seconds | 167 MByte/sec | 27% slower |
| SHA256 | CoreUtils | time sha256sum enwik9 | 5.48 seconds | 182 MByte/sec | 17% slower |
| SHA3 / Keccak (-256) | OpenSSL | time openssl sha3-256 enwik9 | 2.78 seconds | 360 MByte/sec | fastest |
| SHA3 / Keccak (-256) | my code | time ./digest enwik9 --sha3 | 4.17 seconds | 240 MByte/sec | 50% slower |

Note: when performing the test on my Raspberry Pi (ARM architecture CPU), OpenSSL runs at pretty much the same speed as mine for MD5 but is 22% faster for SHA1 and 22% faster for SHA256, too. Due to limited resources, I computed only the hashes of the first 100 MByte (enwik8 instead of enwik9). You can download this test file from my server compressed as GZip (35 MByte) or XZ (27 MByte).
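
The throughput and comparison columns of the enwik9 results can be reproduced from the user times alone. The following is a small sketch of that arithmetic; the function names are made up for this illustration, and it assumes that 1 MByte means 1,000,000 bytes and that values are rounded to the nearest integer, which matches the published numbers.

```cpp
#include <cmath>

// throughput in MByte/sec, assuming 1 MByte = 1,000,000 bytes
inline long throughputMBytePerSec(double numBytes, double seconds)
{
  return std::lround(numBytes / 1e6 / seconds);
}

// how much longer a run took than the fastest run, in percent
inline long percentSlower(double seconds, double fastestSeconds)
{
  return std::lround((seconds / fastestSeconds - 1.0) * 100.0);
}
```

For example, throughputMBytePerSec(1e9, 1.47) gives the 680 MByte/sec of OpenSSL's MD5, and percentSlower(1.65, 1.47) gives the 12% of the two slower MD5 runs.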

Portability

I successfully compiled and ran the source code on all my systems / environments:
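
| Compiler | Windows 8 | CentOS 6 | Debian Wheezy / ARM | Debian Wheezy / PowerPC |
|---|---|---|---|---|
| Visual Studio 2010 | yes | - | - | - |
| GCC 4.4 | - | yes | - | - |
| GCC 4.7 | - | yes | - | yes |
| GCC 4.8 | - | yes | yes | - |
| GCC 4.9 | - | yes | yes | - |
| CLang / LLVM 3.0 | - | yes | - | - |
| CLang / LLVM 3.4 | - | yes | yes | - |
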
In short: any modern C++ compiler (I have access to) happily compiles my code.

My Windows and CentOS installations are 64 bit systems. Debian Wheezy / ARM refers to my Raspberry Pi (32 bit system).
The Debian/PowerPC installation is just a QEMU virtual machine. I explicitly mention it because it is a big endian architecture - in contrast to Intel's little endian architecture. Please read my blog posting on setting up such a virtual machine, too.

The endianness is detected at compile time. On Linux systems, the library automatically includes endian.h, which defines the preprocessor symbol __BYTE_ORDER.
On Windows systems (or when that symbol isn't defined), I assume a little endian machine.
On macOS, the endian.h header is found in the machine sub-directory, so please change the include in each .cpp file to #include <machine/endian.h>.
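
A compile-time check along these lines could look like the sketch below. It is an illustration under assumptions, not the library's code: the function names are hypothetical, endian.h is only included on Linux, and every other platform falls back to little endian as described above. The second function is a runtime probe that is handy for double-checking the compile-time answer.

```cpp
#include <cstdint>
#include <cstring>

#if defined(__linux__)
#include <endian.h> // defines __BYTE_ORDER, __LITTLE_ENDIAN and __BIG_ENDIAN
#endif

// decided by the preprocessor; little endian is assumed if __BYTE_ORDER is unknown
inline bool isBigEndianCompileTime()
{
#if defined(__BYTE_ORDER) && (__BYTE_ORDER != 0) && (__BYTE_ORDER == __BIG_ENDIAN)
  return true;
#else
  return false;
#endif
}

// runtime probe: look at the first byte of a known 32 bit value in memory
inline bool isBigEndianRuntime()
{
  const std::uint32_t probe = 0x01020304u;
  unsigned char first;
  std::memcpy(&first, &probe, 1);
  return first == 0x01; // most significant byte stored first => big endian
}
```

On a platform where the header is missing and the machine is big endian, the two functions would disagree, which is exactly the situation where the include has to be adjusted by hand.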

Swapping bytes is a common operation in all of these hashing algorithms. Therefore the library uses faster compiler-specific intrinsics where available and provides a portable fallback code path (see sha256.cpp).
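
As a rough sketch of that pattern (the function names swap32 and swap32Portable are chosen for this example and are not taken from the library), a 32 bit byte swap can pick a GCC/Clang or Visual C++ intrinsic and otherwise fall back to plain shifts and masks:

```cpp
#include <cstdint>

#if defined(_MSC_VER)
#include <stdlib.h> // _byteswap_ulong
#endif

// portable fallback: move each of the four bytes to its mirrored position
inline std::uint32_t swap32Portable(std::uint32_t x)
{
  return (x >> 24) |
         ((x >>  8) & 0x0000FF00u) |
         ((x <<  8) & 0x00FF0000u) |
         (x << 24);
}

// prefer a compiler intrinsic, which usually compiles to a single instruction
inline std::uint32_t swap32(std::uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
  return __builtin_bswap32(x);
#elif defined(_MSC_VER)
  return _byteswap_ulong(x);
#else
  return swap32Portable(x);
#endif
}
```

Both paths must return identical results, for example 0x44332211 for the input 0x11223344, so the fallback doubles as a reference when testing a new compiler.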

Code Size

Here are some statistics about the lines-of-code (counted with CLOC):

| Files | Code | Comments | Blank | Total |
|---|---|---|---|---|
| hash.h | 11 lines | 11 lines | 6 lines | 28 lines |
| crc32.h and crc32.cpp | 370 lines | 66 lines | 45 lines | 481 lines |
| md5.h and md5.cpp | 308 lines | 82 lines | 85 lines | 475 lines |
| sha1.h and sha1.cpp | 242 lines | 84 lines | 67 lines | 393 lines |
| sha256.h and sha256.cpp | 309 lines | 93 lines | 76 lines | 478 lines |
| keccak.h and keccak.cpp | 221 lines | 74 lines | 62 lines | 357 lines |
| sha3.h and sha3.cpp | 221 lines | 74 lines | 62 lines | 357 lines |

The CRC32 implementation is actually pretty simple but crc32.cpp contains a huge lookup table.

Keccak

Keccak is the designated SHA3 hashing algorithm. Its website contains lots of freely available code, mostly in C/C++. However, it severely lacks proper code documentation and trying to understand the way their code is structured was unexpectedly hard for me.

My implementation computes only the most common variants (all with 1600 bits of internal state): Keccak224, Keccak256, Keccak384 and Keccak512 - just use the proper enum in the constructor.

keccak.h and keccak.cpp can be found in the Git repository (scroll up) and in the downloadable zipped file.

SHA3

In April 2014 the SHA3 proposal was slightly changed by the FIPS 202 draft.
All they did was add two bits to the message (zero and one) before it is padded. The code change is minimal (just one line) but as a result, Keccak hashes differ from their SHA3 counterparts.
Most SHA3 implementations on the internet are in fact Keccak implementations, which is extremely confusing.
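
To make the difference concrete, here is a minimal sketch of the padding step only, not of the full sponge function. The names padMessage and PaddingVariant are hypothetical. It assumes byte-aligned messages and the usual convention that bits are filled in starting at the least significant bit of a byte, so Keccak's padding starts with the byte 0x01, while the two extra SHA3 bits (0 and 1) in front of the padding turn that byte into 0x06. In both variants the last byte of the block gets its highest bit set.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum PaddingVariant { KeccakPadding, Sha3Padding };

// returns the message followed by its padding; rateBytes is the block size,
// e.g. 136 bytes for the 256 bit variants (1600 - 2*256 = 1088 bits)
inline std::vector<unsigned char> padMessage(const std::string& message,
                                             std::size_t rateBytes,
                                             PaddingVariant variant)
{
  std::vector<unsigned char> padded(message.begin(), message.end());

  // there is always at least one padding byte, at most a full block
  const std::size_t padBytes = rateBytes - (message.size() % rateBytes);
  padded.resize(message.size() + padBytes, 0x00);

  // first padding byte: this is the one-line difference between the variants
  padded[message.size()] = (variant == Sha3Padding) ? 0x06 : 0x01;
  // last padding byte: closing "1" bit (may be the same byte as the first one)
  padded.back() |= 0x80;

  return padded;
}
```

If only a single byte is left in the block, both markers share it: 0x81 for Keccak and 0x86 for SHA3.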

sha3.h and sha3.cpp provide a "real" SHA3 implementation (at least one that computes the same result vectors as the FIPS 202 draft).

HMAC (keyed-hash message authentication code)

HMAC (see Wikipedia) was implemented as a simple template. Its header file explains its usage.
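
The HMAC construction itself is short enough to sketch as a template over an arbitrary hash class. The code below is an illustration under assumptions, not the library's header: hmacSketch and ToyHash are hypothetical names, and the required members (BlockSize, HashBytes, add, getHash) are simply what this sketch expects from its template parameter. ToyHash is a 32 bit FNV-1a stand-in so that the example compiles on its own; it is not cryptographically secure and must not be used for real authentication.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// toy stand-in for a real hash class (32 bit FNV-1a), NOT secure
struct ToyHash
{
  enum { BlockSize = 8, HashBytes = 4 };
  ToyHash() : state(2166136261u) {}
  void add(const void* data, std::size_t numBytes)
  {
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    for (std::size_t i = 0; i < numBytes; i++)
      state = (state ^ bytes[i]) * 16777619u;
  }
  void getHash(unsigned char* buffer) const
  {
    for (int i = 0; i < HashBytes; i++)
      buffer[i] = static_cast<unsigned char>(state >> (24 - 8 * i));
  }
  std::uint32_t state;
};

// HMAC(key, message) = H((key ^ opad) + H((key ^ ipad) + message)), as hex
template <typename HashMethod>
std::string hmacSketch(const std::string& message, const std::string& key)
{
  const std::size_t blockSize = HashMethod::BlockSize;
  const std::size_t hashBytes = HashMethod::HashBytes;

  // keys longer than one block are hashed first, shorter keys are zero-padded
  unsigned char usedKey[HashMethod::BlockSize] = { 0 };
  if (key.size() <= blockSize)
    std::memcpy(usedKey, key.data(), key.size());
  else
  {
    HashMethod keyHasher;
    keyHasher.add(key.data(), key.size());
    keyHasher.getHash(usedKey);
  }

  // inner hash: key xor 0x36, followed by the message
  unsigned char pad[HashMethod::BlockSize];
  for (std::size_t i = 0; i < blockSize; i++)
    pad[i] = static_cast<unsigned char>(usedKey[i] ^ 0x36);
  unsigned char inside[HashMethod::HashBytes];
  HashMethod insideHasher;
  insideHasher.add(pad, blockSize);
  insideHasher.add(message.data(), message.size());
  insideHasher.getHash(inside);

  // outer hash: key xor 0x5C, followed by the inner hash
  for (std::size_t i = 0; i < blockSize; i++)
    pad[i] = static_cast<unsigned char>(usedKey[i] ^ 0x5C);
  unsigned char result[HashMethod::HashBytes];
  HashMethod finalHasher;
  finalHasher.add(pad, blockSize);
  finalHasher.add(inside, hashBytes);
  finalHasher.getHash(result);

  // convert to lowercase hex characters
  static const char digits[] = "0123456789abcdef";
  std::string hex;
  for (std::size_t i = 0; i < hashBytes; i++)
  {
    hex += digits[result[i] >> 4];
    hex += digits[result[i] & 15];
  }
  return hex;
}
```

A call then reads hmacSketch<ToyHash>("message", "key"). Making the hash a template parameter means the same few lines work for any class that offers these members, without virtual calls.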