Fowler-Noll-Vo Hash (FNV1a)
posted by Stephan Brumme
At The Speed Of Light
Four weeks ago I posted an article about the popular CRC32 hashing algorithm. CRC32 works best on large data blocks because short sequences might lead to an increased number of collisions.The FNV hash created by Fowler, Noll and Vo (see their website) comes with zero memory overhead (no precomputed look-up tables), is an incremental (rolling) hash and has a good avalanche behavior. It needs at about 4.5 CPU cycles per hashed byte on my computer, that's a little bit more than 700 MByte/second. GCC and Visual C++ produce identical code.
The heart of the hash function can be expressed in a single line (FNV1a):
newHash = (oneByte ^ oldHash) * Prime;
XOR
and multiplication leads to the original FNV1 hash which is considered slightly inferior:
newHash = (oneByte * Prime) ^ oldHash;
unsigned char
looks as follows in C/C++:
hide
fnv1a, one byte
And an array of bytes just calls
// default values recommended by http://isthe.com/chongo/tech/comp/fnv/
const uint32_t Prime = 0x01000193; // 16777619
const uint32_t Seed = 0x811C9DC5; // 2166136261
/// hash a single byte
inline uint32_t fnv1a(unsigned char oneByte, uint32_t hash = Seed)
{
return (oneByte ^ hash) * Prime;
}
fnv1a
for each element:
hide
fnv1a, multiple bytes
Actually, the
/// hash a block of memory
uint32_t fnv1a(const void* data, size_t numBytes, uint32_t hash = Seed)
{
assert(data);
const unsigned char* ptr = (const unsigned char*)data;
while (numBytes--)
hash = fnv1a(*ptr++, hash);
return hash;
}
while
-loop in the downloadable fnv.h
file
(scroll to the end of this page) is replaced by
hash = (*ptr++ ^ hash) * Prime;
Other Basic Datatypes
short
and int
values' hash is simply the result of applying the
code shown above to each of the 2 or 4 bytes:
hide
fnv1a, short and int
The code for
/// hash a short (two bytes)
inline uint32_t fnv1a(unsigned short twoBytes, uint32_t hash = Seed)
{
const unsigned char* ptr = (const unsigned char*) &twoBytes;
hash = fnv1a(*ptr++, hash);
return fnv1a(*ptr , hash);
}
/// hash a 32 bit integer (four bytes)
inline uint32_t fnv1a(uint32_t fourBytes, uint32_t hash = Seed)
{
const unsigned char* ptr = (const unsigned char*) &fourBytes;
hash = fnv1a(*ptr++, hash);
hash = fnv1a(*ptr++, hash);
hash = fnv1a(*ptr++, hash);
return fnv1a(*ptr , hash);
}
float
and double
re-interprets the number as a block of bytes.
These functions are identical except for the keywords float
and double
.
hide
fnv1a, float and double
/// hash a float
uint32_t fnv1a(float number, uint32_t hash = Seed)
{
return fnv1a(&number, sizeof(number), hash);
}
/// hash a double
uint32_t fnv1a(double number, uint32_t hash = Seed)
{
return fnv1a(&number, sizeof(number), hash);
}
Strings
Hashing C strings (zero-terminated) and C++ strings (std::string
) isn't a problem:
hide
fnv1a, strings
The execution times of hashing a C string vs.
/// hash a C-style string
uint32_t fnv1a(const char* text, uint32_t hash = Seed)
{
assert(text);
while (*text)
hash = fnv1a((unsigned char)*text++, hash);
return hash;
}
/// hash an std::string
uint32_t fnv1a(const std::string& text, uint32_t hash = Seed)
{
return fnv1a(text.c_str(), text.length(), hash);
}
std::string
are identical.Update December 6, 2011: To speed up Debug mode, the downloadable
fnv.h
is slightly different (fnv1a
is explicitly inlined for C-style strings).
Unrolling The Inner Loop
Often it's a good idea to (partially) unroll the most inner loop. I couldn't observe a significant speed-up when I wrote an unrolled version of the FNV1a hash. The compiler emitted unrolled binary code as intended but the throughput remained the same. As always, your mileage may vary.Here is the unrolled source code nevertheless:
hide
fnv1a_unrolled
The code is a template which takes a number as a template parameter, e.g. to unroll 8 times:
/// hash a block of memory
template <unsigned int Unroll>
uint32_t fnv1a_unrolled(const void* data, size_t numBytes, uint32_t hash = Seed)
{
assert(data);
const unsigned char* ptr = (const unsigned char*)data;
// unroll
while (numBytes >= Unroll)
{
// Unroll is a constant and smart compilers like GCC and Visual C++ unroll properly
hash = fnv1a(ptr, Unroll, hash);
ptr += Unroll;
numBytes -= Unroll;
}
// process remaining bytes
return fnv1a(ptr, numBytes, hash);
}
uint32_t newHash = fnv1a_unrolled<8>(someData, numBytes, previousHash);
for
-loop as desired.To prevent the programmer from accidently setting the template parameter to
0
or
1
, these two cases are partially specialized and redirect to the unrolled version
of fnv1a
.
hide
fnv1a_unrolled (special cases)
Note that
/// catch invalid Unroll value
template <>
inline uint32_t fnv1a_unrolled<0>(const void* data, size_t numBytes, uint32_t hash)
{
return fnv1a(data, numBytes, hash);
}
/// not unrolled at all
template <>
inline uint32_t fnv1a_unrolled<1>(const void* data, size_t numBytes, uint32_t hash)
{
return fnv1a(data, numBytes, hash);
}
hash
cannot have a default value (Seed
).
I'm not sure whether that's part of the C++ standard or just a compiler limitation.
Download
Git users: scroll down to the repository link Latest release: April 24, 2013, size: 3579 bytes, 138 linesCRC32:
3e9aa818
MD5:
4718cd739321ee47f422d837d438563b
SHA1:
db5a7745a83502fcaea56ff11c23f5138b13e196
SHA256:
f04303bd27d3df8c891828ff6b80331571e7841716c5860d70ea1bfbdee726d4
Stay up-to-date:git clone https://create.stephan-brumme.com/fnv-hash/git
If you encounter any bugs/problems or have ideas for improving future versions, please write me an email: create@stephan-brumme.com
Changelog
- License
- April 24, 2013
- zlib-style license, applies to all past and future versions
- version 2
- December 6, 2011
- faster in Debug mode
- version 1
- December 5, 2011
- initial release