Reverse MD5 Hash
posted by Stephan Brumme
Storing PasswordsStoring passwords as plain text in your database is the very, very worst idea you can ever come up with. However, almost all big companies are stupid enough to fall for it anyway as we have seen in recent hacks.
A small step towards a safer way of storing passwords is to use MD5 hashes:
- user enters password
- compute MD5(password) and store these 128 bits
- when user logs in again, compute MD5(password) and compare it to the stored MD5
Live DemoLookups are performed while you type.
And in case you don't have a MD5 at your fingertips ...
Rainbow TablesToday's cheap storage allows for storing huge number of potential phrases and their MD5 (called rainbow tables). If you encounter an MD5 hash and you don't know the original password, then you just perform a simple lookup and get the original password within milliseconds.
It's impossible to generate a complete rainbow table because there are 2^128 different MD5 hashes - that's a number bigger than the total number of atoms in the universe. Moreover, collisions can occur: that means, two different phrases may map to the same MD5 hash.
I built a database containing 17,749,291 phrases (roughly 2^24) based on real passwords and automatically generated short phrases that are potential passwords (like all 6-digit numbers, the complete English dictionary, ...).
It's more or less a giant collection of phrases that are not good passwords.
At first I tried to implement a proprietary data structure to reduce lookup times. A temporarily SQLite database was surprisingly efficient and therefore I decided to keep this simple approach. The server is able to process about 50 lookups per second and is mostly limited by disk access: two-three seeks, one for the index and one or two for the result.
After the first lookup, the index often fits into the server's cache and it gets even faster.
The database is a single 1.0 GByte file. It contains 17,749,291 phrases (about 200 MByte), about 300 MByte MD5 hashes, about 200 MByte indexes and about 300 MByte SQLite overhead. All database rows are sorted by their MD5 hash. The MD5 hashes are not stored as a hexadecimal string but split into 8x 16 bit integers. The index covers only the first 16 bit integer which is sufficient because nearby MD5 can be easily found by scanning the next rows (which have a very high probability to be stored in the same block).