Uncompressing DEFLATE Data

posted November 27, 2011 by Stephan Brumme

The Most Common Compression Algorithm

Everyone knows the widespread ZIP file format. Some Windows users may have never heard of GZIP, but all Unix users did. 99% percent of all ZIPs are encoded with the so-called DEFLATE algorithm, specified as RFC 1951. The same holds true for PNG images, some PDFs, compressed HTTP data transfer and many more.

I wanted to have a C++ implementation without strange dependencies and weird pointer arithmentic. I wanted something like this:

    
     #include "Inflate.h"
     std::vector<unsigned char> uncompressed = Inflate("filename.gz");

So I wrote my own DEFLATE decoder in pure C++/STL. It's reasonably fast - somewhere between 50 and 100 MByte/s (uncompressed data) - and compiles easily with Visual C++ and GCC (older version should be fine, too).

Download

Download deflate-decoder.zip

Latest release: November 27, 2011, size: 25.2 kBytes

CRC32: 7caba0b2
MD5: 079bbc381743a0f64e29908548bfeed5
SHA1: 71a7be05c51bda6026184980580a775a9d5e8ef6
SHA256:8ffd911e74a4d378711816fb473dff8ed930a871d20275cfb11247eaf3384eba

If you encounter any bugs/problems or have ideas for improving future versions, please write me an email: create@stephan-brumme.com
(read more ...)

Streaming MPEG-4 Movies From Your Own Server

posted November 23, 2011 by Stephan Brumme

No Need For YouTube

YouTube is great. Really ! You can find everything you want to watch - and probably everything you don't want to.

On my website photos.stephan-brumme.com are the best photos and videos I took while travelling all over world. Everybody from all over the world can watch them - that's intended and obviously the way the internet should work. Nevertheless, these are MY photos and MY videos and I want to be able to remove them if necessary. That's why I don't upload them to www.flickr.com or www.youtube.com.

There is plenty of free space on my (virtual) web server and my contract includes unlimited traffic. Even more important, my server has a better internet connection than I have at home: my 25 MBit/s VDSL connection is 100% saturated when downloading from my server. I'm not sure but I assume my server can easily achieve 100 MBit/s (it's Germany's biggest hoster, by the way). YouTube is much slower and sometimes doesn't even exceed 3-4 MBit/s, especially after 6pm.

HTML5's Video Capabilities

The upcoming HTML5 standard introduced a new <video> tag. All modern browsers (Chrome, Opera, Firefox, Internet Explorer 9+ and even Android/iPhones) can play videos embedded in a website natively without resorting to Flash. Well, almost ...

hide HMTL's 5 video tag


     <video width="400" height="300" controls preload="metadata"
            poster="https://photos.stephan-brumme.com/2009/cambodia/MOV01263.jpg">
      <source src="https://photos.stephan-brumme.com/2009/cambodia/MOV01263.mp4" type="video/mp4" />
     </video>

One of the most efficient video codecs is MPEG-4, also known as AVC or H.264. Most high-definition (HD) stuff is based on MPEG-4, e.g. Blu-Rays. YouTube's HD videos are encoded in the MPEG-4 file format, too.
(read more ...)

Binary Search Revisited

posted November 8, 2011 by Stephan Brumme

Same same but different ?!

Some time ago I had to write code for a binary search of sorted data. Obviously there are libraries which can do the heavy lifting for me, such as C++'s STL.

When I looked up the source code of STL's binary_search I was surprised to find that it slightly differs from what I expected: it always performs log2(N) less-than comparisons and only after that checks for equality exactly once:

hide binary_search / pseudo code (STL)


     while at least one element
     {
       find element in the middle
       if it's smaller than x
         strip all elements before and including middle element
       else
         strip all elements after middle element
     }
     return true if remaining element matches x, else false

That means if it finds a match (something that equals what we are searching for) it still keeps going until subdivision finishes - and forfeits the chance for an early exit.
The STL guys usually prefer fast algorithms - why did they do it ?

This clearly calls for some serious googling ... and the vast majority of web sites, including Wikipedia, combine the less-than comparison with an check for equality and indeed allow for an early exit if they find a match:

hide binary_search / pseudo code (most common algorithm)


     while at least one element
     {
       find element in the middle
       if it matches x, return true <== NEW
       if it's smaller than x
         strip all elements before and including middle element
       else
         strip all elements after middle element
     }
     return false <== DIFFERENT

In Big-O notation: if we have a container with N elements, the following number of comparisons are expected on average:

	check if less than	check if equal
STL	≈ log₂(N)	1
"rest of world"	≈ log₂(N) - 1	≈ log₂(N) - 1

Is STL right ? Does saving all these checks for equality at the cost of one more less-than give a real-world performance boost ?
There is only one way to find the best binary_search: WE NEED BENCHMARKS !

(read more ...)

Reverse MD5 Hash

posted October 20, 2011 by Stephan Brumme

Storing Passwords

Storing passwords as plain text in your database is the very, very worst idea you can ever come up with. However, almost all big companies are stupid enough to fall for it anyway as we have seen in recent hacks.

A small step towards a safer way of storing passwords is to use MD5 hashes:

user enters password
compute MD5(password) and store these 128 bits
when user logs in again, compute MD5(password) and compare it to the stored MD5

The idea is simple, it can be done in a few lines and it's fast. And it's not safe, too.

Live Demo

Lookups are performed while you type.

And in case you don't have a MD5 at your fingertips ...

(read more ...)

Windows 7 Screenshots

posted October 17, 2011 by Stephan Brumme

Screenshots and Aero

I ran Windows XP for many, many years and skipped Vista. After switching to Windows 7, I had problems taking screenshots because the background shines through the semi-transparent window borders of Windows 7's Aero user interface.

Below is my OpenCL tool Clein in front of my blog. Pressing Alt-Print gives undesired results - while the screenshot on the right side is how it should look:

Screenshot Alt-Print

Obviously, the right screenshot looks much better ... it has a proper shadowing and the rounded corners look great, too. My only complaint is the missing focus: the Device tab header has no dashed border and the close button (the X in the upper right corner) is blue instead of red.

Shotty

Shotty is the great freeware tool used to get the improved screenshots. It comes with a basic image editor and lots of online features which I don't need at all. Quite important to me is its astonishing PNG compressor - it's often superior to PNGOUT.

A portable version of Shotty is available, too. Give it a try.

Hide Strings In Your Executable

posted October 16, 2011 by Stephan Brumme

Plain Text

Every constant string in C/C++ programs is easily recoverable in your executables. Microsoft's Process Explorer lists all strings, e.g. Hello World !:

One way to prevent non-programmers from locating your strings is to encrypt them. The most simple method is based on XOR. It's cracked within seconds by serious hackers but can keep away everyone without a programming background. And after all that's 99% of all computer users out there.

`XOR`ing your data

The basic idea is as follows: a XOR b XOR b = a, i.e. applying XOR twice give the original value. The variables a and b can be of an arbitrary non-floating-point type. If you want to encrypt strings then char and wchar_t are obvious choices.

Instead of using a constant b, it's a little bit better to have a pass-phrase (another string).
Then you can have a loop and select password[i % passwordLength]. For better performance, passwordLength should be a power of two because then the compiler can replace to slow modulo operation by a much faster AND.

A few lines of C++ is all you need:

hide


     std::string decode(const std::string& input)
     {
       // choose a power of two => then compiler can replace "modulo x" by much faster "and (x-1)"
       const size_t passwordLength = 16;
       // at least as long as passwordLength, can be longer, too ...
       static const char password[passwordLength] = "invalid pointer";
     
       // out = in XOR NOT(password)
       std::string result = input;
       for (size_t i = 1; i < input.length(); i++)
         result[i] ^= ~password[i % passwordLength];
       return result;
     }
     
     // "Hello World !"
     printf(decode("\xde\xf4\xe5\xf2\xfc\xb6\xcc\xb0\xfd\xfc\xf2\xb1\xaa").c_str());

(read more ...)

Generate QR Codes With PHP and Google Charts API

posted October 11, 2011 by Stephan Brumme

QR Codes

Almost 20 years ago the Japanese company Denso Wave (owned by Toyota) invented QR codes. These 2D images can encode pretty much any Unicode string and are best known for encoding URLs. Apps for all kind of mobile phones are available, often for free.

Please keep in mind: it's a patented technology but standardized as ISO 18004. Denso Wave owns all rights (incl. a registered trademark on "QR code") but they claim on their website to not exercise their rights.

If you don't care about the technical stuff and want to go straight to my final PHP code, then click here.

Google Charts API

Google Charts is a free online service that can generate a huge variety of charts. Think of it as Excel without the spreadsheets ;-)

With Google Charts it's extremely simple to create a QR code. As the bare minimum, you need to provide only two pieces of information: the URL and the image size.

hide Generate QR Code


     $width = $height = 100;
     $url   = urlencode("http://create.stephan-brumme.com");
     echo "<img src=\"http://chart.googleapis.com/chart?chs={$width}x{$height}&cht=qr&chl=$url\" />";

And the following image will be displayed:

(read more ...)

`feed.rss` and `sitemap.xml`

posted October 2, 2011 by Stephan Brumme

Discovery

In November 2010 I saw space shuttle Discovery in Cape Canaveral. A completely different kind of discovery is widespread among internet sites, especially blogs:

RSS feeds for news readers
Sitemaps for search engines

Both are available now for create.stephan-brumme.com.

News feed - `feed.rss`

My RSS 2.0 feed is generated on-the-fly by a simple PHP script.

(read more ...)

My Experiences with AdSense

posted September 28, 2011 by Stephan Brumme

Overview

In March 2008 I started including AdSense on my homepage. Since then, I counted about 600000 visitors and about 50% of them (to be exactly: 285236 as of September 28, 2011) were exposed to AdSense advertisements.

It took a little bit more than 3 years until I received my first check/cheque from Google (70 Euros, ca. 100 US-$).
In the meantime, my web hoster charged me about 400 Euros (ca. 500$).

Clicks

Clicks

AdSense pays per click and not per view. My click-through rate is only 0.19% (551 clicks) and I got on average 0.14 Euros per click.
My best month so far was December 2008 (4.51 Euros), my worst August 2010 (0.36 Euros). There is no significant increase nor decline.
I am using only two banner sizes: 468x60 and 728x90. The smaller banners actually have a better click-through rate.

In 2010, when I had the most visitors, the click-through rate was the lowest.
My overall impression is that the click-through rate is inverse proportional to the number of visitors.
(read more ...)

Emulating MemCache in Pure PHP

posted September 27, 2011 by Stephan Brumme

Soccer World Cup 2010

You might wonder: what has soccer to do with MemCache ?
Well, in 2006 I started a just-for-fun website (in German) where my colleagues could bet on the games of the Soccer FIFA World Cup (2006 & 2010) and UEFA Cup (2008). The first price wasn't money but eternal fame - and I won twice ;-)
Disclaimer: I didn't cheat at all, just plain luck (and a very defensive betting strategy) !

The back-end consists of PHP5 and SQLite. Everything is created on the fly which requires quite a lot of SQL queries (about 300 queries for World Cup 2010). I added more and more features for each soccer tournament, mostly focussed on data mining. As a result, the 2008 version became noticely slow over time and needed about 500ms to generate the front page (2006: 200ms).

I expected the 2010 version to need more than 1 second for the front page. However, after adding a MemCache-like wrapper around the SQL-queries, I was able to generate the front page in less than 30 milliseconds.
(read more ...)

browse through all postings:

newer stuff:

2021/Jun/30 Length-Limited Prefix Codes
2020/Apr/13 smalLZ4 - optimal LZ4 compression
2019/Jun/17 toojpeg - a JPEG encoder in a single C++ file
2018/Oct/15 flexiGIF - lossless GIF/LZW optimization
2016/Aug/04 xxHash - a hash algorithm as fast as memcpy
2015/Aug/11 Tiny Regular Expression Matcher
2015/Feb/04 Fast CRC32
2014/Aug/25 Practical String Searching
2014/Aug/19 The Mersenne Twister Pseudo Random Number Generator
2014/Jul/11 Parallel Sieve of Eratosthenes
2014/Jun/23 Windows and Linux Code Timing
2014/Jun/14 Portable C++ Hashing Library
2014/Feb/27 Sorting STL Containers
2013/Sep/18 Portable Memory Mapping C++ Class
2013/May/03 Testing Endian Aware Code
2013/Apr/24 Software License
2013/Feb/15 Read-Only Git Access Available
2011/Dec/05 Fowler-Noll-Vo Hash (FNV1a)
2011/Dec/01 Drawing Antialiased Circles and Ellipses

older stuff:

2011/Sep/14 Simple Ajax
2011/Sep/13 Sparklines
2011/Sep/12 smallpt - Global Illumination in Javascript
2011/Sep/11 Multi-Core Sorting
2011/Sep/06 CLein - The tiny, portable OpenCL info tool


June 30, 2021: Length-Limited Prefix Codes April 13, 2020: smalLZ4 - optimal LZ4 compression June 17, 2019: toojpeg - a JPEG encoder in a single C++ file ... and 30 more !	RSS	www.stephan-brumme.com bits.stephan-brumme.com photos.stephan-brumme.com minime.stephan-brumme.com
© 2011-2025 Stephan Brumme products and company names mentioned may be trademarks of their respective owners. see my disclaimer. dude, this is the end ... of this page.