Binary Search Revisited

posted by Stephan Brumme

Same same but different ?!

Some time ago I had to write code for a binary search of sorted data. Obviously there are libraries which can do the heavy lifting for me, such as C++'s STL.

When I looked up the source code of STL's binary_search I was surprised to find that it slightly differs from what I expected: it always performs log2(N) less-than comparisons and only after that checks for equality exactly once:
hide binary_search / pseudo code (STL) while at least one element { find element in the middle if it's smaller than x strip all elements before and including middle element else strip all elements after middle element } return true if remaining element matches x, else false
That means if it finds a match (something that equals what we are searching for) it still keeps going until subdivision finishes - and forfeits the chance for an early exit.
The STL guys usually prefer fast algorithms - why did they do it ?

This clearly calls for some serious googling ... and the vast majority of web sites, including Wikipedia, combine the less-than comparison with an check for equality and indeed allow for an early exit if they find a match:
hide binary_search / pseudo code (most common algorithm) while at least one element { find element in the middle if it matches x, return true <== NEW if it's smaller than x strip all elements before and including middle element else strip all elements after middle element } return false <== DIFFERENT
In Big-O notation: if we have a container with N elements, the following number of comparisons are expected on average:
check if less than check if equal
STL ≈ log2(N) 1
"rest of world" ≈ log2(N) - 1 ≈ log2(N) - 1
Is STL right ? Does saving all these checks for equality at the cost of one more less-than give a real-world performance boost ?
There is only one way to find the best binary_search: WE NEED BENCHMARKS !

(read more ...)

Reverse MD5 Hash

posted by Stephan Brumme

Storing Passwords

Storing passwords as plain text in your database is the very, very worst idea you can ever come up with. However, almost all big companies are stupid enough to fall for it anyway as we have seen in recent hacks.

A small step towards a safer way of storing passwords is to use MD5 hashes:
  1. user enters password
  2. compute MD5(password) and store these 128 bits
  3. when user logs in again, compute MD5(password) and compare it to the stored MD5
The idea is simple, it can be done in a few lines and it's fast. And it's not safe, too.

Live Demo

Lookups are performed while you type.

Enter a MD5 (hexadecimal):

Resolved phrase:

And in case you don't have a MD5 at your fingertips ...
Phrase:

MD5 hash:


(read more ...)

Windows 7 Screenshots

posted by Stephan Brumme

Screenshots and Aero

I ran Windows XP for many, many years and skipped Vista. After switching to Windows 7, I had problems taking screenshots because the background shines through the semi-transparent window borders of Windows 7's Aero user interface.

Below is my OpenCL tool Clein in front of my blog. Pressing Alt-Print gives undesired results - while the screenshot on the right side is how it should look:



Obviously, the right screenshot looks much better ... it has a proper shadowing and the rounded corners look great, too. My only complaint is the missing focus: the Device tab header has no dashed border and the close button (the X in the upper right corner) is blue instead of red.

Shotty

Shotty is the great freeware tool used to get the improved screenshots. It comes with a basic image editor and lots of online features which I don't need at all. Quite important to me is its astonishing PNG compressor - it's often superior to PNGOUT.

A portable version of Shotty is available, too. Give it a try.

Hide Strings In Your Executable

posted by Stephan Brumme

Plain Text

Every constant string in C/C++ programs is easily recoverable in your executables. Microsoft's Process Explorer lists all strings, e.g. Hello World !:



One way to prevent non-programmers from locating your strings is to encrypt them. The most simple method is based on XOR. It's cracked within seconds by serious hackers but can keep away everyone without a programming background. And after all that's 99% of all computer users out there.

XORing your data

The basic idea is as follows: a XOR b XOR b = a, i.e. applying XOR twice give the original value. The variables a and b can be of an arbitrary non-floating-point type. If you want to encrypt strings then char and wchar_t are obvious choices.

Instead of using a constant b, it's a little bit better to have a pass-phrase (another string).
Then you can have a loop and select password[i % passwordLength]. For better performance, passwordLength should be a power of two because then the compiler can replace to slow modulo operation by a much faster AND.

A few lines of C++ is all you need:
hide std::string decode(const std::string& input) { // choose a power of two => then compiler can replace "modulo x" by much faster "and (x-1)" const size_t passwordLength = 16; // at least as long as passwordLength, can be longer, too ... static const char password[passwordLength] = "invalid pointer"; // out = in XOR NOT(password) std::string result = input; for (size_t i = 1; i < input.length(); i++) result[i] ^= ~password[i % passwordLength]; return result; } // "Hello World !" printf(decode("\xde\xf4\xe5\xf2\xfc\xb6\xcc\xb0\xfd\xfc\xf2\xb1\xaa").c_str());

(read more ...)

Generate QR Codes With PHP and Google Charts API

posted by Stephan Brumme

QR Codes

Almost 20 years ago the Japanese company Denso Wave (owned by Toyota) invented QR codes. These 2D images can encode pretty much any Unicode string and are best known for encoding URLs. Apps for all kind of mobile phones are available, often for free.

Please keep in mind: it's a patented technology but standardized as ISO 18004. Denso Wave owns all rights (incl. a registered trademark on "QR code") but they claim on their website to not exercise their rights.

If you don't care about the technical stuff and want to go straight to my final PHP code, then click here.

Google Charts API

Google Charts is a free online service that can generate a huge variety of charts. Think of it as Excel without the spreadsheets ;-)

With Google Charts it's extremely simple to create a QR code. As the bare minimum, you need to provide only two pieces of information: the URL and the image size.

hide Generate QR Code $width = $height = 100; $url = urlencode("http://create.stephan-brumme.com"); echo "<img src=\"http://chart.googleapis.com/chart?chs={$width}x{$height}&cht=qr&chl=$url\" />";
And the following image will be displayed:

(read more ...)

feed.rss and sitemap.xml

posted by Stephan Brumme

Discovery

In November 2010 I saw space shuttle Discovery in Cape Canaveral. A completely different kind of discovery is widespread among internet sites, especially blogs:
  1. RSS feeds for news readers
  2. Sitemaps for search engines
Both are available now for create.stephan-brumme.com.

News feed - feed.rss

My RSS 2.0 feed is generated on-the-fly by a simple PHP script.

(read more ...)

My Experiences with AdSense

posted by Stephan Brumme

Overview

In March 2008 I started including AdSense on my homepage. Since then, I counted about 600000 visitors and about 50% of them (to be exactly: 285236 as of September 28, 2011) were exposed to AdSense advertisements.

It took a little bit more than 3 years until I received my first check/cheque from Google (70 Euros, ca. 100 US-$).
In the meantime, my web hoster charged me about 400 Euros (ca. 500$).

Clicks

AdSense pays per click and not per view. My click-through rate is only 0.19% (551 clicks) and I got on average 0.14 Euros per click.
My best month so far was December 2008 (4.51 Euros), my worst August 2010 (0.36 Euros). There is no significant increase nor decline.
I am using only two banner sizes: 468x60 and 728x90. The smaller banners actually have a better click-through rate.

In 2010, when I had the most visitors, the click-through rate was the lowest.
My overall impression is that the click-through rate is inverse proportional to the number of visitors.
(read more ...)

Emulating MemCache in Pure PHP

posted by Stephan Brumme

Soccer World Cup 2010

You might wonder: what has soccer to do with MemCache ?
Well, in 2006 I started a just-for-fun website (in German) where my colleagues could bet on the games of the Soccer FIFA World Cup (2006 & 2010) and UEFA Cup (2008). The first price wasn't money but eternal fame - and I won twice ;-)
Disclaimer: I didn't cheat at all, just plain luck (and a very defensive betting strategy) !

The back-end consists of PHP5 and SQLite. Everything is created on the fly which requires quite a lot of SQL queries (about 300 queries for World Cup 2010). I added more and more features for each soccer tournament, mostly focussed on data mining. As a result, the 2008 version became noticely slow over time and needed about 500ms to generate the front page (2006: 200ms).

I expected the 2010 version to need more than 1 second for the front page. However, after adding a MemCache-like wrapper around the SQL-queries, I was able to generate the front page in less than 30 milliseconds.
(read more ...)

Simple Ajax

posted by Stephan Brumme

Interactive Web Sites

Ajax is the technology upon which pretty much every modern interactive website is build (except for these weird Flash sites). Professional web sites use large frameworks like jQuery that wrap the details of Ajax in a nice programming interface. That's perfectly fine but total overkill for a small blog like this.

Live Demo

This DNS-to-IP resolver returns the IPv4 address of any domain. It may take up to 3 seconds to get a result.

Enter a domain name: => IP:

Have a look at the sorting algorithm live demo which is based on Ajax, too.

(read more ...)

Sparklines

posted by Stephan Brumme

Why sparklines ?

When I started my realtime logfile analysis website status.stephan-brumme.com, I was looking for a way to display activities, such as traffic or hits, in compact yet intuitive way.
A primary concern was that standard diagrams just take too much space. I gradually reduced their size until they had about the same height as the surrounding text:

214x83 pixels: 214x83 pixels 107x42 pixels: 107x42 pixels 54x21 pixels: 54x21 pixels 27x11 pixels: 27x11 pixels

It's impossible to have any readable legend in the smallest diagram - and I don't really need it. All I want to know is: is a lot going on ? or is my server pretty much dead ? The small diagram fits these needs.

I wasn't the first to come up with the idea of diagrams scaled down to the size of text. Edward Tufte published some of the most impressive books about information visualization and coined the term sparklines.

There is a huge variaties of sparklines, not only the classic line chart. I wanted something more computer-like and ended up with this drawing style: random sparkline.

Live Demo

Create some of sparklines on-the-fly: => live demo sparkline

(read more ...)

browse through all postings:
newer stuff:

2018/Nov/14   smalLZ4 - optimal LZ4 compression
2018/Oct/15   flexiGIF - lossless GIF/LZW optimization
2016/Aug/04   xxHash - a hash algorithm as fast as memcpy
2015/Aug/11   Tiny Regular Expression Matcher
2015/Feb/04   Fast CRC32
2014/Aug/25   Practical String Searching
2014/Aug/19   The Mersenne Twister Pseudo Random Number Generator
2014/Jul/11   Parallel Sieve of Eratosthenes
2014/Jun/23   Windows and Linux Code Timing
2014/Jun/14   Portable C++ Hashing Library
2014/Feb/27   Sorting STL Containers
2013/Sep/18   Portable Memory Mapping C++ Class
2013/May/03   Testing Endian Aware Code
2013/Apr/24   Software License
2013/Feb/15   Read-Only Git Access Available
2011/Dec/05   Fowler-Noll-Vo Hash (FNV1a)
2011/Dec/01   Drawing Antialiased Circles and Ellipses
2011/Nov/27   Uncompressing DEFLATE Data
2011/Nov/23   Streaming MPEG-4 Movies From Your Own Server

older stuff:

2011/Sep/12   smallpt - Global Illumination in Javascript
2011/Sep/11   Multi-Core Sorting
2011/Sep/06   CLein - The tiny, portable OpenCL info tool
homepage