Windows and Linux Code Timing

posted by Stephan Brumme

Whenever I have to do some performance testing, I try the most simple way: write a short program and run it within my Linux bash shell which comes with a built-in time command:
$ time ./testcode real 0m0.788s user 0m0.696s sys 0m0.080s
Obviously there can (and will) be some overhead by the operating system. So numbers are typically a bit off.
The timing resolution isn't that great, either, and often lies above 10ms.

That's why I wrote a simple C routine to measure performance inside the code. My goal was to have numbers that are accurate down to about 1 microsecond (10-6).
I didn't opt for extremely high timing precision (count CPU ticks) because I rarely need that degree of precision and really want to avoid the tricky details of CPU frequency scaling, cache stalls etc.

Another feature is portability; simply recompile the code and run it on Windows and Linux. The code works fine with GCC as well as Visual C++.

All my timing code contains is a single function seconds() returning a double. You can invoke it in your programs like this:
double start = seconds(); // ... your code that should be measured ... double duration = seconds() - start;
Note the code comments: especially for Windows the zero-point isn't specified and there is a slight chance that the returned number can wrap around.
hide // include OS specific timing library #ifdef _MSC_VER // Windows #include <Windows.h> #else // Linux #include <time.h> #endif /// return a timestamp with sub-second precision /** QueryPerformanceCounter and clock_gettime have an undefined starting point (null/zero) and can wrap around, i.e. be nulled again. **/ double seconds() { #ifdef _MSC_VER static LARGE_INTEGER frequency; if (frequency.QuadPart == 0) ::QueryPerformanceFrequency(&frequency); LARGE_INTEGER now; ::QueryPerformanceCounter(&now); return now.QuadPart / double(frequency.QuadPart); #else struct timespec now; clock_gettime(CLOCK_MONOTONIC, &now); return now.tv_sec + now.tv_nsec / 1000000000.0; #endif }
A short test program attempts to determine the timer resolution:
show timing.c // ////////////////////////////////////////////////////////// // timing.c // Copyright (c) 2014 Stephan Brumme. All rights reserved. // see http://create.stephan-brumme.com/disclaimer.html // // only for printf in the test code in main() #include <stdio.h> // include OS specific timing library #ifdef _MSC_VER // Windows #include <Windows.h> #else // Linux #include <time.h> #endif /// return a timestamp with sub-second precision /** QueryPerformanceCounter and clock_gettime have an undefined starting point (null/zero) and can wrap around, i.e. be nulled again. **/ double seconds() { #ifdef _MSC_VER static LARGE_INTEGER frequency; if (frequency.QuadPart == 0) ::QueryPerformanceFrequency(&frequency); LARGE_INTEGER now; ::QueryPerformanceCounter(&now); return now.QuadPart / double(frequency.QuadPart); #else struct timespec now; clock_gettime(CLOCK_MONOTONIC, &now); return now.tv_sec + now.tv_nsec / 1000000000.0; #endif } // test code int main(int argc, char** argv) { // find minimum gap between two consecutive timestamps => timing resolution const int NumTests = 10; int i; for (i = 0; i < NumTests; i++) { double time1 = seconds(); double time2 = time1; // wait until timestamp changes while (time2 == time1) time2 = seconds(); // compute difference ... double microseconds = (time2 - time1) * 1000*1000; double megahertz = 1 / microseconds; // ... and display it printf("%f microseconds => %f megahertz\n", microseconds, megahertz); } return 0; }
Git users: scroll down to the repository link
Download  timing.c
Latest release: June 23, 2014, size: 1555 bytes, 65 lines

CRC32: 8cdf40fa
MD5: 2d4012f200d62a14aa3e31b374509c10
SHA1: 80ed042505b97eb6969f8b28971ae07239fa3732
SHA256:c1ab05c2ec925fb067ff90377d4a594af23f7195d22b128c3140fae4409233b5

Stay up-to-date:git clone http://create.stephan-brumme.com/windows-and-linux-code-timing/.git

If you encounter any bugs/problems or have ideas for improving future versions, please write me an email: create@stephan-brumme.com

Changelog

Linux

My Linux web server gives me these results:
hide $ uname -a Linux icpu3665 3.2.58-infong-14157 #1 SMP Fri Jun 6 14:05:34 CEST 2014 x86_64 GNU/Linux $ gcc -O3 timing.c -o timing -lrt && ./timing 1.223998 microseconds => 0.816995 megahertz 0.917999 microseconds => 1.089326 megahertz 0.900003 microseconds => 1.111107 megahertz 0.453001 microseconds => 2.207502 megahertz 0.462003 microseconds => 2.164488 megahertz 0.450003 microseconds => 2.222207 megahertz 0.453002 microseconds => 2.207497 megahertz 0.444001 microseconds => 2.252249 megahertz 0.450001 microseconds => 2.222219 megahertz 0.444003 microseconds => 2.252238 megahertz
You can improve precision by replacing CLOCK_MONOTONIC by its siblings CLOCK_MONOTONIC_RAW and CLOCK_MONOTONIC_COARSE.

If you really need to avoid the occasional zero-wraparound, then the Linux call gettimeofday is worth looking into.
As a nice side effect seconds() will return the number of seconds since January 1, 1970.
That means, then you not only can compute time differentials but also use seconds for log outputs:
hide // changed include: #include <sys/time.h> // change code in seconds(): // lower resolution but never wraps around: struct timeval now; gettimeofday(&now, NULL); return now.tv_sec + now.tv_usec / 1000000.0;
Still pretty good timing resolution:
hide 0.953674 microseconds => 1.048576 megahertz 0.953674 microseconds => 1.048576 megahertz 0.953674 microseconds => 1.048576 megahertz 1.192093 microseconds => 0.838861 megahertz 0.953674 microseconds => 1.048576 megahertz 1.192093 microseconds => 0.838861 megahertz 0.953674 microseconds => 1.048576 megahertz 0.953674 microseconds => 1.048576 megahertz 0.953674 microseconds => 1.048576 megahertz 0.953674 microseconds => 1.048576 megahertz
Fully working code is available for download:
show wallclock.c // ////////////////////////////////////////////////////////// // wallclock.c // Copyright (c) 2014 Stephan Brumme. All rights reserved. // see http://create.stephan-brumme.com/disclaimer.html // // only for printf in the test code in main() #include <stdio.h> // include OS specific timing library #ifdef _MSC_VER // Windows #include <Windows.h> #else // Linux #include <sys/time.h> #endif /// return a timestamp with sub-second precision /** QueryPerformanceCounter has an undefined starting point (null/zero) and can wrap around, i.e. be nulled again. gettimeofday (Linux only) returns the number of seconds since January 1, 1970 **/ double seconds() { #ifdef _MSC_VER static LARGE_INTEGER frequency; if (frequency.QuadPart == 0) ::QueryPerformanceFrequency(&frequency); LARGE_INTEGER now; ::QueryPerformanceCounter(&now); return now.QuadPart / double(frequency.QuadPart); #else // lower resolution but never wraps around: struct timeval now; gettimeofday(&now, NULL); return now.tv_sec + now.tv_usec / 1000000.0; #endif } // test code int main(int argc, char** argv) { // find minimum gap between two consecutive timestamps => timing resolution const int NumTests = 10; int i; for (i = 0; i < NumTests; i++) { double time1 = seconds(); double time2 = time1; // wait until timestamp changes while (time2 == time1) time2 = seconds(); // compute difference ... double microseconds = (time2 - time1) * 1000*1000; double megahertz = 1 / microseconds; // ... and display it printf("%f microseconds => %f megahertz\n", microseconds, megahertz); } return 0; }
Git users: scroll down to the repository link
Download  wallclock.c
Latest release: June 23, 2014, size: 1655 bytes, 67 lines

CRC32: fced30b9
MD5: 4c85bae1c200ee26f4b2a9765cdfd18e
SHA1: 2f7f5fb74bde9b33049277e2c9d4c7f6ffa9c555
SHA256:78109424ee91949ba3b506fd4cf9ed7a9b1f4d9c04a346f0b7c05bbd543fc25a

Stay up-to-date:git clone http://create.stephan-brumme.com/windows-and-linux-code-timing/.git

If you encounter any bugs/problems or have ideas for improving future versions, please write me an email: create@stephan-brumme.com

Windows

My Windows 7 desktop computer is a bit more consistent:
hide 0.301854 microseconds => 3.312860 megahertz 0.301858 microseconds => 3.312820 megahertz 0.301858 microseconds => 3.312820 megahertz 0.301854 microseconds => 3.312860 megahertz 0.301854 microseconds => 3.312860 megahertz 0.301858 microseconds => 3.312820 megahertz 0.301858 microseconds => 3.312820 megahertz 0.301858 microseconds => 3.312820 megahertz 0.301854 microseconds => 3.312860 megahertz 0.301858 microseconds => 3.312820 megahertz

time reloaded

Most Linux systems have two time commands: One is built-in into most shells (as mentioned before) and then there is /usr/bin/time.
Its output is much more detailed:
$ /usr/bin/time ./myprogram 4.90user 10.33system 1:36.24elapsed 15%CPU (0avgtext+0avgdata 9600maxresident)k 4934176inputs+7270568outputs (2major+775minor)pagefaults 0swaps
In addition to plain running time, you get memory statistics and I/O traffic.
homepage