
A JPEG encoder in a single C++ file ...

posted by Stephan Brumme, updated
Recently, one of my programs needed to write a few JPEG files. The obvious choice is to link against libjpeg (or even better: libjpeg-turbo).

These libraries are fast, loaded with a plethora of features and very well tested.
However, they are huge (static linking is pretty much a no-go) and their API is kind of too complicated if you "just want to save an image".

Building my own JPEG encoder

I found a great alternative on Jon Olick's website - his JPEG Writer. It's barely 336 lines of C++ code in a single file and part of the very popular stb_image_write library, too.

However, the large number of magic constants bothered me, so I started looking up their meaning and in the end wrote my own JPEG encoder from scratch.
I realized that a few things weren't optimal and/or missing in Jon's library:
It was a fun experience and I learned a lot about JPEGs and the JFIF file format.
Omar Shehata wrote a great article Unraveling the JPEG which encouraged me to rewrite major parts of toojpeg for a massive speed-up (version 1.2 and later).

Using my toojpeg library

It boils down to 3 simple steps:
  1. include the toojpeg.h header file
  2. define a callback that accepts a single compressed byte
    • will be called for every byte
    • basically similar to fputc
    • ... but can be anything you like, too, e.g. send that byte via network, add it to an std::vector, ...
  3. call TooJpeg::writeJpeg()
A basic program could look like this (a full example which creates example.jpg is available as a download):
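As a minimal sketch of step 2, the callback can simply append each byte to a std::vector. The encoder below is a hypothetical stand-in (`fakeEncode` and `WriteOneByte` are made-up names, not part of toojpeg) that only emits the JPEG start and end markers; with the real library the same callback would be handed to TooJpeg::writeJpeg() together with the pixel data, as declared in toojpeg.h.

```cpp
#include <cassert>
#include <vector>

// collected output bytes (could just as well go to a file or a socket)
static std::vector<unsigned char> jpegBytes;

// the callback: receives one compressed byte at a time, similar to fputc
void myOutput(unsigned char oneByte)
{
  jpegBytes.push_back(oneByte);
}

// callback type used by the stand-in below (assumption for this sketch)
typedef void (*WriteOneByte)(unsigned char);

// HYPOTHETICAL stand-in for an encoder, NOT part of toojpeg:
// it writes only the SOI and EOI markers through the callback
bool fakeEncode(WriteOneByte output)
{
  assert(output != nullptr);
  output(0xFF); output(0xD8); // SOI, start of image
  output(0xFF); output(0xD9); // EOI, end of image
  return true;
}
```

Calling fakeEncode(myOutput) leaves four bytes in jpegBytes; swapping the vector for fputc on an open file changes nothing on the encoder side.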



The same example, but this time for grayscale images, is available as a download, too.


Feature Comparison

| | toojpeg | jo_jpeg | jpge | libjpeg-turbo |
|---|---|---|---|---|
| language | C++11 | C++ | C++ | C |
| free software | yes | yes | yes | yes |
| lines of code | 666 | 336 | 1038 | several thousand |
| no OS specific code ("is portable") | yes | yes | yes | yes (plus hand-coded assembler optimizations) |
| support YCbCr444 format | yes | yes | yes | yes |
| support YCbCr420 format | yes | no | yes | yes |
| support Y-only format (grayscale) | yes | no (always YCbCr444) | yes | yes |
| DCT data type | floating-point | floating-point | integer | floating-point and integer |
| adaptive Huffman codes | no | no | yes (optional) | yes (optional) |
| progressive JPEGs | no | no | no | yes |
| tons of other JPEG formats, e.g. arithmetic coding | no | no | no | yes |
| needs heap memory | no | no | yes | yes |
All libraries compile "out-of-the-box" with GCC, Clang and Visual C++ (other compilers not tested / not available to me).

Benchmark

I downloaded a huge picture from NASA's Blue Marble web site (21600x10800 pixels).



This TIFF image was converted to PPM (about 700 MByte) so that I could write a simple converter PPM → JPEG based on the three aforementioned libraries.
The tests ran on a Core i7 2600K:

[Bar charts: Blue Marble encoding time in seconds on x64 (smaller is better/faster) for YCbCr444, YCbCr420 and grayscale, comparing jo_jpeg, jpge, toojpeg v1.0, v1.2, v1.3, v1.4/v1.5 and libjpeg-turbo]
| Core i7 | quality=90, YCbCr444 | quality=90, YCbCr420 | quality=90, grayscale |
|---|---|---|---|
| toojpeg 1.4/1.5 | 3.6 seconds, 37,870,676 bytes, 64.8 MPixel/s | 2.4 seconds, 27,790,721 bytes, 97.2 MPixel/s | 1.4 seconds, 24,500,009 bytes, 166.6 MPixel/s |
| jo_jpeg | 6.7 seconds, 37,869,843 bytes, 34.8 MPixel/s | n/a | 5.8 seconds, 28,143,622 bytes, 40.2 MPixel/s |
| jpge | 5.5 seconds, 38,701,760 bytes, 42.4 MPixel/s | 3.3 seconds, 28,329,269 bytes, 70.7 MPixel/s | 2.1 seconds, 24,904,332 bytes, 111.1 MPixel/s |
| libjpeg-turbo 2.0.1 | 2.0 seconds, 38,115,886 bytes, 116.6 MPixel/s | 1.2 seconds, 27,942,330 bytes, 194.4 MPixel/s | 0.9 seconds, 24,589,888 bytes, 259.2 MPixel/s |
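The MPixel/s figures follow directly from the image dimensions and the measured time, and the "about 700 MByte" of the PPM file is just three bytes per pixel. A small sketch of that arithmetic (both function names are made up for illustration):

```cpp
#include <cassert>
#include <cstdint>

// throughput in millions of pixels per second
double megapixelsPerSecond(std::uint32_t width, std::uint32_t height, double seconds)
{
  assert(seconds > 0);
  const double pixels = double(width) * double(height);
  return pixels / seconds / 1e6;
}

// size of the raw RGB pixel data in bytes (3 bytes per pixel, PPM header ignored)
std::uint64_t rawRgbBytes(std::uint32_t width, std::uint32_t height)
{
  return std::uint64_t(width) * height * 3;
}
```

For the 21600x10800 image, rawRgbBytes() gives 699,840,000 bytes, and 3.6 seconds correspond to 64.8 MPixel/s, matching the toojpeg YCbCr444 entry.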
Note: Edit June 2019: I was benchmarking a smaller version - with 8192x4096 pixels - of the Blue Marble image on a Raspberry Pi 3 B (2016 model) using the default Raspbian OS.
... and apparently, the Raspi's ARM chip doesn't like the toojpeg 1.3 RGB-2-YCbCr conversion routines :-(

While the desktop PC's x64 tests showed a significant performance improvement over version 1.2 (especially YCbCr420), all ARM/RaspberryPi numbers became worse.
Curiously, the initial version 1.0 of toojpeg turned out to be the best for YCbCr420.
toojpeg 1.4 fixed this issue (well, YCbCr444 is still ≈5% slower than version 1.0 but YCbCr420 became ≈10% faster) without any negative effects on x64 systems.
Please note that the toojpeg 1.4 Raspberry binary is several kB smaller than version 1.0, too (GCC 6.3: ≈19kB vs. ≈23kB).
It seems GCC6 was able to follow a more aggressive inlining strategy with toojpeg 1.0 compared to 1.3 (and 1.4).
| Raspberry Pi 3 B (32 bit mode) | quality=90, YCbCr444 | quality=90, YCbCr420 | quality=90, grayscale |
|---|---|---|---|
| toojpeg 1.0 | 5.2 seconds, 6.5 MPixel/s | 4.0 seconds, 8.4 MPixel/s | 2.0 seconds, 16.8 MPixel/s |
| toojpeg 1.3 | 6.7 seconds, 5,694,094 bytes, 5.0 MPixel/s | 4.5 seconds, 4,313,780 bytes, 7.5 MPixel/s | 2.5 seconds, 3,815,017 bytes, 13.4 MPixel/s |
| toojpeg 1.4 | 5.5 seconds, 6.1 MPixel/s | 3.6 seconds, 9.3 MPixel/s | 2.0 seconds, 16.8 MPixel/s |
| jo_jpeg | 6.7 seconds, 5,694,327 bytes, 5.0 MPixel/s | n/a | 6.0 seconds, 4,339,017 bytes, 5.6 MPixel/s |
| jpge | 6.2 seconds, 5,815,778 bytes, 5.4 MPixel/s | 4.1 seconds, 4,395,313 bytes, 8.2 MPixel/s | 3.3 seconds, 4,021,433 bytes, 10.2 MPixel/s |
| libjpeg-turbo 1.5.1 | 3.0 seconds, 5,731,180 bytes, 11.2 MPixel/s | 1.8 seconds, 4,336,043 bytes, 18.6 MPixel/s | 1.1 seconds, 3,828,629 bytes, 30.5 MPixel/s |
Edit August 2019: Ubuntu released an AArch64 version for the Raspberry Pi. While its CPU is a 64 bit design, the Raspbian Linux still runs in emulated 32 bit mode.
And performance figures are completely different in AArch64 mode (and compiled with the newer G++ 8)!
The sudden "performance degradation" of toojpeg 1.3 on the Raspi is barely visible; each toojpeg version is faster than its predecessor in almost every aspect.

The table below shows the improvement of the 64 bit version compared to the same sources compiled in 32 bit mode.
Notice that the other JPEG libs became faster, too. Especially jpge enjoys a major performance boost and is pretty much as fast as the latest toojpeg release.
| Raspberry Pi 3 B (AArch64 mode) | quality=90, YCbCr444 | quality=90, YCbCr420 | quality=90, grayscale |
|---|---|---|---|
| toojpeg 1.0 | 4.6 seconds, 13% faster, 7.3 MPixel/s | 3.8 seconds, 5% faster, 8.8 MPixel/s | 2.0 seconds, (same speed), 16.8 MPixel/s |
| toojpeg 1.3 | 4.9 seconds, 37% faster, 6.8 MPixel/s | 3.6 seconds, 25% faster, 9.3 MPixel/s | 2.0 seconds, 25% faster, 16.8 MPixel/s |
| toojpeg 1.4 | 3.8 seconds, 45% faster, 8.8 MPixel/s | 2.6 seconds, 38% faster, 12.9 MPixel/s | 1.4 seconds, 43% faster, 24.0 MPixel/s |
| jo_jpeg | 5.0 seconds, 34% faster, 6.7 MPixel/s | n/a | 4.6 seconds, 30% faster, 7.3 MPixel/s |
| jpge | 3.9 seconds, 59% faster, 8.6 MPixel/s | 2.6 seconds, 58% faster, 12.9 MPixel/s | 1.5 seconds, 120% faster, 22.4 MPixel/s |
| libjpeg-turbo 1.5.1 | 1.7 seconds, 76% faster, 19.7 MPixel/s | 1.0 seconds, 80% faster, 33.6 MPixel/s | 0.7 seconds, 57% faster, 47.9 MPixel/s |
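The "faster" percentages in the table above are consistent with dividing the 32 bit time by the 64 bit time and subtracting one. That formula is inferred from the numbers rather than stated explicitly, so treat this sketch (with a made-up function name) as an assumption:

```cpp
#include <cassert>

// speed-up of the 64 bit build over the 32 bit build, in percent
double percentFaster(double seconds32, double seconds64)
{
  assert(seconds64 > 0);
  return (seconds32 / seconds64 - 1.0) * 100.0;
}
```

For example, toojpeg 1.4 in YCbCr444 mode went from 5.5 to 3.8 seconds, which yields roughly 45%, and jpge in grayscale mode went from 3.3 to 1.5 seconds, which yields 120%.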

Binary size

Compiling a simple example program for x64:
| | toojpeg 1.3 | jo_jpeg | jpge | libjpeg-turbo 2.0.1 | (uncompressed) |
|---|---|---|---|---|---|
| preprocessor symbol in example program | #define USE_TOOJPEG | #define USE_JOJPEG | #define USE_JPGE | #define USE_LIBJPEG | #define USE_RAW |
| G++ 8 -O3 -s | 18,536 bytes | 26,736 bytes | 55,888 bytes | 10,408 bytes plus lib | 6,248 bytes |
| G++ 8 -Os -s | 10,344 bytes | 14,464 bytes | 19,024 bytes | 6,312 bytes plus lib | 6,248 bytes |
| Clang++ 4.2 -O3 -s | 12,952 bytes | 16,712 bytes | 28,504 bytes | 6,800 bytes plus lib | 5,128 bytes |
| Clang++ 4.2 -Os -s | 11,160 bytes | 13,992 bytes | 20,424 bytes | 6,800 bytes plus lib | 5,112 bytes |

Accuracy

In toojpeg 1.0 and 1.1, I used the RGB-to-YUV constants from ITU-R BT.601 (which have six decimal positions, e.g. listed on Wikipedia).
ITU-T T.871 (page 4) recommends only 4 decimal positions, just like the JFIF 1.02 specification (page 3).
The source code of libjpeg-turbo comes with a detailed explanation of how to derive all these conversion constants and lists them with 9 decimal positions.
Older versions of libjpeg had just 5 decimals (which is more than enough for their 16 bit fixed-point arithmetic).
Practically speaking, visual differences are nonexistent on small and medium-sized images.

In order to allow my library's output to be bit-identical to Jon's jo_jpeg, I decided to switch to the "5-digits" constants in version 1.2:
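As a rough sketch of such a conversion for a single pixel: the coefficients are the 5-decimal values commonly quoted from older libjpeg sources, and the struct and function names are made up for illustration, so check them against toojpeg.cpp before relying on them.

```cpp
#include <cassert>

struct YCbCr { float y, cb, cr; };

// convert one RGB pixel (each channel 0..255) to YCbCr;
// Cb and Cr are shifted by +128 so that gray pixels map to 128,
// results are not rounded or clamped here
YCbCr rgbToYCbCr(float r, float g, float b)
{
  assert(r >= 0 && r <= 255 && g >= 0 && g <= 255 && b >= 0 && b <= 255);
  YCbCr result;
  result.y  =  0.299f   * r + 0.587f   * g + 0.114f   * b;
  result.cb = -0.16874f * r - 0.33126f * g + 0.5f     * b + 128.0f;
  result.cr =  0.5f     * r - 0.41869f * g - 0.08131f * b + 128.0f;
  return result;
}
```

Each row of coefficients sums to 1 (for Y) or 0 (for Cb and Cr), so a white pixel (255, 255, 255) comes out as Y ≈ 255 with Cb ≈ Cr ≈ 128.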

Another set of magic constants are the so-called eight "AAN Scaling Factors". They are part of the DCT and can be precomputed as follows:

AanScaleFactors[0] = 1
AanScaleFactors[k] = cos(k⋅π/16)⋅√2   for k = 1..7

which is:
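A minimal sketch of that precomputation, with index 0 fixed at 1 and the cosine formula applied to the remaining seven entries (the function name is made up for illustration):

```cpp
#include <cassert>
#include <cmath>

// fills factors[0..7] with the AAN scaling factors
void computeAanScaleFactors(double factors[8])
{
  assert(factors != nullptr);
  const double pi = 3.14159265358979323846;
  factors[0] = 1.0;
  for (int k = 1; k < 8; k++)
    factors[k] = std::cos(k * pi / 16) * std::sqrt(2.0);
}
```

Rounded to six decimals this yields 1, 1.387040, 1.306563, 1.175876, 1, 0.785695, 0.541196 and 0.275899.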
Each element of the luminance and chrominance quantization matrices is divided by the product of two AAN Scaling Factors and the number 8.
Jon's code avoids multiplying by 8 and instead pre-multiplies each AAN Scaling Factor by √8, which is mathematically correct (√8 ⋅ √8 = 8).
However, small rounding effects cause his constants to be slightly off on certain images (such as the Blue Marble).

If you replace my more accurate constants by his (see my code comments in line ≈520) then toojpeg's output becomes bitwise identical to jo_jpeg.
(note: this applies only to YCbCr444 images because his grayscale images contain useless Cb and Cr data which my library avoids)

Source Code

Download  toojpeg.h
Latest release: July 8, 2019, size: 3833 bytes, 62 lines

CRC32: 1ff1123b
MD5: 466423dc7f3dc59898a0af96998880f6
SHA1: d3248ae12e2b97eda2c4b0f3a83907b4697b9d74
SHA256:cf815d4ccc6a827c9de83d1151c1140d86e5888dfc5fb2c177a16465fcd983e3

Download  toojpeg.cpp
Latest release: July 8, 2019, size: 31.8 kBytes, 665 lines

CRC32: 0abd3ca4
MD5: 3ab482775f4687457d6e8a3a14474394
SHA1: 32f5456d3db1343b8677360f01e5e433caa41b5e
SHA256:d41b7f8469f1dd0341165294affafe138e2c6cb2ad5d15c1db3d4bb420966603

If you encounter any bugs/problems or have ideas for improving future versions, please write me an email: create@stephan-brumme.com

License

This code is licensed under the zlib License:
This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software.

Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions:
  1. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required.
  2. Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.

Changelog
