flexiGIF - Lossless GIF/LZW optimization
posted by Stephan Brumme,
updated
Quick Overview
flexiGIF shrinks GIF files by optimizing their compression scheme (LZW algorithm).No visual information is changed and the output is 100% pixel-wise identical to the original file - that's why it's called "lossless optimization".
And the results are still GIF files you can open/view with any standard tool.
Animated GIFs are supported as well.
Most files can be reduced by about 2%. I'm not aware of any open-source software outperforming flexiGIF.
The only downside: it may takes several seconds or even minutes for a medium-sized GIF image.
That's several magnitudes slower than other GIF encoders.
Your GIF decoder isn't affected at all - probably it even becomes faster because it has less input to process.
flexiGIF does NOT optimize palette, strip extensions and/or reduces color information.
There are many sophisticated tools out there, such as ImageMagick, that excel at that job.
flexiGIF is designed to be a tool used after you ran those image-processing programs.
Proposed toolchain:
- create your GIF
- run an image-optimizer, such as Gifsicle
- let flexiGIF optimize the LZW bitstream
flexiGIF is a command-line tool and can be easily used with custom scripts, etc.
Keep in mind that flexiGIF's compression is very (!) slow, magnitudes slower than a standard GIF encoder.
Lossless compression
Unlike the algorithms behind JPEG or PNG, there is is no adjustable compression level in LZ78/LZW compressed data.So I had to come up with something else ... and I choose not to play around with image specific optimizations.
Instead, I extract the LZW data stream from GIF files and recompress it.
flexiGIF combines two techniques invented by other people - the following ideas are not my original work (I wish they were ...).
To my knowledge, flexiGIF is the first free program to implement both.
How does it work ?
The LZW algorithm is more than 3 decades old - in 1984 Welch improved the LZ78 by Lempel and Ziv (obviously invented in 1978).Up to now, almost all GIF/LZW encoders work the same way (loosely based on Wikipedia article of GIF):
- fill a dictionary with all available symbols, typically that's 0..255
- find the longest string S in the dictionary matching the current input
- add S plus the next symbol to the dictionary, emit the dictionary index of S
- if the dictionary is full, then reset the dictionary (see step 1)
- go to step 2
Often it takes just a few lines of code to implement that algorithm.
At the moment, flexiGIF has >2,500 lines of code because it tweaks the algorithms in a few subtle ways ... read on !
Flexible Dictionary Reset
The GIF standard defines a maximum dictionary size of 2^12 = 4096 entries.Then a so-called clear code should be inserted into the bitstream in order to reset the dictionary to its initial state.
Almost all GIF encoders execute step 4 when the dictionary contains 4096 entries, a few do it a little bit earlier (4093, e.g. IrfanView).
However, the GIF standard also allows to "skip" step 4: if the dictionary is full then you should (not "you must" !) reset the dictionary.
It's explicitly written on the very first page of the GIF standard ("deferred clear code"):
https://www.w3.org/Graphics/GIF/spec-gif89a.txt
What does it mean ? If the dictionary is full, then you are not allowed to execute step 3 (add strings).
But you can still reference old dictionary entries, it becomes static.
If they still are good matches for the current input, then they can provide a nice compression ratio.
Moreover, they most likely exceed the compression ratio of a recently resetted dictionary.
On the other hand, a full dictionary requires longer codes (GIF: 12 bits).
flexiGIF runs a brute-force search to find the best spot to reset the dictionary:
- sometimes it makes sense to reset the dictionary after a few pixels (if pixels' colors change rapidly)
- sometimes the old dictionary is great for several thousand pixels/bytes
Flexible Matching
In the aforementioned algorithm, step 2 reads: "find the longest string".This isn't the whole truth, though. Almost all GIF encoders have so-called "greedy" match finders and actually look for the longest match.
But there are certain situations where the longest match is not the best match.
Let's consider a situation where the dictionary contains the strings
a
, b
, aa
, ab
, aaa
, abb
.If the input is
aaabb
, then a greedy matcher splits it into three groups (aaa
)(b
)(b
).A flexible, non-greedy matcher chooses (
aa
)(abb
) saving one token.It accepts a sub-optimal match for the first two bytes because in that case the next match will be much longer.
Since each match creates a new dictionary entry, too, flexible matching causes the LZW dictionary to contain duplicates:
if (
aa
) was chosen then (aa
)+a
= aaa
must be added to the dictionary even though it's already there.Quite often flexible matching saves a few bytes but unfortunately in a substantial number of cases the opposite is true:
the sub-optimal dictionary worsens the compression ratio.
The encoder's flexible matching is always slower than greedy matching. It doesn't make a difference during decoding, though.
Note: my flexible matcher is limited to two matches (one-step lookahead). It's not optimal parsing such as my smalLZ4 tool.
Previous Work
There are a few scientific papers about the described algorithms.In my eyes the most understandable document was written by Nayuki ( https://www.nayuki.io/page/gif-optimizer-java ).
It comes with full Java source code but it's extremely memory-hungry and quite slow. Only flexible dictionary reset is implemented.
Flexible matching is part of pretty much every practical LZ77 compression program, such as GZIP.
It's often called one-step lookahead.
Code
Everything was written in C++ from scratch. No external libraries are required.The code can be compiled with GCC, CLang and Visual C++.
I haven't tested flexiGIF on big-endian systems.
Command-line options
Usage:flexigif [options] INPUTFILE OUTPUTFILE
flexiGIF actually produces good results if you use no options at all (just the two filenames).
Most parameters have a short version (such as
-h
) and a long version (--help
) which is more readable.Multiple short versions can be merged if they have no numeric value, e.g.
-f -s -v
is the same as -fsv
.-h --help
Display help screen.
-p --prettygood
Uses a set of parameters that typically give the best results. Recommended for most inputs but may be REALLY slow.
-a=x --alignment=x
My brute-force search starts at every pixel which is a multiple of this value.
If
-a=1
then every possible block splitting is analyzed.Higher values speed up the program and reduce memory consumption (
-a=10
is often 10x faster).-d=x --dictionary=x
Maximum size of the LZW dictionary before a restart is forced.
Typically left unchanged unless you encounter compatibility issues with certain GIF decoders, then
-d=4093
may help (see -c
, too).-t=x --maxtokens=x
In order to speed up the brute-force search, a maximum number of matches per block can be defined.
It's extremely rare to find GIF images with more than 20000 matches before resetting the dictionary (=default value).
Often
-t=10000
is enough, too, and about twice as fast.-c --compatible
Try to emit output that respects bugs/limitations of certain GIF decoders.
At the moment,
-c
is equivalent to -d=4093
.-l --deinterlace
Deinterlace GIF images.
Note: not implemented for animated GIFs yet.
-g --greedy
Enable greedy match search (which is default behavior anyway)
-n=x --nongreedy=x
Enable non-greedy match search where x is the minimum match length to be considered.
Default is
-n=2
which means every match with at least two pixels/bytes are checked whether a non-greedy search gives better results.Higher values cause less duplicates in the dictionary which sometimes improves compression ratio.
-m=x --minimprovement=x
Minimum number of bytes saved by a non-greedy match (requires parameter
-n
, too).Default is
-m=1
-i --info
Analyze internal structure of
INPUTFILE
. No need provide OUTPUTFILE
because everything is written to STDOUT.It's more or less a debugging switch and the format of the displayed information may change at any time.
-f --force
Overwrite
OUTPUTFILE
if it already exists.-r --splitruns
Allow partial matching of long runs of the same byte (requires parameter
-n
).Many GIF files have large areas with the same color.
Non-greedy search often leads to very bad dictionaries when those "long runs" of the same color are split.
Enabling this option is useful only for very few images.
-u=x --userdefined=x
flexiGIF doesn't look for the best dictionary reset position - instead, you provide them !
x
is an ascendingly sorted list, e.g. -u=500,2000,9000
It's considered a debugging option.
-s --summary
When finished, compare filesize of
INPUTFILE
and OUTPUTFILE
.-v --verbose
Show debug messages. It's more or less a debugging switch and the format of the displayed information may change at any time.
-q --quiet
No output at all, except when an error occurred.
-Z
INPUTFILE
and OUTPUTFILE
are stored in .Z file format instead of GIF-y --immediately
Avoid initial dictionary reset (clear code) and start immediately with compressed data.
This options typically saves a byte.
However, a few popular GIF decoders can't properly handle GIF without an initial dictionary reset.
Limitations
My .Z encoder and decoder support only dictionary restarts if the current LZW code size is 16 bits.Code
(scroll down to download the source code - or click on the green bars to view the file in your browser)The compression engine doesn't know anything about GIFs - it only optimizes the LZW bitstream:
show
LzwEncoder.h
// //////////////////////////////////////////////////////////
// LzwEncoder.h
// Copyright (c) 2018 Stephan Brumme. All rights reserved.
// see https://create.stephan-brumme.com/flexigif-lossless-gif-lzw-optimization/
#pragma once
#include <vector>
#include <array>
using std::size_t;
/// compress data with LZW algorithm and a user-defined block-splitting algorithm
class LzwEncoder
{
public:
/// uncompressed data
typedef std::vector<unsigned char> RawData;
/// encoded/compressed data
typedef std::vector<bool> BitStream;
/// set uncompressed data
explicit LzwEncoder(const RawData& data, bool isGif = true);
/// optimization parameters
struct OptimizationSettings
{
/// bits per LZW code
unsigned char minCodeSize;
/// for compatibility: first LZW code sent to output is the clear code
bool startWithClearCode;
/// display internal stuff while compressing data
bool verbose;
/// if zero => greedy search else non-greedy search
bool greedy;
/// minimum match length considered for non-greedy search
unsigned int minNonGreedyMatch;
/// minimum number of bytes saved when a non-greedy match is used
unsigned int minImprovement;
/// maximum number of dictionary entries
unsigned int maxDictionary;
/// maximum number of tokens per block (huge values severely affect compression speed)
unsigned int maxTokens;
/// if true then non-greedy matching is allowed to analyze a string of identical bytes
bool splitRuns;
/// alignment
unsigned int alignment;
/// read-only access to m_best (gives small speed up for final pass when emitBitStream=true)
bool readOnlyBest;
/// don't recompute non-greedy infos when greedy search found no greedy matches
bool avoidNonGreedyAgain;
};
/// optimize a single block and update results in m_best
BitStream optimizePartial(unsigned int from, unsigned int maxLength, bool emitBitStream, bool isFinal, OptimizationSettings optimize);
/// determine best block boundaries based on results of optimizePartial() and call merge()
BitStream optimize(OptimizationSettings optimize);
/// optimize if block boundaries are known
BitStream merge(std::vector<unsigned int> restarts, OptimizationSettings optimize);
private:
/// add string at m_data[from...from+length] to the dictionary and return its code
int addCode (unsigned int from, unsigned int length);
/// return length of longest match, beginning at m_data[from], limited to maxLength
unsigned int findMatch(unsigned int from, unsigned int maxLength);
/// add bits to BitStream
static void add(BitStream& stream, unsigned int token, unsigned char numBits);
/// get minimum number of bits to represent token
static unsigned char getMinBits(unsigned int token);
// ----- debugging code -----
RawData debugDecode(int code) const;
int findCode (unsigned int from, unsigned int maxLength) const;
/// uncompressed data
RawData m_data;
// ----- temporary data structures -----
/// gather statistics of a single compressed block of data
struct BestBlock
{
/// current block: number of bytes (uncompressed input)
unsigned int length;
/// current block: number of bits of compressed output
unsigned int bits;
/// this plus all following blocks: total number of bits
unsigned long long totalBits;
/// current block: number of LZW codes
unsigned int tokens;
/// number of non-greedy matches
unsigned int nongreedy;
/// true if block's last match isn't greedy
bool partial;
BestBlock()
: length(0), bits(0), totalBits(0),
tokens(0), nongreedy(0), partial(false)
{}
};
/// all block of compressed data where m_best[x] is the optimum block (regarding all following blocks) which starts at input byte x
std::vector<BestBlock> m_best;
/// for each LZW code store the LZW codes of its children (which is the same as its parent plus one byte, -1 => undefined/no child)
typedef std::vector<std::array<int, 256> > Dictionary;
Dictionary m_dictionary;
/// number of valid entries in m_dictionary
unsigned int m_dictSize;
/// maximum number of codes in the dictionary
unsigned int m_maxDictionary;
/// GIF = 12, LZW = 16
unsigned char m_maxCodeLength;
/// true, if encode as GIF
bool m_isGif;
};
show
LzwEncoder.cpp
Actually, files needs to be decompressed first ...
// //////////////////////////////////////////////////////////
// LzwEncoder.cpp
// Copyright (c) 2018 Stephan Brumme. All rights reserved.
// see https://create.stephan-brumme.com/flexigif-lossless-gif-lzw-optimization/
// Nayuki implemented a similar encoder in Java two years earlier:
// https://www.nayuki.io/page/gif-optimizer-java
// (albeit much slower and without non-greedy parsing)
#include "LzwEncoder.h"
#define ALLOW_VERBOSE
#ifdef ALLOW_VERBOSE
#include <iostream>
#include <iomanip>
#endif
// local stuff
namespace
{
/// indicates an invalid entry in the dictionary
const int Unknown = -1;
}
/// set uncompressed data
LzwEncoder::LzwEncoder(const RawData& data, bool isGif)
: m_data(data),
m_dictionary(),
m_dictSize(0),
m_maxCodeLength(isGif ? 12 : 16),
m_maxDictionary(), // see a few lines below
m_isGif(isGif)
{
m_maxDictionary = (1 << m_maxCodeLength) - 1;
}
/// add string at m_data[from...from+length] to the dictionary and return its code
int LzwEncoder::addCode(unsigned int from, unsigned int length)
{
// first literal
int code = m_data[from++];
// walk through dictionary in order to find the code for the matching without the last byte
for (unsigned int i = 1; i < length; i++)
{
unsigned char oneByte = m_data[from++];
code = m_dictionary[code][oneByte];
}
// the new string is a known code plus a new byte (the last one)
if (from < m_data.size())
{
unsigned char lastByte = m_data[from];
// insert at the end of the dictionary
if (m_dictSize < m_maxDictionary)
{
// don't overwrite (needed for non-greedy algorithm where a code isn't unique)
if (m_dictionary[code][lastByte] == Unknown)
m_dictionary[code][lastByte] = m_dictSize;
m_dictSize++;
}
}
return code;
}
/// return length of longest match, beginning at m_data[from], limited to maxLength
unsigned int LzwEncoder::findMatch(unsigned int from, unsigned int maxLength)
{
// there is always a LZW code for the first byte
int code = m_data[from++];
// try to match second, third, ... byte, too
for (unsigned int length = 1; length < maxLength; length++)
{
unsigned char oneByte = m_data[from++];
code = m_dictionary[code][oneByte];
// no continuation => return number of matching byte
if (code == Unknown)
return length;
}
// matched all the way (quite unlikely, yet possible)
return maxLength;
}
// for debugging only, not used in release compilation
LzwEncoder::RawData LzwEncoder::debugDecode(int code) const
{
RawData result;
if (code == Unknown)
return result;
if (code >= (int)m_dictionary.size())
return result; // actually an error ...
// brute-force search ...
int search = code;
while (code > 255)
{
if (search > code)
search = code;
search--;
if (search < 0)
break;
for (unsigned int next = 0; next <= 255; next++)
if (m_dictionary[search][next] == code)
{
result.insert(result.begin(), char(next));
code = search;
break;
}
}
result.insert(result.begin(), char(code));
return result;
}
// for debugging only, not used in release compilation
int LzwEncoder::findCode(unsigned int from, unsigned int maxLength) const
{
// there is always a LZW code for the first byte
int code = m_data[from++];
// try to match second, third, ... byte, too
for (unsigned int length = 1; length < maxLength; length++)
{
unsigned char oneByte = m_data[from++];
int prevCode = code;
code = m_dictionary[code][oneByte];
// no continuation => return number of matching byte
if (code == Unknown)
return prevCode;
}
// matched all the way (quite unlikely, yet possible)
return code;
}
/// optimize a single block and update results in m_best
LzwEncoder::BitStream LzwEncoder::optimizePartial(unsigned int from, unsigned int maxLength, bool emitBitStream, bool isFinal, OptimizationSettings optimize)
{
// allocate memory
if (m_best.empty())
m_best.resize(m_data.size() / optimize.alignment + 1 + 1);
// length of current block
unsigned int length = (unsigned int)m_data.size() - from;
if (length > maxLength && maxLength != 0)
length = maxLength;
// LZW encoded data
BitStream result;
if (emitBitStream)
result.reserve(length * m_maxCodeLength); // reserve a bit too much ... just to avoid any reallocations
// index for m_best
if (optimize.alignment == 0)
optimize.alignment = 1;
unsigned int fromAligned = from / optimize.alignment;
#ifdef ALLOW_VERBOSE
if (from % optimize.alignment != 0)
{
std::cerr << "optimizePartial is not allowed to start at a non-aligned offset (" << from << ", alignment=" << optimize.alignment << ")" << std::endl;
throw "non-aligned block start is forbidden";
}
#endif
// no recomputation in second pass of --prettygood mode if previous non-greedy search was unsuccessful
if (optimize.greedy &&
optimize.avoidNonGreedyAgain &&
!emitBitStream &&
m_best[fromAligned].nongreedy == 0 &&
m_best[fromAligned].length > 0)
return result;
// special codes
const unsigned int clear = 1 << optimize.minCodeSize;
const unsigned int endOfStream = clear + 1;
// initialize dictionary
std::array<int, 256> children;
children.fill(Unknown);
m_dictionary.resize(m_maxDictionary);
for (unsigned int i = 0; i < m_maxDictionary; i++)
m_dictionary[i] = children;
if (m_isGif)
m_dictSize = clear + 2;
else
m_dictSize = clear + 1; // .Z format: no endOfStream
// initialize counters
unsigned int numBits = 0;
unsigned int numTokens = 0;
unsigned int numNonGreedyMatches = 0;
// total length of the current match, initially no match
unsigned int matchLength = 0;
// bits per code
unsigned char codeSize = getMinBits(m_dictSize);
// process input
unsigned int lastPos = from + length - 1;
for (unsigned int i = from; i <= lastPos; i++)
{
// number of already processed bytes
unsigned int numBytes = i - from + 1;
// ----- match finding -----
// need new match ?
if (matchLength == 0)
{
// if blocks become too large then it's quite unlikely to find a better compression
if (optimize.maxDictionary > 0 && m_dictSize >= optimize.maxDictionary)
break; // some broken decoders can't handle a full dictionary, too
if (optimize.maxTokens > 0 && numTokens >= optimize.maxTokens)
break;
// ----- the next 50+ lines to try optimize matchLength (greedy/non-greedy) -----
// total number of bytes left in the current block
unsigned int remaining = length + from - i;
// find longest match (greedy), must not exceed number of available bytes
matchLength = findMatch(i, remaining);
// non-greedy lookahead
bool tryNonGreedy = !optimize.greedy;
if (matchLength == 1 || matchLength < optimize.minNonGreedyMatch)
tryNonGreedy = false;
// non-greedy search doesn't make sense when we're very close to the end
if (i + matchLength + 4 >= m_data.size()) // the number 4 was chosen randomly ...
tryNonGreedy = false;
// flexible parsing
if (tryNonGreedy)
{
// don't split long runs of the same pixels
if (!optimize.splitRuns)// && matchLength >= 2)
{
// cheap check of first and last byte
unsigned int lastMatchByte = matchLength - 1;
bool allTheSame = (m_data[i] == m_data[i + lastMatchByte]);
// and the rest of them until a mismatch is found
for (unsigned int scan = 1; scan + 1 < lastMatchByte && allTheSame; scan++)
allTheSame = (m_data[i] == m_data[i + scan]);
// all pixels are identical ?
if (allTheSame)
tryNonGreedy = false;
}
// greedy matching after the current match
unsigned int second = findMatch(i + matchLength, remaining - matchLength);
// sum of these two greedy matches
unsigned int best = matchLength + second;
// look for an improvement
unsigned int atLeast = best + optimize.minImprovement;
// store length of currently best greedy/non-greedy match
unsigned int choice = matchLength;
// try all non-greedy lengths
for (unsigned int shorter = matchLength - 1; shorter > 0; shorter--)
{
// greedy match of everything that follows
unsigned int next = findMatch(i + shorter, remaining - shorter);
unsigned int sum = shorter + next;
// longer ?
if (sum >= atLeast && sum > best)
{
best = shorter + next;
choice = shorter;
}
}
// found a better sequence than greedy match ?
if (choice < matchLength)
{
#ifdef ALLOW_VERBOSE
/*if (emitBitStream && optimize.verbose)
std::cout << "use non-greedy @ " << i << " \t before " << matchLength << "+" << second << "=" << matchLength+second
<< "\tnow " << choice << "+" << (best - choice) << "=" << best
<< " \t => +" << (best - (matchLength + second))
<< std::endl;
*/
#endif
matchLength = choice;
numNonGreedyMatches++;
}
} // end of flexible parsing
// ----- LZW code generation -----
// need one more bit per code ?
if (m_dictSize < m_maxDictionary)
{
unsigned int threshold = m_dictSize - 1;
// detect powerOfTwo (see https://bits.stephan-brumme.com/isPowerOfTwo.html )
if ((threshold & (threshold - 1)) == 0 && codeSize < m_maxCodeLength) // but never use more than 12 (or 16) bits
{
codeSize++;
// undo if first token (.Z format)
if (!m_isGif && threshold == 256)
codeSize--;
}
}
// update dictionary
unsigned int code = addCode(i, matchLength);
// append code to LZW stream
if (emitBitStream)
add(result, code, codeSize);
// counters
numBits += codeSize;
numTokens++;
}
// "eat" next byte of the current match (until it's 0 and we look for the next match)
matchLength--;
// ----- cost evaluation -----
// cost already determined ?
if (optimize.readOnlyBest)
continue;
// don't update m_best if no block optimization started at the current slot
bool isLastByte = (i + 1 == m_data.size());
// look at compression result of the remaining bytes
unsigned int next = i + 1;
unsigned int nextAligned = next;
if (optimize.alignment > 1)
nextAligned = (next + optimize.alignment - 1) / optimize.alignment; // find current m_best index
if (!isLastByte && m_best[nextAligned].totalBits == 0)
continue;
// save cost information only on aligned addresses (except for the last bytes)
if (optimize.alignment > 1 && numBytes % optimize.alignment != 0)
{
//if (!isFinal)
//continue;
if (!isLastByte)
continue;
}
// assuming the block would end here, a few extra bits are needed
unsigned int add = codeSize; // clear / end-of-stream
// increase code size just for the clear / end-of-stream code ?
size_t threshold = m_dictSize - 1;
if ((threshold & (threshold - 1)) == 0 && codeSize < m_maxCodeLength)
add++;
if (!m_isGif)
{
// TODO: only allow restart if clear code can be encoded in 16 bits
if (!isLastByte && codeSize < 16)
continue;
// no endOfStream token in .Z file format
if (isLastByte)
add = 0;
// fill last byte
if (numBits % 8 != 0)
add += 8 - (numBits % 8);
// dictionary resets are followed by a bunch of zeros
if (!isLastByte)
{
unsigned int tokensPlusClear = numTokens + 1;
// compress' LZW must be aligned to 8 tokens
unsigned int mod8 = tokensPlusClear & 7;
unsigned int gap = mod8 == 0 ? 0 : 8 - mod8;
add += codeSize * gap;
}
}
// true if the we are currently "inside" a match
bool isPartial = (matchLength > 0);
// compute final cost
unsigned int trueBits = numBits + add;
unsigned long long totalBits = trueBits + m_best[nextAligned].totalBits;
// better path ? (or no path found at all so far)
BestBlock& best = m_best[fromAligned];
if (best.totalBits == 0 || best.totalBits >= totalBits)
{
best.bits = trueBits;
best.totalBits = totalBits;
best.length = numBytes;
best.tokens = numTokens;
best.partial = isPartial;
best.nongreedy = numNonGreedyMatches;
// note: if there are multiple paths with the same cost, then longer blocks are preferred
// to change this, replace "best.totalBits > totalBits"
// by "best.totalBits >= totalBits"
// but keep in mind that a dictionary restart costs CPU cycles during decoding
}
}
// done
if (emitBitStream)
{
// end of block: emit either clear or endOfStream code
codeSize = getMinBits(m_dictSize - 1);
if (m_isGif)
{
add(result, isFinal ? endOfStream : clear, codeSize);
}
else
{
// only clear code, no endOfStream
if (!isFinal)
{
add(result, clear, codeSize);
numTokens++;
}
// fill current byte
while (result.size() % 8 != 0)
{
result.push_back(0);
numBits++;
}
// a block's number of token has to be a multiple of 8
if (!isFinal)
{
if (codeSize != 16)
{
#ifdef ALLOW_VERBOSE
std::cerr << "LZW code size is " << (int)codeSize << std::endl;
throw "flexiGIF currently only supports restarts with code size = 16";
#endif
//throw "experimental code: currently only 16 bit clear codes supported";
// some experimental values for numZeros if codeSize < 16:
// 20/9 => 3
// 21/9 => 2
// 22/9 => 1
// 30/9 => 2
// 31/9 => 1
// 50/9 => 3
}
// same formulas as in LzwDecoder
unsigned int mod8 = numTokens & 7;
unsigned int gap = mod8 == 0 ? 0 : 8 - mod8;
// add zeros
unsigned int numZeros = codeSize * gap / 8;
#ifdef ALLOW_VERBOSE
//std::cout << "pad " << numZeros << " zeros" << std::endl;
#endif
for (unsigned int bit = 0; bit < 8*numZeros; bit++)
result.push_back(0);
}
}
}
// error checking
if (m_best[fromAligned].length == 0)
{
#ifdef ALLOW_VERBOSE
//std::cerr << << std::endl;
//throw "current block didn't yield any optimization infos";
#endif
}
return result;
}
/// determine best block boundaries based on results of optimizePartial() and call merge()
LzwEncoder::BitStream LzwEncoder::optimize(OptimizationSettings optimize)
{
// find shortest path
unsigned int pos = 0;
unsigned int aligned = 0;
std::vector<unsigned int> restarts;
while (pos < m_data.size())
{
unsigned int length = m_best[aligned].length;
// no continuation ?
if (length == 0)
{
#ifdef ALLOW_VERBOSE
if (optimize.verbose)
std::cerr << "gap @ " << pos << " (aligned=" << aligned << ")" << std::endl;
#endif
throw "you need to choose a smaller alignment value or increase the token limit because there is a gap between two blocks";
}
pos += length;
aligned = pos / optimize.alignment;
restarts.push_back(pos);
}
// LZW compress along the shortest path
return merge(restarts, optimize);
}
/// optimize if block boundaries are known
LzwEncoder::BitStream LzwEncoder::merge(std::vector<unsigned int> restarts, OptimizationSettings optimize)
{
// final result
BitStream result;
result.reserve(m_data.size() * 3); // basic heuristic to estimate memory consumption
// optional: prepend clear code
if (optimize.startWithClearCode && m_isGif)
{
// it's 2^minCodeSize which is a bunch of zeros followed by a one
for (unsigned int i = 0; i < optimize.minCodeSize; i++)
result.push_back(false);
result.push_back(true);
}
// check number of restarts
if (restarts.empty())
return result;
if (restarts.back() < m_data.size())
restarts.push_back((unsigned int)m_data.size());
// statistics
std::vector<unsigned int> sizes;
unsigned int totalSize = 0;
// verbose mode displays garbage if using predefined block sizes
bool verbose = optimize.verbose;
if (m_best.empty() || m_best[0].bits == 0)
verbose = false;
unsigned int pos = 0;
for (size_t i = 0; i < restarts.size(); i++)
{
// dummy entry for new block at pos 0
if (restarts[i] == 0)
continue;
bool isFinal = (i == restarts.size() - 1);
unsigned int length = restarts[i] - pos;
// switch to faster settings
if (!m_best.empty())
{
optimize.greedy = (m_best[pos / optimize.alignment].nongreedy == 0); // speed-up if non-greedy search was enabled but not successful in this block
if (optimize.greedy)
optimize.avoidNonGreedyAgain = true;
}
optimize.readOnlyBest = true;
// compute current block
BitStream block = optimizePartial(pos, length, true, isFinal, optimize);
if (block.empty() && length > 0)
{
std::cerr << "internal error, block @ " << pos << " with length " << length << " is empty" << std::endl;
throw "optimization failed due to an internal error";
}
// add to previous stuff
result.insert(result.end(), block.begin(), block.end());
unsigned int current = (unsigned int)block.size();
sizes.push_back(current);
totalSize += current;
#ifdef ALLOW_VERBOSE
if (verbose)
{
unsigned int aligned = (pos + optimize.alignment - 1) / optimize.alignment;
BestBlock block = m_best[aligned];
const char* pixel = m_isGif ? "pixel" : "byte";
std::cout << "cost @ " << pos << " \t=> bits=" << sizes[i] << " \t" << pixel << "s=" << block.length << "\ttokens=" << block.tokens
<< "\tbits/" << pixel << "=" << float(sizes[i]) / block.length
//<< "\testimated_bits=" << block.bits << " "
<< (block.bits == sizes[i] ? "" : "?")
<< "\tnon-greedy=" << block.nongreedy
<< (block.partial ? ", last match is partial" : "")
<< std::endl;
}
#endif
pos = restarts[i];
}
#ifdef ALLOW_VERBOSE
if (optimize.verbose)
std::cout << "finished, now " << result.size() << " LZW bits = " << (result.size() + 7) / 8 << " bytes" << std::endl;
#endif
return result;
}
/// add bits to BitStream
void LzwEncoder::add(BitStream& stream, unsigned int token, unsigned char numBits)
{
while (numBits-- > 0)
{
stream.push_back((token & 1) != 0);
token >>= 1;
}
}
/// get minimum number of bits to represent token
unsigned char LzwEncoder::getMinBits(unsigned int token)
{
unsigned char result = 0;
do
{
token >>= 1;
result++;
} while (token != 0);
return result;
}
show
LzwDecoder.h
// //////////////////////////////////////////////////////////
// LzwDecoder.h
// Copyright (c) 2018 Stephan Brumme. All rights reserved.
// see https://create.stephan-brumme.com/flexigif-lossless-gif-lzw-optimization/
#pragma once
#include "BinaryInputBuffer.h"
#include <vector>
#include <string>
/// decode a GIF file, support for writing the same file with optimized LZW bitstream
/** errors throw an exception (const char*) **/
class LzwDecoder
{
public:
// -------------------- data types --------------------
/// a continuous block of bytes
typedef std::vector<unsigned char> Bytes;
/// parse LZW bitstream
explicit LzwDecoder(BinaryInputBuffer& input, bool isGif = true, unsigned char minCodeSize = 8, unsigned char maxCodeSize = 12, unsigned int expectedNumberOfBytes = 16*1024);
/// return minimum LZW code size
unsigned char getCodeSize() const;
/// return decompressed data
const Bytes& getBytes() const;
/// for statistics only: true number of compressed bits
unsigned int getNumCompressedBits() const;
/// show debug output
static bool verbose;
private:
/// decompress data, first parameter is a hint to avoid memory reallocations, GIFs are limited to code size 12, .Z => 16
void decompress(unsigned int expectedNumberOfBytes, unsigned char minCodeSize, unsigned char maxCodeSize);
/// read bits from current LZW block, start with next block if needed
unsigned int getLzwBits(unsigned char numBits);
/// tree node for an LZW code: its parent code and its last byte
struct BackReference
{
int length; // match length
int previous; // code of parent (contains everything except for the last byte), if negative then there is no parent
unsigned char last; // last byte of the match
int pos; // position of a match in the output stream (often there are several candidates)
};
/// convert code to bytes and store in buffer
void decode(Bytes& buffer, unsigned int code, const std::vector<BackReference>& lut) const;
/// read file bit-by-bit
BinaryInputBuffer& m_input;
/// decoded file (pixels/bytes)
Bytes m_bytes;
/// if true, then GIF's LZW data is grouped in blocks of 255 bytes each
bool m_isGif;
/// minimum bits per LZW code
unsigned char m_codeSize;
/// LZW data is split in blocks, keep track of the remaining bits in the current block
unsigned int m_bitsLeftInCurrentBlock;
/// total number of bits of original LZW compressed image (without block lengths)
unsigned int m_numBitsOriginalLZW;
};
show
LzwDecoder.cpp
GIF-related stuff:
// //////////////////////////////////////////////////////////
// LzwDecoder.cpp
// Copyright (c) 2018 Stephan Brumme. All rights reserved.
// see https://create.stephan-brumme.com/smallzwflexigif-lossless-gif-lzw-optimization/
#include "LzwDecoder.h"
#include <fstream>
#define ALLOW_VERBOSE
#ifdef ALLOW_VERBOSE
#include <iostream>
#include <iomanip>
#endif
namespace
{
/// placeholder for "no parent code"
const int NoPrevious = -1;
/// GIFs have a valid end-of-stream token but not compress' LZW variant
const unsigned int NoEndOfStream = 0xFFFFFFFF;
}
// statics
bool LzwDecoder::verbose = false;
/// parse LZW bitstream
LzwDecoder::LzwDecoder(BinaryInputBuffer& input, bool isGif,
unsigned char minCodeSize, unsigned char maxCodeSize,
unsigned int expectedNumberOfBytes)
: m_input(input),
m_bytes(),
m_isGif(isGif),
m_codeSize(0),
m_bitsLeftInCurrentBlock(0),
m_numBitsOriginalLZW(0)
{
decompress(expectedNumberOfBytes, minCodeSize, maxCodeSize);
}
/// return minimum LZW code size
unsigned char LzwDecoder::getCodeSize() const
{
return m_codeSize;
}
/// return decompressed data
const LzwDecoder::Bytes& LzwDecoder::getBytes() const
{
return m_bytes;
}
/// for statistics only: true number of compressed bits
unsigned int LzwDecoder::getNumCompressedBits() const
{
return m_numBitsOriginalLZW;
}
/// convert code to bytes and store in buffer
void LzwDecoder::decode(Bytes& buffer, unsigned int code, const std::vector<BackReference>& lut) const
{
// single byte: exists in lookup table only
if (lut[code].length == 1)
{
buffer.push_back(lut[code].last);
return;
}
// old iterative decoder
int length = lut[code].length;
// resize buffer
buffer.resize(buffer.size() + length);
unsigned char* pos = &buffer[buffer.size() - 1];
// copy bytes while walking backwards
while (length-- > 0)
{
*pos-- = lut[code].last;
code = lut[code].previous;
}
}
/// decompress data, first parameter is a hint to avoid memory reallocations, GIFs are limited to code size 12, .Z => 16
void LzwDecoder::decompress(unsigned int expectedNumberOfBytes, unsigned char minCodeSize, unsigned char maxCodeSize)
{
// initial bits per token
m_codeSize = minCodeSize;
#ifdef ALLOW_VERBOSE
if (verbose && m_isGif)
std::cout << ", " << (int)minCodeSize << " bits" << std::endl;
const char* pixel = m_isGif ? "pixel" : "byte";
#endif
// special codes
const unsigned int clear = 1 << minCodeSize;
const unsigned int endOfStream = m_isGif ? clear + 1 : NoEndOfStream;
const unsigned int maxColor = clear - 1;
const unsigned int MaxToken = 1 << maxCodeSize;
/// temporary look-up table for decompression
std::vector<BackReference> lut(m_isGif ? clear + 2 : clear + 1);
lut.reserve(MaxToken);
// set initial contents
for (unsigned int i = 0; i <= maxColor; i++)
{
lut[i].previous = NoPrevious;
lut[i].last = (unsigned char)i;
lut[i].length = 1;
lut[i].pos = NoPrevious;
}
unsigned char codeSize = minCodeSize + 1;
m_bitsLeftInCurrentBlock = 0;
// delete old data
m_bytes.clear();
m_bytes.reserve(expectedNumberOfBytes);
// pass through first token
int prevToken = NoPrevious;
int token = getLzwBits(codeSize);
while (token == clear)
token = getLzwBits(codeSize);
if (token >= (int)lut.size())
{
#ifdef ALLOW_VERBOSE
std::cerr << "found initial token " << token << " but only " << lut.size() << " dictionary entries" << std::endl;
#endif
throw "invalid token";
}
if (token != endOfStream)
m_bytes.push_back((unsigned char)token);
// counters
unsigned int numTokensTotal = 1;
unsigned int numTokensBlock = 1;
unsigned int numBitsBlock = codeSize;
unsigned int numBitsTotal = codeSize;
unsigned int uptoLastBlock = 0;
while (token != endOfStream)
{
// one more bit per code ?
unsigned int powerOfTwo = 1 << codeSize;
if (lut.size() == powerOfTwo && codeSize < maxCodeSize)
codeSize++;
// quick hack: compress' LZW algorithm doesn't have an end-of-file code
if (!m_isGif && codeSize > m_input.getNumBitsLeft())
break; // abort if not enough bits left
// next token
prevToken = token;
token = getLzwBits(codeSize);
if (token > (int)lut.size())
{
#ifdef ALLOW_VERBOSE
std::cerr << "found token " << token << " (" << (int)codeSize << " bits, "
<< pixel << " " << m_bytes.size() << ") but only " << lut.size() << " dictionary entries" << std::endl;
#endif
throw "invalid token";
}
// counters
numTokensTotal++;
numTokensBlock++;
numBitsBlock += codeSize;
numBitsTotal += codeSize;
// reset dictionary ?
bool reset = false;
while (token == clear)
{
#ifdef ALLOW_VERBOSE
if (verbose)
{
std::cout << "restart @ " << std::setw(7) << m_bytes.size()
<< " \tbits=" << std::setw(8) << numBitsTotal << " +" << numBitsBlock
<< std::setprecision(3) << std::fixed
<< " \tbits/" << pixel << "=" << numBitsBlock / float(m_bytes.size() - uptoLastBlock)
<< " \ttokens=+" << numTokensBlock << "/" << numTokensTotal
<< " \tdict=" << std::setw(4) << lut.size() << std::endl;
}
#endif
// delete all codes with 2+ bytes
if (m_isGif)
{
lut.resize(clear + 2);
}
else
{
// one entry less => no end-of-file token
lut.resize(clear + 1);
// bits leftover in the current byte must be ignored
if (m_numBitsOriginalLZW % 8 != 0)
{
unsigned int skipBits = 8 - (m_numBitsOriginalLZW % 8);
getLzwBits(skipBits);
}
// continue with next byte, remove "garbage" bytes, too
// a block's number of token has to be a multiple of 8
unsigned int mod8 = numTokensBlock & 7;
unsigned int gap = mod8 == 0 ? 0 : 8 - mod8;
// the following does the same computation in one line (I like that weird bit magic ...)
//unsigned int gap = (-(int)numTokensBlock) & 7; // from https://www.ioccc.org/2015/mills2/hint.html
while (gap-- > 0)
m_input.getBits(codeSize); // remember, a token takes 16 bits when approaching the LZW restart
// by the way: the GZIP sources have this formula (it doesn't compute the offset directly, though, and accounts for cases where <=16 bits/token):
//posbits = ((posbits-1) + ((n_bits<<3)-(posbits-1+(n_bits<<3))%(n_bits<<3)));
}
// reset code size
codeSize = minCodeSize + 1;
// fetch first token
prevToken = NoPrevious;
token = getLzwBits(codeSize);
// counters
numTokensTotal++;
numTokensBlock = 1;
numBitsBlock = codeSize;
numBitsTotal += codeSize;
uptoLastBlock = (unsigned int)m_bytes.size();
// copy first token to output
if (token > (int)maxColor)
throw "block starts with an undefined value/color";
m_bytes.push_back((unsigned char)token);
reset = true;
}
if (reset)
continue;
// done ?
if (token == endOfStream)
break;
// new LZW code
BackReference add;
add.previous = prevToken;
add.length = lut[prevToken].length + 1;
add.pos = (unsigned int)m_bytes.size();
// look up token in dictionary
if (token >= (int)lut.size())
{
// broken stream ?
if (token != lut.size())
throw "unknown token found";
if (lut.size() >= MaxToken)
throw "dictionary too large";
// unknown token:
// output LAST + LAST[0]
// add LAST + LAST[0]
decode(m_bytes, prevToken, lut);
add.last = m_bytes[add.pos];
m_bytes.push_back(add.last);
}
else
{
// known token:
// output TOKEN
// add LAST + TOKEN[0]
// add new chunk to table (but no more than 2^12)
decode(m_bytes, token, lut);
add.last = m_bytes[add.pos];
}
// add LZW code to the dictionary:
// the if-condition is required in case the dictionary is full and there hasn't been a clear-token
if (lut.size() < MaxToken)
lut.push_back(add);
}
#ifdef ALLOW_VERBOSE
if (verbose)
std::cout << "finish @ " << std::setw(7) << m_bytes.size()
<< " \tbits=" << std::setw(8) << numBitsTotal << " +" << numBitsBlock
<< std::setprecision(3) << std::fixed
<< " \tbits/" << pixel << "=" << std::setw(4) << numBitsBlock / float(m_bytes.size() - uptoLastBlock)
<< " \ttokens=+" << numTokensBlock << "/" << numTokensTotal
<< " \tdict=" << std::setw(4) << lut.size() << std::endl;
#endif
// skip remaining bits
unsigned int unusedBits = m_bitsLeftInCurrentBlock;
// process files where a few bytes are wasted
if (unusedBits >= 8)
{
#ifdef ALLOW_VERBOSE
std::cerr << "too many bits left in current block (" << unusedBits << ")" << std::endl;
#endif
while (unusedBits > 8)
{
getLzwBits(8);
unusedBits -= 8;
}
}
// ... and now actually skip the unused bits
token = getLzwBits(unusedBits); // TODO: isn't it always 0 for non-GIFs ?
// GIF only: a zero-sized block must follow
if (m_isGif)
{
if (m_input.getByte() != 0)
throw "LZW data is not properly terminated";
}
// don't include the "garbage" bits in the final count
m_numBitsOriginalLZW -= unusedBits;
// okay, we're done !
}
/// read bits from current LZW block, start with next block if needed
unsigned int LzwDecoder::getLzwBits(unsigned char numBits)
{
// degenerated case
if (numBits == 0)
return 0;
m_numBitsOriginalLZW += numBits;
// compress has a simple LZW format, just read the bits ...
if (!m_isGif)
return m_input.getBits(numBits);
// ... but GIF encapsulates the data in blocks of at most 255, each block is preceded by a length byte
// enough bits available in the current block ?
if (numBits <= m_bitsLeftInCurrentBlock)
{
m_bitsLeftInCurrentBlock -= numBits;
return m_input.getBits(numBits);
}
// cross block border
unsigned int low = 0;
unsigned char shift = 0;
// fetch all bits from current block ("low" bits of the token)
if (m_bitsLeftInCurrentBlock > 0)
{
low = m_input.getBits(m_bitsLeftInCurrentBlock);
shift = m_bitsLeftInCurrentBlock;
numBits -= m_bitsLeftInCurrentBlock;
}
// read size of new block
m_bitsLeftInCurrentBlock = 8 * m_input.getByte();
if (m_bitsLeftInCurrentBlock < numBits)
throw "too few bits available in unlzw";
// and get remaining bits ("high" bits)
unsigned int high = m_input.getBits(numBits) << shift;
m_bitsLeftInCurrentBlock -= numBits;
return low | high;
}
show
GifImage.h
// //////////////////////////////////////////////////////////
// GifImage.h
// Copyright (c) 2018 Stephan Brumme. All rights reserved.
// see https://create.stephan-brumme.com/flexigif-lossless-gif-lzw-optimization/
#pragma once
#include "BinaryInputBuffer.h"
#include <vector>
#include <string>
#include <utility> // for std::pair
/// decode a GIF file, support for writing the same file with optimized LZW bitstream
/** errors throw an exception (const char*) **/
class GifImage
{
public:
// -------------------- data types --------------------
/// a continuous block of bytes
typedef std::vector<unsigned char> Bytes;
/// RGB color
struct Color
{
unsigned char red;
unsigned char green;
unsigned char blue;
};
/// extension IDs
enum ExtensionType
{
PlainText = 0x01,
GraphicControl = 0xF9,
Comment = 0xFE,
Application = 0xFF
};
/// a single frame
struct Frame
{
/// simple copy of frame's header
Bytes rawHeader;
/// extensions
std::vector<std::pair<ExtensionType, Bytes> > extensions;
/// each frame's bits per token
unsigned char codeSize;
/// pixels / indices
Bytes pixels;
/// frame's upper left corner (relative to the global image)
unsigned int offsetLeft;
unsigned int offsetTop;
/// frame size
unsigned int width;
unsigned int height;
/// true if colors are sorted
bool isSorted;
/// true if interlaced
bool isInterlaced;
/// file position of interlaced flag (TODO: my code's broken for animated GIFs)
unsigned int posInterlaced;
/// local color map, stored
std::vector<Color> localColorMap;
/// original LZW size (in bits)
unsigned int numLzwBits;
};
// -------------------- methods --------------------
/// load file
explicit GifImage(const std::string& filename);
/// number of frames (or 1 if not animated)
unsigned int getNumFrames() const;
/// return decompressed data (indices for local/global color map)
const GifImage::Frame& getFrame(unsigned int frame = 0) const;
/// color depth (bits per pixel)
unsigned char getColorDepth() const;
/// replace LZW data with optimized data and write to disk (bitDepth = 0 means "take value from m_colorDepth")
unsigned int writeOptimized(const std::string& filename, const std::vector<std::vector<bool> >& bits, unsigned char bitDepth = 0);
/// convert from non-interlaced to interlaced (and vice versa)
void setInterlacing(bool makeInterlaced);
/// for debugging only: store image data in PPM format
bool dumpPpm(const std::string& filename, unsigned int frame = 0) const;
/// for debugging only: store indices
bool dumpIndices(const std::string& filename, unsigned int frame = 0) const;
/// show debug output
static bool verbose;
private:
/// read signature GIF 87a/89a
void parseSignature();
/// global image parameters (constant for all frame)
void parseGlobalDescriptor();
/// GIF extensions (e.g. animation settings)
void parseExtensions (Frame& frame);
/// local image parameters
void parseLocalDescriptor(Frame& frame);
/// final bytes of the image
void parseTerminator();
/// read 16 bits from m_input, little endian
unsigned short getWord();
/// the header will remain untouched
Bytes m_rawHeader;
/// the last byte will remain untouched, too
Bytes m_rawTrailer; // contains just one byte, it's always 0x3B
/// file version ("GIF87a" or "GIF89a")
std::string m_version;
/// width (in pixels)
unsigned int m_width;
/// height (in pixels)
unsigned int m_height;
/// bits per color
unsigned char m_colorDepth;
/// true if colors are sorted
bool m_isSorted;
/// number of colors
unsigned int m_sizeGlobalColorMap;
/// palette index of background color
unsigned char m_backgroundColor;
/// aspect ratio
unsigned char m_aspectRatio;
/// true, if animated
bool m_isAnimated;
/// global color map
std::vector<Color> m_globalColorMap;
/// simple wrapper to read file bit-wise
BinaryInputBuffer m_input;
/// decompressed frames (indices for local/global color map)
std::vector<Frame> m_frames;
};
show
GifImage.cpp
And the same for .Z file format:
// //////////////////////////////////////////////////////////
// GifImage.cpp
// Copyright (c) 2018 Stephan Brumme. All rights reserved.
// see https://create.stephan-brumme.com/smallzwflexigif-lossless-gif-lzw-optimization/
// official GIF spec: https://www.w3.org/Graphics/GIF/spec-gif89a.txt
#include "GifImage.h"
#include "LzwDecoder.h"
#include <fstream>
#define ALLOW_VERBOSE
#ifdef ALLOW_VERBOSE
#include <iostream>
#include <iomanip>
#endif
// statics
bool GifImage::verbose = false;
/// load file
GifImage::GifImage(const std::string& filename)
: m_rawHeader(),
m_rawTrailer(),
m_version(),
m_width(0),
m_height(0),
m_colorDepth(0),
m_isSorted(false),
m_sizeGlobalColorMap(0),
m_backgroundColor(0),
m_aspectRatio(0),
m_isAnimated(false),
m_globalColorMap(),
m_input(filename),
m_frames()
{
try
{
if (m_input.empty())
throw "file not found or empty";
// parse header
parseSignature();
parseGlobalDescriptor();
#ifdef ALLOW_VERBOSE
if (verbose)
std::cout << "'" << filename << "' image size " << m_width << "x" << m_height << ", " << (1 << m_colorDepth) << " colors" << std::endl;
#endif
// copy global header
size_t numBytesHeader = m_input.getNumBytesRead();
std::ifstream rawFile(filename.c_str(), std::ios::in | std::ios::binary);
m_rawHeader.resize(numBytesHeader);
rawFile.read((char*)&m_rawHeader[0], numBytesHeader);
unsigned int totalLzwBits = 0;
// decompress LZW
while (true)
{
unsigned int bytesReadSoFar = m_input.getNumBytesRead();
// technically it's impossible to encounter the end-of-file marker before the first frame
unsigned char marker = m_input.peekBits(8);
if (marker == 0x3B)
break;
#ifdef ALLOW_VERBOSE
if (verbose)
std::cout << "decompress frame " << (m_frames.size() + 1) << ": ";
#endif
Frame frame;
// parse frame header
parseExtensions (frame);
parseLocalDescriptor(frame);
// and copy frame header
unsigned int frameHeaderSize = m_input.getNumBytesRead() - bytesReadSoFar;
frame.rawHeader.resize(frameHeaderSize);
rawFile.seekg(bytesReadSoFar);
rawFile.read((char*)&frame.rawHeader[0], frameHeaderSize);
// decode LZW stream
unsigned char minCodeSize = m_input.getByte();
unsigned char maxCodeSize = 12; // constant value according to spec
LzwDecoder::verbose = GifImage::verbose;
LzwDecoder lzw(m_input, true, minCodeSize, maxCodeSize, m_width * m_height);
frame.pixels = lzw.getBytes();
frame.codeSize = lzw.getCodeSize();
totalLzwBits += lzw.getNumCompressedBits();
frame.numLzwBits = lzw.getNumCompressedBits();
// yeah, finished another frame ...
m_frames.push_back(frame);
}
// read/copy the last bytes of the file, too
m_rawTrailer = { (unsigned char)m_input.peekBits(8) };
parseTerminator();
if (!m_input.empty() != 0)
throw "there is still some data left ...";
#ifdef ALLOW_VERBOSE
if (verbose)
{
const char* frames = (m_frames.size() == 1) ? "frame" : "frames";
std::cout << m_frames.size() << " " << frames << ", " << totalLzwBits << " bits, " << m_frames.front().pixels.size() << " pixels plus " << numBytesHeader << " header bytes" << std::endl;
}
#endif
}
catch (const char* e)
{
#ifdef ALLOW_VERBOSE
std::cout << "ERROR: " << (e ? e : "(unknown exception)") << std::endl;
#else
throw;
#endif
}
}
/// number of frames (or 1 if not animated)
unsigned int GifImage::getNumFrames() const
{
return (unsigned int)m_frames.size();
}
/// return decompressed data (indices for local/global color map)
const GifImage::Frame& GifImage::getFrame(unsigned int frame) const
{
if (frame >= m_frames.size())
throw "invalid frame number";
return m_frames.at(frame);
}
/// color depth (bits per pixel)
unsigned char GifImage::getColorDepth() const
{
return m_colorDepth;
}
/// replace LZW data with optimized data and write to disk (bitDepth = 0 means "take value from m_colorDepth")
unsigned int GifImage::writeOptimized(const std::string& filename, const std::vector<std::vector<bool> >& bits, unsigned char bitDepth)
{
std::ofstream file(filename.c_str(), std::ios::out | std::ios::binary);
// write original header
file.write((char*)&m_rawHeader[0], m_rawHeader.size());
// bits per pixel/code
if (bitDepth == 0)
bitDepth = m_colorDepth;
for (unsigned int frame = 0; frame < bits.size(); frame++)
{
// frame header
file.write((char*)&m_frames[frame].rawHeader[0], m_frames[frame].rawHeader.size());
// minCodeSize
file << m_frames[frame].codeSize;
// iterate over bits
const std::vector<bool>& current = bits[frame];
unsigned int pos = 0;
while (pos < current.size())
{
// each block contains at most 255 bytes (=255*8 bits)
unsigned int bitsCurrentBlock = (unsigned int)current.size() - pos;
const unsigned int MaxBitsPerBlock = 255*8;
if (bitsCurrentBlock > MaxBitsPerBlock)
bitsCurrentBlock = MaxBitsPerBlock;
// size in bytes, round up
unsigned char blockSize = (bitsCurrentBlock + 7) / 8;
file << blockSize;
// merge single bits, write to disk
for (unsigned int i = 0; i < blockSize; i++)
{
// create one byte
unsigned char oneByte = 0;
for (unsigned char bit = 0; bit < 8 && pos < current.size(); bit++, pos++)
if (current[pos])
oneByte |= 1 << bit;
// and write to disk
file << oneByte;
}
}
// add an empty block after each image
file << (unsigned char) 0;
}
// write terminator
file.write((char*)&m_rawTrailer[0], m_rawTrailer.size());
// and we're done
unsigned int filesize = (unsigned int)file.tellp();
file.close();
return filesize;
}
/// for debugging only: store image data in PPM format
bool GifImage::dumpPpm(const std::string& filename, unsigned int frame) const
{
const Frame& current = m_frames.at(frame);
if (current.width != m_width ||
current.height != m_height)
throw "PPM for partial frames not supported yet";
// header
std::ofstream file(filename.c_str(), std::ios::out | std::ios::binary);
if (!file)
return false;
file << "P6\n"
<< m_width << " " << m_height << "\n"
<< "255\n";
// color mapping for current frame
std::vector<Color> colorMap = m_globalColorMap;
for (unsigned int i = 0; i < current.localColorMap.size(); i++)
colorMap[i] = current.localColorMap[i];
// convert indices to RGB
const Bytes& pixels = current.pixels;
for (unsigned int i = 0; i < pixels.size(); i++)
{
Color color = colorMap[pixels[i]];
file << color.red
<< color.green
<< color.blue;
}
return file.good();
}
/// for debugging only: store indices
bool GifImage::dumpIndices(const std::string& filename, unsigned int frame) const
{
const Frame& current = m_frames.at(frame);
if (current.width != m_width ||
current.height != m_height)
throw "dumping indices of partial frames not supported yet";
// current.pixels are a consecutive memory block, just dump to disk
std::ofstream file(filename.c_str(), std::ios::out | std::ios::binary);
file.write((const char*)¤t.pixels[0], current.pixels.size());
return file.good();
}
/// convert from non-interlaced to interlaced (and vice versa)
void GifImage::setInterlacing(bool makeInterlaced)
{
if (m_frames.size() != 1)
throw "code doesn't work yet for animated GIFs";
unsigned int posInterlaced = m_frames.front().posInterlaced;
if (posInterlaced == 0)
throw "interlaced bit not found";
if (m_isAnimated)
throw "interlacing in animation not supported yet";
// interlacing doesn't matter for a single line
if (m_height <= 1)
return;
const unsigned char mask = 0x40;
bool isInterlaced = (m_rawHeader[posInterlaced] & mask) != 0;
// keep current interlacing mode ?
if (isInterlaced == makeInterlaced)
return;
// line order:
// A) every 8th row, beginning with 0th row
// B) every 8th row, beginning with 4th row
// C) every 4th row, beginning with 2nd row
// D) every 2nd row, beginning with 1st row
for (unsigned int frame = 0; frame < m_frames.size(); frame++)
{
Bytes& current = m_frames[frame].pixels;
if (makeInterlaced)
{
// non-interlaced => interlaced
// set flag
m_rawHeader[posInterlaced] |= mask;
// re-order lines
Bytes interlaced;
interlaced.reserve(current.size());
for (unsigned int y = 0; y < m_height; y += 8)
interlaced.insert(interlaced.end(), current.begin() + y * m_width, current.begin() + y * m_width + m_width);
for (unsigned int y = 4; y < m_height; y += 8)
interlaced.insert(interlaced.end(), current.begin() + y * m_width, current.begin() + y * m_width + m_width);
for (unsigned int y = 2; y < m_height; y += 4)
interlaced.insert(interlaced.end(), current.begin() + y * m_width, current.begin() + y * m_width + m_width);
for (unsigned int y = 1; y < m_height; y += 2)
interlaced.insert(interlaced.end(), current.begin() + y * m_width, current.begin() + y * m_width + m_width);
current = interlaced;
}
else
{
// interlaced => non-interlaced
// unset flag
m_rawHeader[posInterlaced] &= ~mask;
// re-order lines
Bytes interlaced = current;
current.clear();
// offsets of the first line of each line
unsigned int pass0 = 0;
unsigned int pass1 = (m_height + 8 - 1) / 8;
// same as
//pass1 = pass0;
//for (unsigned int y = 0; y < m_height; y += 8)
// pass1++;
unsigned int pass2 = pass1 + (m_height + 8 - 5) / 8;
//pass2 = pass1;
//for (unsigned int y = 4; y < m_height; y += 8)
// pass2++;
unsigned int pass3 = pass2 + (m_height + 4 - 1) / 4;
//pass3 = pass2;
//for (unsigned int y = 2; y < m_height; y += 4)
//pass3++;
for (unsigned int y = 0; y < m_height; y++)
switch (y % 8)
{
case 0:
current.insert(current.end(), interlaced.begin() + pass0 * m_width, interlaced.begin() + (pass0 + 1) * m_width);
pass0++;
break;
case 4:
current.insert(current.end(), interlaced.begin() + pass1 * m_width, interlaced.begin() + (pass1 + 1) * m_width);
pass1++;
break;
case 2: case 6:
current.insert(current.end(), interlaced.begin() + pass2 * m_width, interlaced.begin() + (pass2 + 1) * m_width);
pass2++;
break;
case 1: case 3: case 5: case 7:
current.insert(current.end(), interlaced.begin() + pass3 * m_width, interlaced.begin() + (pass3 + 1) * m_width);
pass3++;
break;
}
}
}
}
/// read signature GIF 87a/89a
void GifImage::parseSignature()
{
const unsigned int NumBytes = 6;
m_version.resize(NumBytes);
for (unsigned int i = 0; i < NumBytes; i++)
m_version[i] = m_input.getByte();
// always starts with "GIF"
if (m_version[0] != 'G' ||
m_version[1] != 'I' ||
m_version[2] != 'F')
throw "invalid file signature";
// version is either "87a" or "89a"
if ( m_version[3] != '8' ||
(m_version[4] != '7' && m_version[4] != '9') ||
m_version[5] != 'a')
throw "invalid GIF version, only 87a and 89a supported";
}
/// global image parameters (constant for all frame)
void GifImage::parseGlobalDescriptor()
{
// get size (16 bits stored little endian)
m_width = getWord();
m_height = getWord();
// bits per color => 8 if 256 colors
m_colorDepth = m_input.getBits(3) + 1;
m_sizeGlobalColorMap = 1 << m_colorDepth;
// unused: true if colors are sorted descendingly (by "importance")
m_isSorted = m_input.getBool();
m_input.removeBits(3); // skip 3 bite
// has palette ?
bool hasGlobalColorMap = m_input.getBool();
if (!hasGlobalColorMap)
m_sizeGlobalColorMap = 0;
m_backgroundColor = m_input.getByte();
m_aspectRatio = m_input.getByte();
// read global palette
m_globalColorMap.resize(m_sizeGlobalColorMap);
for (unsigned int i = 0; i < m_sizeGlobalColorMap; i++)
{
Color color;
color.red = m_input.getByte();
color.green = m_input.getByte();
color.blue = m_input.getByte();
m_globalColorMap[i] = color;
}
}
/// GIF extensions (e.g. animation settings)
void GifImage::parseExtensions(Frame& frame)
{
while (true)
{
// each extension starts with 0x21
unsigned char marker = m_input.peekBits(8);
if (marker != 0x21)
return;
m_input.removeBits(8);
// get extension type
ExtensionType identifier = (ExtensionType)m_input.getBits(8);
if (identifier == GraphicControl)
{
//std::cout << "HACK: assume animated" << std::endl;
m_isAnimated = true;
}
// read all its parts (usually just one part)
Bytes data;
while (true)
{
unsigned char length = m_input.getByte();
// last part ?
if (length == 0)
break;
// skip contents
for (unsigned char i = 0; i < length; i++)
data.push_back(m_input.getByte());
};
frame.extensions.push_back(std::make_pair(identifier, data));
};
}
/// local image parameters
void GifImage::parseLocalDescriptor(Frame& frame)
{
unsigned char identifier = m_input.getByte();
if (identifier != 0x2C)
throw "expected local descriptor, but not found";
// frame dimensions
frame.offsetLeft = getWord();
frame.offsetTop = getWord();
frame.width = getWord();
frame.height = getWord();
// HACK: doesn't work for animations
frame.posInterlaced = m_input.getNumBytesRead();
// color map related stuff
unsigned int sizeLocalColorMap = 1 << (m_input.getBits(3) + 1);
m_input.removeBits(2);
frame.isSorted = m_input.getBool();
frame.isInterlaced = m_input.getBool();
bool hasLocalColorMap = m_input.getBool();
if (!hasLocalColorMap)
sizeLocalColorMap = 0;
#ifdef ALLOW_VERBOSE
if (verbose)
{
std::cout << frame.width << "x" << frame.height << " located at " << frame.offsetLeft << "x" << frame.offsetTop;
if (frame.isInterlaced)
std::cout << ", interlaced";
if (hasLocalColorMap)
std::cout << ", local color map size=" << sizeLocalColorMap;
}
#endif
// copy RGB colors
frame.localColorMap.resize(sizeLocalColorMap);
for (unsigned int i = 0; i < sizeLocalColorMap; i++)
{
Color color;
color.red = m_input.getByte();
color.green = m_input.getByte();
color.blue = m_input.getByte();
frame.localColorMap[i] = color;
}
}
/// final bytes of the image
void GifImage::parseTerminator()
{
unsigned char identifier = m_input.getByte();
if (identifier != 0x3B)
throw "invalid terminator";
}
/// read 16 bits, little endian
unsigned short GifImage::getWord()
{
unsigned short low = m_input.getByte();
unsigned short high = m_input.getByte();
return low + (high << 8);
}
show
Compress.h
// //////////////////////////////////////////////////////////
// Compress.h
// Copyright (c) 2018 Stephan Brumme. All rights reserved.
// see https://create.stephan-brumme.com/flexigif-lossless-gif-lzw-optimization/
#pragma once
#include "BinaryInputBuffer.h"
#include <vector>
#include <string>
/// decode a .Z compressed file (created by the old compress unix tool)
/** errors throw an exception (const char*) **/
class Compress
{
public:
// -------------------- data types --------------------
/// a continuous block of bytes
typedef std::vector<unsigned char> Bytes;
// -------------------- methods --------------------
/// load file
explicit Compress(const std::string& filename, bool loadAsUncompressedIfWrongMagicBytes = false);
/// replace LZW data with optimized data and write to disk
unsigned int writeOptimized(const std::string& filename, const std::vector<bool>& bits);
/// get uncompressed contents
const Bytes& getData() const;
/// for debugging only: save uncompressed data
bool dump(const std::string& filename) const;
/// show debug output
static bool verbose;
private:
/// first two bytes of every .Z file
enum
{
MagicByte1 = 0x1F,
MagicByte2 = 0x9D
};
/// settings of the original file (third byte of that file)
unsigned char m_settings;
/// simple wrapper to read file bit-wise
BinaryInputBuffer m_input;
/// uncompressed bytes
Bytes m_data;
};
show
Compress.cpp
A simple helper class to read files bitwise:
// //////////////////////////////////////////////////////////
// Compress.cpp
// Copyright (c) 2018 Stephan Brumme. All rights reserved.
// see https://create.stephan-brumme.com/smallzwflexigif-lossless-gif-lzw-optimization/
// I couldn't find a proper spec of the .Z file format
// my code is loosely based on GZIP's LZW implementation
// https://research.cs.wisc.edu/wpis/examples/pcca/gzip/gzip.c
// other resources:
// https://wiki.wxwidgets.org/Development:_Z_File_Format
// https://www.ioccc.org/2015/mills2/hint.html
#include "Compress.h"
#include "LzwDecoder.h"
#include <fstream>
#define ALLOW_VERBOSE
#ifdef ALLOW_VERBOSE
#include <iostream>
#include <iomanip>
#endif
// statics
bool Compress::verbose = false;
/// load file
Compress::Compress(const std::string& filename, bool loadAsUncompressedIfWrongMagicBytes)
: m_settings(0),
m_input(filename),
m_data()
{
try
{
if (m_input.empty())
throw "file not found or empty";
// check magic bytes
bool isZ = (m_input.getByte() == MagicByte1);
isZ |= (m_input.getByte() == MagicByte2);
// has proper magic bytes ?
if (isZ)
{
// compression settings
m_settings = m_input.getByte();
// default format is "block mode", where the highest bit is set
if ((m_settings & 0x80) == 0)
throw "only .Z block mode supported";
// unused bits, must be zero
if ((m_settings & 0x60) != 0)
throw "unknown .Z format flag found";
// maximum bits per LZW code, almost always 16
unsigned char maxBits = m_settings & 0x1F;
// get filesize
std::fstream in(filename.c_str(), std::ios::in | std::ios::binary);
in.seekg(0, in.end);
unsigned int filesize = (unsigned int)in.tellg();
unsigned int expected = 3 * filesize; // crude heuristic for size of uncompressed data
in.close();
// and decompress !
LzwDecoder::verbose = Compress::verbose;
LzwDecoder lzw(m_input, false, 8, maxBits, expected);
m_data = lzw.getBytes();
}
else
{
// should it have a .Z file ?
if (!loadAsUncompressedIfWrongMagicBytes)
throw "file is not a .Z compressed file (magic bytes don't match)";
// just read from disk
std::fstream in(filename.c_str(), std::ios::in | std::ios::binary);
// get filesize
in.seekg(0, in.end);
size_t numBytes = (size_t)in.tellg();
// allocate memory and read everything at once
m_data.resize(numBytes);
in.seekg(0, in.beg);
in.read((char*)&m_data[0], numBytes);
}
}
catch (const char* e)
{
#ifdef ALLOW_VERBOSE
std::cout << "ERROR: " << (e ? e : "(unknown exception)") << std::endl;
#endif
}
}
/// replace LZW data with optimized data and write to disk
unsigned int Compress::writeOptimized(const std::string& filename, const std::vector<bool>& bits)
{
std::ofstream file(filename.c_str(), std::ios::out | std::ios::binary);
// write magic bytes
file.put((char)MagicByte1);
file.put((char)MagicByte2);
// and settings
file.put((char)m_settings);
unsigned int pos = 0;
// merge single bits, write to disk
while (pos < bits.size())
{
// create one byte
unsigned char oneByte = 0;
for (unsigned char bit = 0; bit < 8; bit++, pos++)
if (pos < bits.size() && bits[pos])
oneByte |= 1 << bit;
// and write to disk
file << oneByte;
}
// and we're done
unsigned int filesize = (unsigned int)file.tellp();
file.close();
return filesize;
}
/// get uncompressed contents
const Compress::Bytes& Compress::getData() const
{
return m_data;
}
/// for debugging only: save uncompressed data
bool Compress::dump(const std::string& filename) const
{
std::ofstream file(filename.c_str(), std::ios::out | std::ios::binary);
file.write((char*)&m_data[0], m_data.size());
return file.good();
}
show
BinaryInputBuffer.h
// //////////////////////////////////////////////////////////
// BinaryInputBuffer.h
// Copyright (c) 2011-2018 Stephan Brumme. All rights reserved.
// see https://create.stephan-brumme.com/disclaimer.html
//
#pragma once
#pragma warning (disable: 4530) // missing exception handling
#include <string>
#include <fstream>
/// read a file bit-wise
class BinaryInputBuffer
{
public:
/// read from file
explicit BinaryInputBuffer(const std::string& filename);
/// get number of bytes read so far
unsigned int getNumBytesRead() const;
/// get number of bits still available
unsigned int getNumBitsLeft() const;
/// true, if no bits left
bool empty() const;
/// take a look at the next bits without advancing the file pointer
unsigned int peekBits (unsigned char numBits); // at most 16 bits
/// increment buffer pointers/offsets
void removeBits(unsigned char numBits); // at most 16 bits
/// get the next bits and increment buffer pointers/offsets
unsigned int getBits (unsigned char numBits); // at most 16 bits
/// get 8 bits
unsigned char getByte();
/// get a single bit
bool getBool();
private:
/// simple I/O buffering, faster than C lib's built-in functions
unsigned char getBufferedByte();
/// bit/byte file stream
std::ifstream m_stream;
/// total number of bytes read so far
unsigned int m_read;
/// total bits left (initially, it's 8*m_size)
unsigned int m_bitsLeft;
/// store bits until next byte boundary
unsigned int m_bitBuffer;
/// number of valid bits in _bitBuffer
unsigned char m_bitBufferSize;
/// default buffer size
enum { CacheSize = 1024 };
/// buffer
unsigned char m_cache[CacheSize];
/// position of next byte
unsigned int m_cacheOffset;
/// position beyond last valid byte
unsigned int m_cacheSize;
};
show
BinaryInputBuffer.cpp
And finally the main program which parses command-line parameters:
// //////////////////////////////////////////////////////////
// BinaryInputBuffer.cpp
// Copyright (c) 2011-2018 Stephan Brumme. All rights reserved.
// see https://create.stephan-brumme.com/disclaimer.html
//
#include "BinaryInputBuffer.h"
#include <cassert>
/// read from file
BinaryInputBuffer::BinaryInputBuffer(const std::string& filename)
: m_stream(),
m_read(0),
m_bitsLeft(0),
m_bitBuffer(0),
m_bitBufferSize(0),
m_cache(),
m_cacheOffset(0),
m_cacheSize(0)
{
// open file
m_stream.open(filename.c_str(), std::ios::in | std::ios::binary);
if (!m_stream)
return;
// get file size
m_stream.seekg(0, std::ios_base::end);
unsigned int size = (unsigned int)m_stream.tellg();
m_stream.seekg(0, std::ios_base::beg);
// load first segment (or even the complete file)
m_cacheSize = CacheSize;
if (m_cacheSize > size)
m_cacheSize = size;
m_stream.read((char*)m_cache, m_cacheSize);
// compute total number of bits
m_bitsLeft = 8 * size;
}
/// get number of bytes read so far
unsigned int BinaryInputBuffer::getNumBytesRead() const
{
return m_read;
}
/// get number of bits still available
unsigned int BinaryInputBuffer::getNumBitsLeft() const
{
return m_bitsLeft;
}
/// true, if no bits left
bool BinaryInputBuffer::empty() const
{
return m_bitsLeft == 0;
}
/// take a look at the next bits without modifying the buffer
unsigned int BinaryInputBuffer::peekBits(unsigned char numBits)
{
assert(numBits <= 16);
assert(numBits <= m_bitsLeft);
// move 8 bits from stream to buffer
while (m_bitBufferSize < numBits)
{
// get one byte from stream and add to internal buffer
unsigned int byte = getBufferedByte();
m_bitBuffer |= byte << m_bitBufferSize;
m_bitBufferSize += 8;
}
// return desired bits
unsigned int bitMask = (1 << numBits) - 1;
return m_bitBuffer & bitMask;
}
/// get the next bits and increment buffer pointers/offsets
unsigned int BinaryInputBuffer::getBits(unsigned char numBits)
{
unsigned int result = peekBits(numBits);
removeBits(numBits);
return result;
}
/// get 8 bits
unsigned char BinaryInputBuffer::getByte()
{
return (unsigned char)getBits(8);
}
/// get a single bit
bool BinaryInputBuffer::getBool()
{
return getBits(1) == 1;
}
/// increment buffer pointers/offsets
void BinaryInputBuffer::removeBits(unsigned char numBits)
{
// if more bits needs to be removed than are actually available in the buffer
if (m_bitBufferSize < numBits)
peekBits(numBits);
// adjust buffers and counters
m_bitBuffer >>= numBits;
m_bitBufferSize -= numBits;
m_bitsLeft -= numBits;
}
/// simple I/O buffering, faster than C lib's built-in functions
unsigned char BinaryInputBuffer::getBufferedByte()
{
// (re-)fill buffer
if (m_cacheOffset >= m_cacheSize)
{
// read as much as possible (it has to fit into the buffer, though)
size_t numRead = (m_bitsLeft + 7) / 8;
if (numRead > m_cacheSize)
numRead = m_cacheSize;
m_stream.read((char*)m_cache, numRead);
m_cacheOffset = 0;
}
// count number of read bytes
m_read++;
// single byte
return m_cache[m_cacheOffset++];
}
show
flexiGIF.cpp
// //////////////////////////////////////////////////////////
// flexigif.cpp
// Copyright (c) 2018 Stephan Brumme. All rights reserved.
// see https://create.stephan-brumme.com/flexigif-lossless-gif-lzw-optimization/
#include "GifImage.h"
#include "Compress.h"
#include "LzwDecoder.h"
#include "LzwEncoder.h"
#include <vector>
#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <ctime>
namespace
{
// ----- constants -----
/// program version
const char* Version = "2018.11a";
/// return codes
enum ReturnCode
{
NoError = 0,
GenericException = 1,
NotImplemented = 2,
ParameterOutOfRange = 3,
InvalidParameter = 4,
MissingParameter = 5,
UnknownParameter = 6,
ContradictingParameters = 7,
MoreThanTwoFilenames = 8,
SameFile = 9,
DontOverwrite = 10,
NoFrameFound = 11,
OnlyForGifs = 12,
DebuggingMode = 99
};
/// default values of the optimizer
enum DefaultValues
{
GifMaxToken = 20000,
LzwMaxToken = 100000,
GifMaxDictionary = 4096,
LzwMaxDictionary = 65536,
GifMaxDictionaryCompatible = GifMaxDictionary - 3, // 4093
Alignment = 1,
MinImprovement = 1,
MinNonGreedy = 2
};
}
/// terminate program, show help
void help(const std::string& errorMsg = "", int errorCode = NoError, bool showHelp = true)
{
// error
if (!errorMsg.empty())
std::cout << "ERROR: " << errorMsg << std::endl;
// default help
if (showHelp)
std::cout << std::endl
<< "flexiGIF " << Version << ", written by Stephan Brumme" << std::endl
<< "Usage: flexigif [options] INPUTFILE OUTPUTFILE" << std::endl
<< std::endl
<< "Options:" << std::endl
<< " -p --prettygood try greedy search plus non-greedy with '-a=" << Alignment << " -n=" << MinNonGreedy << " -m=" << MinImprovement << " -d=" << GifMaxDictionary << "' => typically the best results" << std::endl
<< " -a=x --alignment=x blocks starts at multiples of x (default is -a=1 => best compression but may be slow)" << std::endl
<< " -d=x --dictionary=x maximum size of the LZW dictionary (default is -d=" << GifMaxDictionary << ", 0 means \"maximum\")" << std::endl
<< " -t=x --maxtokens=x maximum number of tokens per block (default is -t=" << GifMaxToken << ")" << std::endl
<< " -c --compatible create files that should be more compatible to faulty decoders" << std::endl
<< " -l --deinterlace ensure that output is not interlaced" << std::endl
<< " -g --greedy enable greedy match search (default)" << std::endl
<< " -n=x --nongreedy=x enable non-greedy match search, x is the minimum match length (default is -n=" << MinNonGreedy << ")" << std::endl
<< " -m=x --minimprovement=x minimum number of bytes saved by a non-greedy match (requires parameter -n, default is -m=" << MinImprovement << ")" << std::endl
<< " -i --info analyze internal structure of INPUTFILE" << std::endl
<< " -f --force overwrite OUTPUTFILE if it already exists" << std::endl
<< " -r --splitruns allow partial matching of long runs of the same byte (requires parameter -n)" << std::endl
<< " -u=x --userdefined=x don't search but set custom block boundaries, x is an ascendingly sorted list, e.g. -u=500,2000,9000" << std::endl
<< " -s --summary when finished, compare filesize of INPUTFILE and OUTPUTFILE" << std::endl
<< " -v --verbose show debug messages" << std::endl
<< " -q --quiet no output during compression" << std::endl
<< " -Z INPUTFILE and OUTPUTFILE are stored in .Z file format instead of .gif" << std::endl
<< " -b=x --benchmark=x benchmark GIF decoder, x stands for the number of iterations (default: x=100)" << std::endl
<< " -y --immediately avoid initial clear code and start immediately with compressed data" << std::endl
//<< " --ppm=x store x-th frame in PPM format in OUTPUTFILE" << std::endl
//<< " --indices=x store x-th frame's indices in OUTPUTFILE" << std::endl
//<< " --compress INPUTFILE isn't compressed yet and OUTPUTFILE will be a .Z file" << std::endl
//<< " --decompress store decompressed contents of INPUTFILE, which must be a .Z file, in OUTPUTFILE" << std::endl
<< " -h --help display this help screen" << std::endl
<< std::endl
<< "See https://create.stephan-brumme.com/flexigif-lossless-gif-lzw-optimization/ for more infos" << std::endl;
exit(errorCode);
}
/// let's go !
int main(int argc, char** argv)
{
// no parameters at all ?
if (argc == 1)
help();
// filenames
std::string input;
std::string output;
// other parameters
bool isGif = true; // if false then .Z file format
bool inputInfo = false; // just one filename: show some compression details of input file
bool showSummary = false; // after finishing recompression: display how many bytes were saved
bool overwrite = false; // overwrite an existing OUTPUTFILE
bool quiet = false; // no console output
bool verbose = false; // lots of console output
bool deinterlace = false; // deinterlace a GIF image
bool smartGreedy = false;
bool benchmark = false; // decompress INPUTFILE several times and measure throughput in MB/sec
unsigned int iterations = 10; // benchmark only: repeat x times
bool showDecompressed = false; // dump a frame in PPM format to OUTPUTFILE
bool showIndices = false; // dump a frame's indices to OUTPUTFILE
unsigned int ppmFrame = 0; // only relevant if showDecompressed is true
bool compressZ = false; // decompress INPUTFILE to OUTPUTFILE (.Z format only)
bool decompressZ = false; // compress INPUTFILE to OUTPUTFILE (.Z format)
std::vector<unsigned int> predefinedBlocks; // insert clear codes at these user-defined positions
// optimizer
LzwEncoder::OptimizationSettings optimize;
optimize.alignment = Alignment;
optimize.verbose = false;
optimize.greedy = true;
optimize.minImprovement = MinImprovement;
optimize.minNonGreedyMatch = MinNonGreedy;
optimize.splitRuns = false;
optimize.maxDictionary = 0;
optimize.maxTokens = GifMaxToken;
optimize.startWithClearCode = true;
optimize.readOnlyBest = false;
optimize.avoidNonGreedyAgain = false;
// parse parameters
std::string current;
size_t currentPos = 0; // if several short arguments are merged into one, e.g. "-v -s -f" => "-vsf"
// arguments 1 ... argc-1
int parameter = 0;
while (parameter < argc)
{
// long parameters cannot be merged (such as --force)
if ((current.size() > 2 && current[0] == '-' && current[1] == '-') ||
(currentPos + 1 >= current.size()))
current.clear();
// next parameter ?
if (current.empty() || currentPos + 1 >= current.size())
{
parameter++;
if (parameter == argc)
break;
}
// read argument
if (current.empty())
{
current = argv[parameter];
if (current.size() >= 2 && current[0] == '-' && current[1] != '-')
currentPos = 1;
else
currentPos = 0;
}
else
{
// several short parameters were merged
currentPos++;
}
// if single-letter argument
char currentShort = current[currentPos];
// no option => must be a filename
if (current[0] != '-')
{
// first comes the input file
if (input.empty())
{
input = current;
current.clear();
continue;
}
// then the output file
if (output.empty())
{
output = current;
current.clear();
continue;
}
// but at most two filenames ...
help("more than two filenames specified", MoreThanTwoFilenames);
}
// help
if (currentShort == 'h' || current == "--help")
help();
// info
if (currentShort == 'i' || current == "--info")
{
inputInfo = true;
continue;
}
// summary
if (currentShort == 's' || current == "--summary")
{
if (quiet)
help("flag -s (show summary) contradicts -q (quiet)", ContradictingParameters, false);
showSummary = true;
continue;
}
// overwrite
if (currentShort == 'f' || current == "--force")
{
overwrite = true;
continue;
}
// verbose
if (currentShort == 'v' || current == "--verbose")
{
if (quiet)
help("flag -v (verbose) contradicts -q (quiet)", ContradictingParameters, false);
verbose = true;
optimize.verbose = true;
GifImage::verbose = true;
Compress::verbose = true;
LzwDecoder::verbose = true;
continue;
}
// quiet
if (currentShort == 'q' || current == "--quiet")
{
if (verbose)
help("flag -q (quiet) contradicts -v (verbose)", ContradictingParameters, false);
if (showSummary)
help("flag -q (quiet) contradicts -s (show summary)", ContradictingParameters, false);
quiet = true;
continue;
}
// greedy
if (currentShort == 'g' || current == "--greedy")
{
optimize.greedy = true;
continue;
}
// split runs
if (currentShort == 'r' || current == "--splitruns")
{
optimize.splitRuns = true;
continue;
}
// deinterlace
if (currentShort == 'l' || current == "--deinterlace")
{
deinterlace = true;
continue;
}
// good settings
if (currentShort == 'p' || current == "--prettygood")
{
smartGreedy = true;
optimize.greedy = false;
optimize.minImprovement = MinImprovement;
optimize.maxDictionary = GifMaxDictionary;
optimize.maxTokens = GifMaxToken;
optimize.avoidNonGreedyAgain = true;
continue;
}
// enhance compatibility
if (currentShort == 'c' || current == "--compatible")
{
optimize.maxDictionary = GifMaxDictionaryCompatible; // 4093
optimize.greedy = true;
optimize.startWithClearCode = true;
continue;
}
// avoid initial clear code (relevant for GIFs only)
if (currentShort == 'y' || current == "--immediately")
{
optimize.startWithClearCode = false;
continue;
}
// compress' .Z file format
if (currentShort == 'Z')
{
isGif = false;
continue;
}
// decompress .Z file, nothing else
if (current == "--decompress")
{
decompressZ = true;
isGif = false; // implicit -Z flag
continue;
}
// INPUTFILE isn't compressed (applies to .Z files only)
if (current == "--compress")
{
compressZ = true;
isGif = false; // implicit -Z flag
continue;
}
// adjustable parameters
bool hasValue = false;
int value = 0;
std::string strValue;
size_t splitAt = current.find('=');
if (splitAt != std::string::npos)
{
// get right-hand side and store number/string in variables value/strValue
strValue = current.substr(splitAt + 1);
value = atoi(strValue.c_str());
// keep only left-hand side of parameter
current.resize(splitAt);
// no other argument may be merged with this one
currentPos = current.size();
hasValue = !strValue.empty();
}
// alignment/blockstart
if (current == "-a" || current == "--alignment")
{
if (value <= 0)
help("parameter -a/--alignment cannot be zero", ParameterOutOfRange, false);
optimize.alignment = (unsigned int)value;
continue;
}
// maximum size of dictionary
if (current == "-d" || current == "--dictionary")
{
if (value <= 0)
help("parameter -d/--dictionary cannot be zero", ParameterOutOfRange, false);
optimize.maxDictionary = (unsigned int)value;
continue;
}
// maximum number of tokens per block
if (current == "-t" || current == "--maxtokens")
{
if (value < 0)
value = 0; // no limit
optimize.maxTokens = (unsigned int)value;
continue;
}
// non-greedy minimum improvement
if (current == "-m" || current == "--minimprovement")
{
if (value <= 0)
help("parameter -m/--minimprovement cannot be zero", ParameterOutOfRange, false);
optimize.minImprovement = (unsigned int)value;
continue;
}
// non-greedy match length
if (currentShort == 'n' || current == "--nongreedy")
{
optimize.greedy = false;
optimize.minNonGreedyMatch = hasValue ? (unsigned int)value : MinNonGreedy;
if (value < 2)
help("parameter -n/--nongreedy cannot be less than 2", ParameterOutOfRange, false);
continue;
}
// predefined block boundaries
if (current == "-u" || current == "--userdefined")
{
// syntax: a,b,c where a/b/c are decimal numbers
predefinedBlocks.push_back(0);
for (size_t i = 0; i < strValue.size(); i++)
{
char oneByte = strValue[i];
// next number follows ?
if (oneByte == ',')
{
predefinedBlocks.push_back(0);
continue;
}
if (oneByte < '0' || oneByte > '9')
help("invalid syntax for parameter -u/--userdefined: it must be a sorted list of numbers", InvalidParameter, false);
// process digit
predefinedBlocks.back() *= 10;
predefinedBlocks.back() += oneByte - '0';
}
// check whether the list is in ascending order (duplicates are disallowed, too)
for (size_t i = 1; i < predefinedBlocks.size(); i++)
if (predefinedBlocks[i - 1] >= predefinedBlocks[i])
help("invalid syntax for parameter -u/--userdefined: it must be a sorted list of numbers", InvalidParameter, false);
continue;
}
// benchmark, user-defined number of iterations
if (current == "-b" || current == "--benchmark")
{
benchmark = true;
iterations = hasValue ? value : 100; // default: decode 100x
if (value < 1)
help("parameter -b/--benchmark cannot be zero", ParameterOutOfRange, false);
continue;
}
// PPM output of a GIF frame of LZW decompression of Z file (TODO: jsut debugging code, not in public interface yet)
if (current == "--ppm")
{
showDecompressed = true;
ppmFrame = hasValue ? value : 1; // first frame by default
continue;
}
// PPM output of a GIF frame of LZW decompression of Z file (TODO: jsut debugging code, not in public interface yet)
if (current == "--indices")
{
showIndices = true;
ppmFrame = hasValue ? value : 1; // first frame by default
continue;
}
// whoopsie ...
help("unknown parameter " + current, 1);
}
try
{
// auto-detect .Z files
if (input.size() > 2 &&
input[input.size() - 2] == '.' &&
input[input.size() - 1] == 'Z')
isGif = false;
if (!isGif)
{
// increase token limit
if (optimize.maxTokens == GifMaxToken)
optimize.maxTokens = LzwMaxToken;
}
// check parameter combinations
if (optimize.splitRuns && optimize.greedy)
help("parameter -r requires -n", MissingParameter);
// only one parameter: a file name => automatically switch to "info mode"
if (argc == 2 && !input.empty())
inputInfo = true;
// show GIF/LZW infos about input file
if (inputInfo)
{
if ( input .empty())
help("no filename provided", MissingParameter, false);
if (!output.empty())
help("too many filenames provided (accepting only one)", MoreThanTwoFilenames, false);
// just load and parse
if (isGif)
{
GifImage::verbose = true;
GifImage info(input);
}
else
{
Compress::verbose = true;
Compress info(input);
}
return NoError;
}
// benchmark
if (benchmark)
{
if (input.empty())
help("missing INPUTFILE", MissingParameter, false);
std::cout << "benchmarking '" << input << "' ..." << std::endl
<< "decoding file, " << iterations << " iterations" << std::endl;
clock_t start = clock();
unsigned int numDecodedFrames = 0;
unsigned long long numPixels = 0;
for (unsigned int i = 0; i < iterations; i++)
{
if (isGif)
{
// parse file
GifImage gif(input);
// error during decoding
if (gif.getNumFrames() == 0)
help("no frames found in " + input, NoFrameFound, false);
// statistics
numDecodedFrames += gif.getNumFrames();
for (unsigned int frame = 0; frame < gif.getNumFrames(); frame++)
numPixels += gif.getFrame(frame).pixels.size();
// disable verbose output for the 2..n iteration
GifImage::verbose = false;
}
else
{
// parse file
Compress lzw(input, compressZ);
numDecodedFrames++;
numPixels += lzw.getData().size();
// disable verbose output for the 2..n iteration
Compress::verbose = false;
}
}
clock_t finish = clock();
float seconds = (finish - start) / float(CLOCKS_PER_SEC);
float perFile = seconds / iterations;
float perFrame = seconds / numDecodedFrames;
float throughput = numPixels / seconds;
std::cout << std::fixed
<< "elapsed: " << std::setw(8) << std::setprecision(6) << seconds << " seconds" << std::endl
<< "per file: " << std::setw(8) << std::setprecision(6) << perFile << " seconds" << std::endl;
if (iterations != numDecodedFrames)
std::cout << "per frame: " << std::setw(8) << std::setprecision(6) << perFrame << " seconds" << std::endl;
std::cout << "throughput: " << std::setw(8) << std::setprecision(3) << throughput / 1000000 << " megapixel/second" << std::endl;
return NoError;
}
if (input .empty())
help("missing INPUTFILE", MissingParameter);
if (output.empty())
help("missing OUTPUTFILE", MissingParameter);
// same name ?
if (input == output)
help("INPUTFILE and OUTPUTFILE cannot be the same filename", SameFile, false);
// don't overwrite by default
if (!overwrite)
{
std::fstream checkOutput(output.c_str());
if (checkOutput)
help("OUTPUTFILE already exists, please use -f to overwrite an existing file", DontOverwrite, false);
}
// store a single frame in PPM format
if (showDecompressed || showIndices)
{
GifImage gif(input);
// use 0-index internally instead of 1-index
ppmFrame--;
if (ppmFrame >= gif.getNumFrames()) // note: if ppmFrame was 0 => -1 => 0xFFFFFF... => a very large number
help("please specify a valid frame number", ParameterOutOfRange);
// write PPM (or plain indices)
bool ok = false;
if (showDecompressed)
ok = gif.dumpPpm(output, ppmFrame);
else
ok = gif.dumpIndices(output, ppmFrame);
return ok ? NoError : DontOverwrite;
}
// decompress .Z file
if (decompressZ)
{
Compress lzw(input, compressZ);
if (lzw.dump(output))
return NoError;
else
return DontOverwrite;
}
// -------------------- process input --------------------
clock_t start = clock();
if (!quiet)
std::cout << "flexiGIF " << Version << ", written by Stephan Brumme" << std::endl;
if (verbose)
{
std::cout << "used options:";
std::cout << " -a=" << optimize.alignment;
if (optimize.startWithClearCode)
std::cout << " -c";
if (optimize.maxDictionary > 0)
std::cout << " -d=" << optimize.maxDictionary;
if (overwrite)
std::cout << " -f";
if (deinterlace)
std::cout << " -l";
if (!optimize.greedy)
std::cout << " -m=" << optimize.minImprovement;
if (!optimize.greedy)
std::cout << " -n=" << optimize.minNonGreedyMatch;
if (smartGreedy)
std::cout << " -p";
if (quiet)
std::cout << " -q";
if (optimize.splitRuns && !optimize.greedy)
std::cout << " -r";
if (showSummary)
std::cout << " -s";
std::cout << " -t=" << optimize.maxTokens;
if (verbose)
std::cout << " -v";
if (!isGif)
std::cout << " -Z";
if (compressZ)
std::cout << " --compress";
if (decompressZ)
std::cout << " --decompress";
if (showIndices)
std::cout << " --indices=" << ppmFrame;
if (showDecompressed)
std::cout << " --ppm=" << ppmFrame;
std::cout << std::endl;
}
if (verbose)
std::cout << std::endl << "===== decompress '" << input << "' =====" << std::endl;
if (isGif) // ---------- recompress GIF frames ----------
{
// load GIF
GifImage gif(input);
// error during decoding ?
if (gif.getNumFrames() == 0)
help("no frames found in " + input, NoFrameFound, false);
// determine minCodeSize, often 8 (up to 256 colors)
optimize.minCodeSize = 1;
// look for largest byte in each frame
unsigned char maxValue = 0;
const unsigned char EightBits = 0x80;
for (unsigned int frame = 0; frame < gif.getNumFrames() && maxValue < EightBits; frame++)
{
// get bytes / pixels
const GifImage::Bytes& indices = gif.getFrame(frame).pixels;
for (size_t i = 0; i < indices.size(); i++)
// larger ?
if (maxValue < indices[i])
{
maxValue = indices[i];
// at least 8 bits ? => maximum
if (maxValue >= EightBits)
break;
}
}
// compute number of bits
while (maxValue >= (1 << optimize.minCodeSize))
optimize.minCodeSize++;
// codeSize = 1 is not allowed by spec, even b/w images have codeSize = 2
if (optimize.minCodeSize == 1)
optimize.minCodeSize = 2;
// de-interlace non-animated GIFs
if (deinterlace)
{
if (gif.getNumFrames() > 1)
help("de-interlacing is not supported yet for animated GIFs", NotImplemented, false);
gif.setInterlacing(false);
}
if (gif.getNumFrames() > 1 && !predefinedBlocks.empty())
help("user-defined block boundaries are not allowed for animated GIFs", NotImplemented, false);
// -------------------- generate output --------------------
if (verbose)
std::cout << std::endl << "===== compression in progress ... =====" << std::endl;
// optimize all frames
unsigned int numFrames = gif.getNumFrames();
std::vector<std::vector<bool> > optimizedFrames;
for (unsigned int frame = 0; frame < numFrames; frame++)
{
// get original LZW bytes
const GifImage::Frame& current = gif.getFrame(frame);
const std::vector<unsigned char>& indices = current.pixels;
LzwEncoder encoded(indices, isGif);
optimize.minCodeSize = current.codeSize;
// store optimized LZW bytes
LzwEncoder::BitStream optimized;
// look for optimal block boundaries
if (predefinedBlocks.empty())
{
unsigned int lastDisplay = 0;
int pos = (int)indices.size() - 1;
for (int i = pos; i >= 0; i--)
{
// only if block start is aligned
if (i % optimize.alignment != 0)
continue;
// show progress
if (!quiet && (i / optimize.alignment) % 8 == 0 && clock() != lastDisplay)
{
unsigned int percentage = 100 - (100 * i / indices.size());
std::cout << " \rframe " << frame+1 << "/" << numFrames << " (" << indices.size() << " pixels): "
<< percentage << "% done";
// ETA
clock_t now = clock();
float elapsed = (now - start) / float(CLOCKS_PER_SEC);
float estimated = elapsed * 100 / (percentage + 0.000001f) - elapsed;
if (elapsed > 3 && numFrames == 1 && estimated >= 1)
std::cout << " (after " << (int)elapsed << "s, about " << (int)estimated << "s left)";
std::cout << std::flush;
lastDisplay = now;
}
// estimate cost
encoded.optimizePartial(i, 0, false, true, optimize);
// repeat estimation, this time with non-greedy search
if (smartGreedy && !optimize.greedy)
{
optimize.greedy = true;
encoded.optimizePartial(i, 0, false, true, optimize);
optimize.greedy = false;
}
}
if (!quiet)
std::cout << " " << std::endl;
// final bitstream for current image
optimized = encoded.optimize(optimize);
}
else
{
// remove invalid block boundaries (or should it be an ERROR ?)
while (predefinedBlocks.back() > indices.size())
predefinedBlocks.pop_back();
// to simplify code, include start and end of file as boundaries, too
if (predefinedBlocks.empty() || predefinedBlocks.front() != 0)
predefinedBlocks.insert(predefinedBlocks.begin(), 0);
if (predefinedBlocks.back() != indices.size())
predefinedBlocks.push_back(indices.size());
// avoid certain optimizer settings that might cause incomplete images
optimize.maxTokens = 0;
optimize.maxDictionary = 0;
optimized = encoded.merge(predefinedBlocks, optimize);
}
optimizedFrames.push_back(optimized);
}
// write to disk
gif.writeOptimized(output, optimizedFrames, optimize.minCodeSize);
}
else // ---------- .Z file format ----------
{
if (!predefinedBlocks.empty())
help("predefined blocks not implemented yet for .Z files", NotImplemented);
// disable GIF-only optimizations
optimize.startWithClearCode = false;
Compress lzw(input, compressZ);
// get LZW bytes
const std::vector<unsigned char>& bytes = lzw.getData();
LzwEncoder encoded(bytes, isGif);
// adjust optimizer:
// always the full ASCII alphabet
optimize.minCodeSize = 8;
// dictionary limit is 2^16 instead of 2^12
if (optimize.maxDictionary == GifMaxDictionary || optimize.maxDictionary == GifMaxDictionaryCompatible)
optimize.maxDictionary = LzwMaxDictionary; // 0 is an option, too, ... it disables the limit check
if (verbose)
std::cout << std::endl << "===== compression in progress ... =====" << std::endl;
// store optimized LZW bytes
LzwEncoder::BitStream optimized;
// look for optimal block boundaries
unsigned int percentageDone = 0;
int pos = (int)bytes.size() - 1;
for (int i = pos; i >= 0; i--)
{
// only if block start is aligned
if (i % optimize.alignment != 0)
continue;
// show progress
float percentage = 100 - (100 * float(i) / bytes.size());
if (percentage != percentageDone && !quiet)
{
// ETA
clock_t now = clock();
float elapsed = (now - start) / float(CLOCKS_PER_SEC);
float estimated = elapsed * 100 / (percentage + 0.000001f) - elapsed;
std::cout << " \r" << (int)percentage << "% done";
if (elapsed > 3 && estimated >= 1)
std::cout << " (after " << (int)elapsed << "s, about " << (int)estimated << "s left)";
std::cout << std::flush;
percentageDone = (unsigned int)percentage;
}
// estimate cost
encoded.optimizePartial(i, 0, false, true, optimize);
}
if (!quiet)
std::cout << " " << std::endl;
// write to disk
optimized = encoded.optimize(optimize);
lzw.writeOptimized(output, optimized);
}
// -------------------- bonus output :-) --------------------
if (showSummary)
{
// measure duration
clock_t finish = clock();
float seconds = (finish - start) / float(CLOCKS_PER_SEC);
// get filesizes
std::fstream in (input .c_str(), std::ios::in | std::ios::binary);
std::fstream out(output.c_str(), std::ios::in | std::ios::binary);
in .seekg(0, in .end);
out.seekg(0, out.end);
int before = (int)in .tellg();
int now = (int)out.tellg();
if (verbose)
std::cout << std::endl << "===== done ! =====" << std::endl;
// smaller, larger ?
int diff = before - now;
if (diff == 0)
std::cout << "no optimization found for '" << input << "', same size as before (" << now << " bytes).";
if (diff > 0)
std::cout << "'" << output << "' is " << diff << " bytes smaller than '" << input << "' (" << now << " vs " << before << " bytes) => "
<< "you saved " << std::fixed << std::setprecision(3) << (diff * 100.0f / before) << "%.";
if (diff < 0)
{
std::cout << "'" << output << "' is " << -diff << " bytes larger than '" << input << "' (" << now << " vs " << before << " bytes).";
if (optimize.alignment > 1 || optimize.greedy)
std::cout << " Please use more aggressive optimization settings.";
}
std::cout << " Finished after " << std::fixed << std::setprecision(2) << seconds << " seconds." << std::endl;
}
}
catch (const char* e)
{
std::cerr << "ERROR: " << (e ? e : "(no message)") << std::endl;
return GenericException;
}
catch (std::exception& e)
{
std::cerr << "ERROR: " << (e.what() ? e.what() : "(no message)") << std::endl;
return GenericException;
}
catch (...)
{
std::cerr << "ERROR: unknown exception caught" << std::endl;
return GenericException;
}
return NoError;
}
Or download these precompiled binaries (static builds, need no libraries/DLLs):
Windows (x32 and x64):
flexiGIF.2018.11a.exe (latest release, November 19, 2018) / 270.0 kb
flexiGIF.2018.10b.exe (older version, October 17, 2018) / 268.5 kb
flexiGIF.2018.10a.exe (older version, October 15, 2018) / 269.0 kb
flexiGIF.2018.09b.exe (older version, September 26, 2018) / 259.0 kb
flexiGIF.2018.09a.exe (older version, September 16, 2018) / 252.5 kb
Linux (x64 only):
flexiGIF.2018.11a (latest release, November 19, 2018) / 1,271.0 kb
flexiGIF.2018.10b (older version, October 16, 2018) / 1,270.5 kb
flexiGIF.2018.10a (older version, October 15, 2018) / 1,270.5 kb
flexiGIF.2018.09b (older version, September 26, 2018) / 1,143.4 kb
flexiGIF.2018.09a (older version, September 17, 2018) / 1,354.1 kb
Changelog
- version 2018.11a
- November 19, 2018
- bugfix: final pass sometimes aborted without emitting any bits (regression from October release)
- version 2018.10b
- October 16, 2018
- bugfix: default dictionary limit for .Z files was 212 when it should have been 216
- version 2018.10a
- October 15, 2018
-p
automatically chooses between greedy and non-greedy mode (GIFs now about 0.2% smaller)- fixed bug in cost estimation which causes sub-optimal dictionary reset points
- added support for .Z format
- reduced memory consumption for
-a=x
if x > 1
- version 2018.09b
- September 26, 2018
- source code made available
- fixed bug with animated GIFs where the minimum bit size varied between frames
- version 2018.09a
- September 15, 2018
- initial version
Gallery
I downloaded the GIF images found on the Wikipedia page about the GIF file format.The program Gifsicle is almost always recommended as the best GIF optimizer. flexiGIF beats it consistently:
image | program | bytes |
---|---|---|
flexiGIF 2018.10a (parameters: -p ) |
55,799 bytes | |
(original) | 56,752 bytes (+953 bytes, +1.71%) |
image | program | bytes |
---|---|---|
flexiGIF 2018.10a (parameters: -p , (using Gifsicle's output)) |
339,011 bytes | |
gifsicle 1.91 | 347,919 bytes (+8,908 bytes, +2.63%) |
|
flexiGIF 2018.09a | 352,615 bytes (+13,604 bytes, +4.01%) |
|
(original) | 363,021 bytes (+24,010 bytes, +7.08%) |
image | program | bytes |
---|---|---|
flexiGIF 2018.10a (parameters: none) |
280 bytes | |
gifsicle 1.91 | 291 bytes (+11 bytes, +3.93%) |
|
(original) | 291 bytes (+11 bytes, +3.93%) |
image | program | bytes |
---|---|---|
flexiGIF 2018.10a (parameters: -p -m=1 ) |
52,663 bytes | |
gifsicle 1.91 | 54,268 bytes (+1,605 bytes, +3.05%) |
|
(original) | 54,310 bytes (+1,647 bytes, +3.13%) |
image | program | bytes |
---|---|---|
flexiGIF 2018.10a (parameters: -p -m=1 ) |
167,333 bytes | |
gifsicle 1.91 | 170,958 bytes (+3,625 bytes, +2.17%) |
|
(original / Irfanview 4.51) | 171,064 bytes (+3,731 bytes, +2.23%) |
.Z
file format
There is no official documentation of the .Z file format. I looked at the source code of
gzip,
ncompress and
Chris Mills' 2015 IOCCC program in order to figure out how it works.The latter is a decompressor written in just 256 byte of completely obfuscated C.
The .Z header is very simple: just the two magic bytes 0x1F and 0x9D followed by a "settings" byte:
- the highest bit should be set
- "block mode"
- that means, a dictionary reset is allowed
- my program supports only "block mode"
- the lowest 4 bits encode the highest number of bits per LZW code
- at most 16 (even though it's possible to encode up to 31 but that's not supported by any program
- the shortest LZW code has 9 bits
- all other bits should be zero
- the initial code size is always 9 bits - independent of input data (GIF: images with few colors may have a smaller code size)
- a dictionary may contain up to 226entries (GIF: 212)
- there is no dedicated end-of-file code, the first "free" LZW code is 257 (GIF: 258 for 8 bit images)
- a clear code is followed by a gap (see below)
- if the bits of the clear code don't end on a byte boundary you pad it with zeroed bits
- the follows a number of zeroed bytes, such that:
- gzip computes the next position as
posbits = ((posbits-1) + ((n_bits<<,3)-(posbits-1+(n_bits<<,3))%(n_bits<<,3)))
- if the clear code is 16 bits long, then this means: round up the number of tokens to the next multiple of 8 ...
- ... and insert as many zeroed tokens. Since the clear code consumed 16 bits, the following "zero tokens" consume 16 bits each, too
- pseudocode:
int numZeroBytes = 8 - (numTokens % 8); if (numZeroBytes == 8) numZeroBytes = 0; numZeroBytes *= 2;
- note: all of this is only true for 16-bits clear codes !
software | size | parameters | algorithm | comment |
---|---|---|---|---|
ncompress | 443,200,353 bytes | (none) | Lempel-Ziv-Welch | quite fast: 19s |
flexiGIF 2018.10b | 412,450,465 bytes | -a=2048 -t=250000 |
Lempel-Ziv-Welch with flexible dictionary restarts | -a=1 -t=600000 would be better but even slower |
gzip | 322,591,995 bytes | -9 |
Lempel-Ziv 77 with Huffman | second fastest: 51 seconds |
cmix v16 | 116,912,035 bytes | (none) | context mixing with dictionary | slowest by far ... |
Source Code
Git users: scroll down to the repository link Latest release: October 11, 2018, size: 4377 bytes, 121 linesCRC32:
8be33dc9
MD5:
cf2d04852fa032fc30f44080597ace73
SHA1:
6b1d1c95e9a48ef3ba04a1e33e62dfd33f6556aa
SHA256:
e0b18f27821a019e8147cd3d6faaea0435942acc99da5076b0284e88dd132110
Latest release: November 19, 2018, size: 18.8 kBytes, 626 lines
CRC32:
335f281b
MD5:
d4cbaa014f7d4203959a8d93975fe770
SHA1:
96e90380bc41d077efe039eed633aedbbbeee57e
SHA256:
78f94144a07a13a19f3d9205219b7c3b334824b19fddaeb1386c5c8d94e1e51f
Git users: scroll down to the repository link Latest release: October 11, 2018, size: 2546 bytes, 68 lines
CRC32:
bedda00f
MD5:
e2cc1576e653d4ab0a1ad5ef9ff79ffe
SHA1:
9abe6ae05c9968d160428565a4a6e3539ac77fb8
SHA256:
70a25c3872ce0748736cb3ef676a853a6124010803872df2dcc807d480fcf933
Latest release: October 11, 2018, size: 10.6 kBytes, 377 lines
CRC32:
8f14f6db
MD5:
46b8e9661054f43c22355c70254c27fd
SHA1:
e0eb4a5a949c8a94dc7f3c1f66f986db71f7b399
SHA256:
9177252d37f7dcefeb5e95d6f0c34671364d4a3ead2e4946a4872c3fb68f3f6c
Latest release: November 19, 2018, size: 4255 bytes, 148 lines
CRC32:
872a0593
MD5:
544df5e523ee2d9bcff4d52960ba144a
SHA1:
93079254e0fa96328f8c8dfbcfa2c6a7a1b3804b
SHA256:
897562b796342aecc5b3736b3d893986adf162292d3287aed4188b28b79bb9a9
Latest release: November 19, 2018, size: 14.8 kBytes, 538 lines
CRC32:
e9573c40
MD5:
e0eb0277b07310e54934bbcafba9dd4c
SHA1:
ad62344745727974aafca21dc47f236ad9618647
SHA256:
1544fe9bfdf9b1d6065213da8b2f6f11120d9010d865784a3324256a75cc6559
Latest release: October 10, 2018, size: 1451 bytes, 53 lines
CRC32:
531c4542
MD5:
e18b2de01f73ac452fe66e9f23b625ac
SHA1:
dbdd400a8c249012f9fa1255e95ad4e59f34791b
SHA256:
f0e0f892b2fe6f5fc07e6d58f1969dd8e6d67efcf42e0c7564f0eddb290e2bec
Latest release: October 10, 2018, size: 3870 bytes, 139 lines
CRC32:
6c8dfcde
MD5:
92d0a9d1f5a2e1c17b0347f49dac5633
SHA1:
308d806ff25f113a842af35d597d29840f44c996
SHA256:
b96f454ff1116ab63b1b9869938274966dd500269f1ab9fc070436e993272942
Latest release: October 2, 2018, size: 1868 bytes, 62 lines
CRC32:
c5da8645
MD5:
ecc08bd55f3f1477d6009e20d4159efc
SHA1:
a404d9b73583dc48fe020c9a244e8b8a370535c9
SHA256:
86dcb88c95b1d013dd22434b714d543b7be4881daac618ed4c47e3267e9ffc9c
Latest release: October 2, 2018, size: 3171 bytes, 142 lines
CRC32:
1d06db66
MD5:
d98e08e0c4b18fb5af55cec3e033d452
SHA1:
faa67b439cb5112701ca3b40faf8d27720c848b9
SHA256:
c0e9cfcd337e640ab1b1090120c5a94d876a7ecf4650a93b4ae5390678c173ab
Latest release: November 19, 2018, size: 30.0 kBytes, 919 lines
CRC32:
36d234ea
MD5:
b9bd5573f33a98612a4e65416e61f2d7
SHA1:
0c8bfa5019452655c033c806e6578e3cf5d2cd59
SHA256:
8fa9f225149d14a581f850d3d1a9251f7594fdf2319e7817eade87ae751b58ed
Latest release: October 15, 2018, size: 606 bytes, 27 lines
CRC32:
c045e2f9
MD5:
f9bd64ce955a8ee362b62d0aa8a51817
SHA1:
7da5fc420dce911c63557ae1e0f1013f90cf0270
SHA256:
243e559fb179b9c90ff62af8aac472b79091811d81849f3832ed0d40dade9e1d
Stay up-to-date:git clone https://create.stephan-brumme.com/flexigif-lossless-gif-lzw-optimization/git
If you encounter any bugs/problems or have ideas for improving future versions, please write me an email: create@stephan-brumme.com Latest release: October 15, 2018, size: 9.5 kBytes, 192 lines
CRC32:
8be9978a
MD5:
f87f9cd74a08a2db277c2a6d1c78930b
SHA1:
dee2fe0017528b4f8f50f45dfaa9117fd1e1154b
SHA256:
68b2603797570617e267bd234721b7aaaf82dce687e7255bcdf3398ee817281c
Stay up-to-date:git clone https://create.stephan-brumme.com/flexigif-lossless-gif-lzw-optimization/git
If you encounter any bugs/problems or have ideas for improving future versions, please write me an email: create@stephan-brumme.com