smalLZ4 - optimal LZ4 compression

posted August 17, 2016 by Stephan Brumme, updated April 13, 2020

You came here to grab just the code or a binary ? Scroll down to the Download section and GIT repository.
A short overview how smallz4 compares to the original LZ4 and a program called LZ4X (which follows similar ideas as smallz4) can be found here.

And if you are interesting in the inner working, keep on reading and enjoy the ride ...

Updates

version 1.1 (August 2, 2018):
I improved the compression ratio by better handling of block boundaries (reduced test file enwik9.lz4 by 647 bytes).
Unfortunately, there was a bug in the header of uncompressed blocks - it's fixed.

version 1.2 (August 3, 2018):
Experimental support for dictionary compression

version 1.2.1 (August 21, 2018):
Greatly reduced size of the portable, statically compiled smallz4cat binaries - by using diet libc on Linux (only 7kb now !) and by replacing fprintf with fputs.
The smallz4cat Windows binary shrinks by about 20% due to the fputs trick, too.
The compression algorithm remained completely unchanged.

version 1.3 (November 14, 2018):
Support for LZ4 legacy format (slightly smaller output if input < 8 MByte)
Matches with length 19+255x had a wrong cost estimation so that they were often ignored during optimization.

version 1.3.1 (April 25, 2019):
Slightly faster (both encoding and decoding) and more code comments. Windows binaries now compiled with Visual C++ 2019 Community.

version 1.4 (December 18, 2019):
A user-defined pointer can be supplied which is forwarded to read and write functions. This allows to avoid nasty global variables.
Performance is completely unaffected, therefore I didn't upload binaries for version 1.4.

version 1.5 (April 13, 2020):
Fixed 64-bit bugs affecting files over 2GB.

Optimal Parsing

In the context of compression, optimal parsing is a multi-pass algorithm to find the smallest output of compressible data.
Whenever there are multiple choices of encoding a symbol (or a sequence of symbols), optimal parsing will choose the representation which leads to the overall smallest output.
That means, it may prefer a locally sub-optimal encoding in order to achieve a globally optimal encoding.

Most compression algorithms strives for locally optimal encoding because it's computationally cheaper.
The most basic algorithm, the greedy algorithm, always selects the encoding that fits best "right now".
A significant improvement is called "lazy evaluation" and takes the next group of symbols into consideration and decides which encoding yields the smallest output for the whole group.
In a way, lazy evaluation can be seen as a local version of optimal parsing. It does the same job for a group of symbols instead of the whole file.

Compressing the string abcde_bcdefgh_abcdefghxxxxxxx returns different file sizes:

    
     $ echo "abcde_bcdefgh_abcdefghxxxxxxx" | lz4 -9 --no-frame-crc | wc -c
     43
     $ echo "abcde_bcdefgh_abcdefghxxxxxxx" | ./smallz4 | wc -c
     41

Let's assume that red characters are literals, i.e. uncompressed data.
Green pairs of numbers indicate distance and length of a match.
And let's ignore xxxxxxx from now on (it's only purpose is to hide the fact that the LZ4 format specification forbids matches at the very end of a file).

lz4's result in detail: abcde_(5,4)fgh_(14,5)fghxxxxxxx
It found two matches: The first replaced bcde by a reference to a sequence we saw 5 bytes ago.
That reference "costs" 3 bytes while the four literals would have occupied 4 bytes. We saved a byte here.
The second match replaces abcde by a reference to the beginning of the file.
Again, we need 3 bytes to represent the match while the replaced text was 5 bytes long. Two more bytes saved !

However, this compression is not optimal. smallz4 can squeeze out another two bytes because it postpones the second match:
abcde_(5,4)fgh_a(9,7)xxxxxxx
smallz4 detects that matching abcde would save 2 bytes but matching bcdefgh saves 4 bytes.

Optimal Parsing - Pass 1

For each byte of input data, the longest match will be found and stored in an array called matches (starts at about line 560).
Additional data structures such as previousHash and previousExact only speed up the code.
A simple brute-force routine will give you the same result - albeit slower.

position	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20	21	22	(padding)
character	`a`	`b`	`c`	`d`	`e`	`_`	`b`	`c`	`d`	`e`	`f`	`g`	`h`	`_`	`a`	`b`	`c`	`d`	`e`	`f`	`g`	`h`	`xxxxxxx`
match distance							5	(5)	(5)	(5)				(8)	14	9	9	9	9	9	(9)	(9)
match length							4	(3)	(2)	(1)				(1)	5	7	6	5	4	(3)	(2)	(1)

Numbers in brackets indicates matches which cannot be efficiently compressed by LZ4 - actually each match is required to be at least four characters long.
My program discards these short matches; they aren't stored in matches.

Match length 1 isn't shown in the table above, it's the default value for literals.
Each empty cell in the "match length" row is considered to be 1, with its corresponding match distance being undefined.

Optimal Parsing - Pass 2

In pass 2, all matches are analyzed in reverse. Beginning with the last match we compute how big the compressed output will be - that's what I call cost.
You may ask: why in reverse ? Well, the idea ("dynamic programming") is to reduce the problem into a smaller one.

The most simple problem is a single byte. Obviously, it can't be LZ4 compressed.
When taking a few more steps we will arrive at a position where a match was found in the previous pass.
Now we have to decide whether this match is more efficient ("costs less") than outputting its literals.
It boils down to a decision whether cost(match) + cost(everything after the match) is smaller than cost(from here on).

Usually the match is always "cheaper" than literals. The case becomes much more interesting if the longest possible match overlaps with a potential match found later in the text.
That's exactly what we saw in the example given above. Therefore we compute the cost for all potential match length from 4 (LZ4's minimum match length) up to the longest match found.
In conclusion we look for the minimum of cost(match with length x) + cost(everything after x bytes).

The relevant code can be found in estimateCosts. It's only parameter matches is modified such that each locally optimal match is replaced by its globally optimal sibling.
Often, those two are identical but, as explained, a shorter match (which is locally sub-optimal) can sometimes yield a better output.
Reducing match length below 4 converts a match into a literal.

Obviously, literals take 1 byte. Therefore, the cost of a literal will be 1 plus whatever follows: cost[i] = cost[i + 1] + 1

The basic cost of a match is 1+2=3 bytes: the intial token consumes 1 byte plus 2 bytes to encode the match distance.
However, we can skip over the next position: cost[i] = cost[i + matchLength] + 1 + 2

The optimal match turns out to be the one with the lowest cost[i] when we vary matchLength between 1 and its original value:

position	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20	21	22	(padding)
character	`a`	`b`	`c`	`d`	`e`	`_`	`b`	`c`	`d`	`e`	`f`	`g`	`h`	`_`	`a`	`b`	`c`	`d`	`e`	`f`	`g`	`h`	`xxxxxxx`
match distance							5								14	9	9	9	9
match length							4								5	7	6	5	4
cost	19	18	17	16	15	14	13	13	12	11	10	9	8	7	6	3	3	3	3	3	2	1
minimum cost	17	16	15	14	13	12	11	11	10	9	8	7	6	5	4	3	3	3	3	3	2	1
optimal length							4								1	7	6	5	4

In the example above estimateCosts will figure out that a match at position 15 is more expensive than a literal:
Match: cost[15] = cost[15 + matchLength] + 1 + 2 = cost[19] + 3 = 6 (see row "cost")
Literal: cost[15] = cost[15 + 1] + 1 = cost[16] + 1 = 4 (see row "minimum cost")

Therefore, match[15].length will be changed from 5 to 1 (its optimal length).

Note: there are a few additional adjustments to these formulas in the full source code.
The 15th consecutive literal actually costs 1+1=2 bytes because we have to insert an extra length byte (and after that plus 1 for each 255th byte).
Very long matches take more byte to encode the length (plus 1 after 19 bytes and plus 1 for each 255th thereafter).

Optimal Parsing - Pass 3

Finally, we walk through the matches from the beginning to the end and emit the corresponding LZ4 codes.
selectBestMatches is implemented pretty straightforward and can be summarized as follows:

hide pseudo code


     i = 0
     while (i < matches.size())
     {
       if (matches[i] is a literal)
       {
         add literal to an internal buffer
         i++
       }
       else
       {
         emit token (which contains match length and number of buffered literals)
         emit all buffered literals and clear that buffer
         emit match distance
         i += matches[i].length
       }
     }
     // if literals buffer isn't empty, emit it, too

Dictionaries (v1.2+)

The first byte of any file can't be LZ4 compressed - because there is nothing that can be referenced.
The LZ4 sliding window slowly fills with data when a file is processed, thus "newer data" can reference "older data" and then compression ratio usually improves.
This can be a severely limiting factor for very small files - there is just too little history ("old data") to achieve a proper compression.

LZ4 officially introduced dictionaries in version 1.8.1. A LZ4 dictionary is a plain file with no special format at all. Its contents is just used as the initial sliding window.
For example: HTML5 code should start with <!doctype html>. A good LZ4 dictionary for HTML5 files should contain those bytes.
It's important to note that both the encoder AND the decoder need to have the same dictionary.

The command-line parameter -D FILE loads the dictionary named FILE. As mentioned before, that parameter (and the same file) must be used for the encoder AND the decoder as well.
A dictionary larger than 64k doesn't make any sense since LZ4's sliding window is 64k large.
Dictionaries are only relevant for the first 64k of a compressed file: once a dictionary is loaded into the sliding window and we slowly process incoming data,
old data is slowly shifted out of the sliding window. If your dictionary is 40k then the dictionary's first bytes will disappear from the sliding window after you processed 64k-40k=24k data.

Having such a simple file format, dictionaries can be generated in various ways:

by hand
just extracting the first bytes of a typical file
running a simple statistical analysis on the most frequent phrases

It's impossible to improve compression of a single file by creating a "perfect" dictionary exclusively for this file. The overhead of the dictionary always exceeds the savings.
However, compressing multiple similar files might observe large savings.

Live Demo

You can compress the latest source code file smallz4.h, smallz4.cpp and smallz4cat.c on-the-fly.
Anything entered into the textbox will be used as a dictionary:

File	Original Size	Compressed Size		With Dictionary
`smallz4.h`	bytes	bytes	bytes	bytes	bytes
`smallz4.cpp`	bytes	bytes	bytes	bytes	bytes
`smallz4cat.c`	bytes	bytes	bytes	bytes	bytes

LZ4 compressor (C++)

Features:

optimal parsing → smaller files than lz4hc
output is fully LZ4 compatible
flexible I/O interface
limited memory consumption (max. 64 MByte), independent of input size
header-only library: just include a single file without any macros, #ifdefs and compiler-specific stuff
optional: user-defined compression levels (0 - uncompressed ... 9 - optimal parsing)
performance: about 5 MByte per second for optimal parsing (Core i7 @ 3.4 GHz, enwik9 test file)
a few workarounds for degenerated data, such as extremely long sequences of repeating characters
fall back to uncompressed mode if compressed data actually takes more space than uncompressed data
lots of code comments: you can actually understand what's going on ☺

What's missing:

XXHash checksums - see here for a simple implementation
multi-threaded compression

Comparison chart (compressed test file is enwik9, which is the first 1 GiByte of an English Wikipedia snapshot, download it here):

compressor	settings	bytes	duration
lz4 1.9.0	(default)	509,454,838 bytes	3 seconds
lz4 1.9.0	`-9`	374,839,215 bytes	39 seconds
lz4 1.9.0	`-9 --no-frame-crc -BD`	374,085,998 bytes	39 seconds
BriefLZ 0.2.0	`--optimal`	372,068,437 bytes	127 seconds
LZ4X 1.60	`-9`	372,068,437 bytes	104 seconds
lz4ultra 1.2.2	`-B7 -BD`	371,687,239 bytes	291 seconds
lz4 1.9.0	`-12 --no-frame-crc -BD`	371,680,440 bytes	72 seconds
smallz4 1.3	`-9`	371,680,328 bytes	262 seconds

Note: LZ4X and BriefLZ are based on the deprecated and less efficient LZ4 legacy format frames (same as smallz4 -l).
Note: LZ4 improved the -12 algorithm significantly in version 1.8.1 and it produced smaller files than smallz4 1.0. However, my updated smallz4 1.1 exceeds LZ4's compression ratio.

hide Compressor


     // //////////////////////////////////////////////////////////
     // smallz4.h
     // Copyright (c) 2016-2020 Stephan Brumme. All rights reserved.
     // see https://create.stephan-brumme.com/smallz4/
     //
     // "MIT License":
     // Permission is hereby granted, free of charge, to any person obtaining a copy
     // of this software and associated documentation files (the "Software"),
     // to deal in the Software without restriction, including without limitation
     // the rights to use, copy, modify, merge, publish, distribute, sublicense,
     // and/or sell copies of the Software, and to permit persons to whom the Software
     // is furnished to do so, subject to the following conditions:
     //
     // The above copyright notice and this permission notice shall be included
     // in all copies or substantial portions of the Software.
     //
     // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
     // INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
     // PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
     // HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
     // OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
     // SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
     
     #pragma once
     
     #include <inttypes.h> // uint16_t, uint32_t, ...
     #include <cstdlib>    // size_t
     #include <vector>
     
     /// LZ4 compression with optimal parsing
     /** see smallz4.cpp for a basic I/O interface
         you can easily replace it by a in-memory version
         then all you have to do is:
         #include "smallz4.h"
         smallz4::lz4(GET_BYTES, SEND_BYTES);
         // for more advanced stuff, you can call lz4 with up to four parameters (incl. max chain length and a dictionary)
     **/
     class smallz4
     {
     public:
       // read  several bytes, see getBytesFromIn() in smallz4.cpp for a basic implementation
       typedef size_t (*GET_BYTES) (      void* data, size_t numBytes, void* userPtr);
       // write several bytes, see sendBytesToOut() in smallz4.cpp for a basic implementation
       typedef void   (*SEND_BYTES)(const void* data, size_t numBytes, void* userPtr);
     
       /// compress everything in input stream (accessed via getByte) and write to output stream (via send)
       static void lz4(GET_BYTES getBytes, SEND_BYTES sendBytes,
                       unsigned short maxChainLength = MaxChainLength,
                       bool useLegacyFormat = false,
                       void* userPtr = NULL)
       {
         lz4(getBytes, sendBytes, maxChainLength, std::vector<unsigned char>(), useLegacyFormat, userPtr);
       }
     
       /// compress everything in input stream (accessed via getByte) and write to output stream (via send)
       static void lz4(GET_BYTES getBytes, SEND_BYTES sendBytes,
                       unsigned short maxChainLength,
                       const std::vector<unsigned char>& dictionary, // predefined dictionary
                       bool useLegacyFormat = false,                 // old format is 7 bytes smaller if input < 8 MB
                       void* userPtr = NULL)
       {
         smallz4 obj(maxChainLength);
         obj.compress(getBytes, sendBytes, dictionary, useLegacyFormat, userPtr);
       }
     
       /// version string
       static const char* const getVersion()
       {
         return "1.5";
       }
     
     
       // compression level thresholds, made public because I display them in the help screen ...
       enum
       {
         /// greedy mode for short chains (compression level <= 3) instead of optimal parsing / lazy evaluation
         ShortChainsGreedy = 3,
         /// lazy evaluation for medium-sized chains (compression level > 3 and <= 6)
         ShortChainsLazy   = 6
       };
     
       // ----- END OF PUBLIC INTERFACE -----
     private:
     
       // ----- constants and types -----
     
       /// a block can be up to 4 MB, so uint32_t would suffice but uint64_t is quite a bit faster on my x64 machine
       typedef uint64_t Length;
       /// matches must start within the most recent 64k
       typedef uint16_t Distance;
     
       enum
       {
         /// each match's length must be >= 4
         MinMatch          =  4,
         /// a literal needs one byte
         JustLiteral       =  1,
         /// last match must not be closer than 12 bytes to the end
         BlockEndNoMatch   = 12,
         /// last 5 bytes must be literals, no matching allowed
         BlockEndLiterals  =  5,
     
         /// match finder's hash table size (2^HashBits entries, must be less than 32)
         HashBits          = 20,
         HashSize          = 1 << HashBits,
     
         /// input buffer size, can be any number but zero ;-)
         BufferSize        = 64*1024,
     
         /// maximum match distance, must be power of 2 minus 1
         MaxDistance       =   65535,
         /// marker for "no match"
         EndOfChain        =       0,
         /// stop match finding after MaxChainLength steps (default is unlimited => optimal parsing)
         MaxChainLength    = MaxDistance,
     
         /// significantly speed up parsing if the same byte is repeated a lot, may cause sub-optimal compression
         MaxSameLetter     =   19 + 255*256, // was: 19 + 255,
     
         /// maximum block size as defined in LZ4 spec: { 0,0,0,0,64*1024,256*1024,1024*1024,4*1024*1024 }
         /// I only work with the biggest maximum block size (7)
         //  note: xxhash header checksum is precalculated only for 7, too
         MaxBlockSizeId    = 7,
         MaxBlockSize      = 4*1024*1024,
     
         /// legacy format has a fixed block size of 8 MB
         MaxBlockSizeLegacy = 8*1024*1024,
     
         /// number of literals and match length is encoded in several bytes, max. 255 per byte
         MaxLengthCode = 255
       };
     
       //  ----- one and only variable ... -----
     
       /// how many matches are checked in findLongestMatch, lower values yield faster encoding at the cost of worse compression ratio
       unsigned short maxChainLength;
     
       //  ----- code -----
     
       /// match
       struct Match
       {
         /// length of match
         Length   length;
         /// start of match
         Distance distance;
       };
     
     
       /// create new compressor (only invoked by lz4)
       explicit smallz4(unsigned short newMaxChainLength = MaxChainLength)
       : maxChainLength(newMaxChainLength) // => no limit, but can be changed by setMaxChainLength
       {
       }
     
     
       /// return true, if the four bytes at *a and *b match
       inline static bool match4(const void* const a, const void* const b)
       {
         return *(const uint32_t*) a == *(const uint32_t*) b;
       }
     
       /// simple hash function, input: 32 bits, output: HashBits bits (by default: 20)
       inline static uint32_t getHash32(uint32_t fourBytes)
       {
         // taken from https://en.wikipedia.org/wiki/Linear_congruential_generator
         const uint32_t HashMultiplier = 48271;
         return ((fourBytes * HashMultiplier) >> (32 - HashBits)) & (HashSize - 1);
       }
     
     
       /// find longest match of data[pos] between data[begin] and data[end], use match chain
       Match findLongestMatch(const unsigned char* const data,
                              uint64_t pos, uint64_t begin, uint64_t end,
                              const Distance* const chain) const
       {
         Match result;
         result.length = JustLiteral; // assume a literal => one byte
     
         // compression level: look only at the first n entries of the match chain
         unsigned short stepsLeft = maxChainLength;
         // findLongestMatch() shouldn't be called when maxChainLength = 0 (uncompressed)
     
         // pointer to position that is currently analyzed (which we try to find a great match for)
         const unsigned char* const current = data    + pos - begin;
         // don't match beyond this point
         const unsigned char* const stop    = current + end - pos;
     
         // get distance to previous match, abort if 0 => not existing
         Distance distance = chain[pos & MaxDistance];
         int64_t totalDistance = 0;
         while (distance != EndOfChain)
         {
           // chain goes too far back ?
           totalDistance += distance;
           if (totalDistance > MaxDistance)
             break; // can't match beyond 64k
     
           // prepare next position
           distance = chain[(pos - totalDistance) & MaxDistance];
     
           // let's introduce a new pointer atLeast that points to the first "new" byte of a potential longer match
           const unsigned char* const atLeast = current + result.length + 1;
           // impossible to find a longer match because not enough bytes left ?
           if (atLeast > stop)
             break;
     
           // the idea is to split the comparison algorithm into 2 phases
           // (1) scan backward from atLeast to current, abort if mismatch
           // (2) scan forward  until a mismatch is found and store length/distance of this new best match
           // current                  atLeast
           //    |                        |
           //    -<<<<<<<< phase 1 <<<<<<<<
           //                              >>> phase 2 >>>
           // main reason for phase 1:
           // - both byte sequences start with the same bytes, quite likely they are very similar
           // - there is a good chance that if they differ, then their last bytes differ
           // => checking the last first increases the probability that a mismatch is detected as early as possible
     
           // compare 4 bytes at once
           const Length CheckAtOnce = 4;
     
           // all bytes between current and atLeast shall be identical
           const unsigned char* phase1 = atLeast - CheckAtOnce; // minus 4 because match4 checks 4 bytes
           while (phase1 > current && match4(phase1, phase1 - totalDistance))
             phase1 -= CheckAtOnce;
           // note: - the first four bytes always match
           //       - in the last iteration, phase1 points either at current + 1 or current + 2 or current + 3
           //       - therefore we compare a few bytes twice => but a check to skip these checks is more expensive
     
           // mismatch ? (the while-loop was aborted)
           if (phase1 > current)
             continue;
     
           // we have a new best match, now scan forward
           const unsigned char* phase2 = atLeast;
     
           // fast loop: check four bytes at once
           while (phase2 + CheckAtOnce <= stop && match4(phase2,     phase2 - totalDistance))
             phase2 += CheckAtOnce;
           // slow loop: check the last 1/2/3 bytes
           while (phase2               <  stop &&       *phase2 == *(phase2 - totalDistance))
             phase2++;
     
           // store new best match
           result.distance = Distance(totalDistance);
           result.length   = Length  (phase2 - current);
     
           // stop searching on lower compression levels
           if (--stepsLeft == 0)
             break;
         }
     
         return result;
       }
     
       /// create shortest output
       /** data points to block's begin; we need it to extract literals **/
       static std::vector<unsigned char> selectBestMatches(const std::vector<Match>& matches,
                                                           const unsigned char* const data)
       {
         // store encoded data
         std::vector<unsigned char> result;
         result.reserve(matches.size());
     
         // indices of current run of literals
         size_t literalsFrom = 0;
         size_t numLiterals  = 0;
     
         bool lastToken = false;
     
         // walk through the whole block
         for (size_t offset = 0; offset < matches.size(); ) // increment inside of loop
         {
           // get best cost-weighted match
           Match match = matches[offset];
     
           // if no match, then count literals instead
           if (match.length <= JustLiteral)
           {
             // first literal ? need to reset pointers of current sequence of literals
             if (numLiterals == 0)
               literalsFrom = offset;
     
             // add one more literal to current sequence
             numLiterals++;
     
             // next match
             offset++;
     
             // continue unless it's the last literal
             if (offset < matches.size())
               continue;
     
             lastToken = true;
           }
           else
           {
             // skip unused matches
             offset += match.length;
           }
     
           // store match length (4 is implied because it's the minimum match length)
           int matchLength = int(match.length) - MinMatch;
     
           // last token has zero length
           if (lastToken)
             matchLength = 0;
     
           // token consists of match length and number of literals, let's start with match length ...
           unsigned char token = (matchLength < 15) ? (unsigned char)matchLength : 15;
     
           // >= 15 literals ? (extra bytes to store length)
           if (numLiterals < 15)
           {
             // add number of literals in higher four bits
             token |= numLiterals << 4;
             result.push_back(token);
           }
           else
           {
             // set all higher four bits, the following bytes with determine the exact number of literals
             result.push_back(token | 0xF0);
     
             // 15 is already encoded in token
             int encodeNumLiterals = int(numLiterals) - 15;
     
             // emit 255 until remainder is below 255
             while (encodeNumLiterals >= MaxLengthCode)
             {
               result.push_back(MaxLengthCode);
               encodeNumLiterals -= MaxLengthCode;
             }
             // and the last byte (can be zero, too)
             result.push_back((unsigned char)encodeNumLiterals);
           }
           // copy literals
           if (numLiterals > 0)
           {
             result.insert(result.end(), data + literalsFrom, data + literalsFrom + numLiterals);
     
             // last token doesn't have a match
             if (lastToken)
               break;
     
             // reset
             numLiterals = 0;
           }
     
           // distance stored in 16 bits / little endian
           result.push_back(match.distance & 0xFF);
           result.push_back(match.distance >> 8);
     
           // >= 15+4 bytes matched
           if (matchLength >= 15)
           {
             // 15 is already encoded in token
             matchLength -= 15;
             // emit 255 until remainder is below 255
             while (matchLength >= MaxLengthCode)
             {
               result.push_back(MaxLengthCode);
               matchLength -= MaxLengthCode;
             }
             // and the last byte (can be zero, too)
             result.push_back((unsigned char)matchLength);
           }
         }
     
         return result;
       }
     
     
       /// walk backwards through all matches and compute number of compressed bytes from current position to the end of the block
       /** note: matches are modified (shortened length) if necessary **/
       static void estimateCosts(std::vector<Match>& matches)
       {
         const size_t blockEnd = matches.size();
     
         // equals the number of bytes after compression
         typedef uint32_t Cost;
         // minimum cost from this position to the end of the current block
         std::vector<Cost> cost(matches.size(), 0);
         // "cost" represents the number of bytes needed
     
         // the last bytes must always be literals
         Length numLiterals = BlockEndLiterals;
         // backwards optimal parsing
         for (int64_t i = (int64_t)blockEnd - (1 + BlockEndLiterals); i >= 0; i--) // ignore the last 5 bytes, they are always literals
         {
           // if encoded as a literal
           numLiterals++;
           Length bestLength = JustLiteral;
           // such a literal "costs" 1 byte
           Cost   minCost = cost[i + 1] + JustLiteral;
     
           // an extra length byte is required for every 255 literals
           if (numLiterals >= 15)
           {
             // same as: if ((numLiterals - 15) % MaxLengthCode == 0)
             // but I try hard to avoid the slow modulo function
             if (numLiterals == 15 || (numLiterals >= 15 + MaxLengthCode && (numLiterals - 15) % MaxLengthCode == 0))
               minCost++;
           }
     
           // let's look at the longest match, almost always more efficient that the plain literals
           Match match = matches[i];
     
           // very long self-referencing matches can slow down the program A LOT
           if (match.length >= MaxSameLetter && match.distance == 1)
           {
             // assume that longest match is always the best match
             // NOTE: this assumption might not be optimal !
             bestLength = match.length;
             minCost    = cost[i + match.length] + 1 + 2 + 1 + Cost(match.length - 19) / 255;
           }
           else
           {
             // this is the core optimization loop
     
             // overhead of encoding a match: token (1 byte) + offset (2 bytes) + sometimes extra bytes for long matches
             Cost   extraCost = 1 + 2;
             Length nextCostIncrease = 18; // need one more byte for 19+ long matches (next increase: 19+255*x)
     
             // try all match lengths (start with short ones)
             for (Length length = MinMatch; length <= match.length; length++)
             {
               // token (1 byte) + offset (2 bytes) + extra bytes for long matches
               Cost currentCost = cost[i + length] + extraCost;
               // better choice ?
               if (currentCost <= minCost)
               {
                 // regarding the if-condition:
                 // "<"  prefers literals and shorter matches
                 // "<=" prefers longer matches
                 // they should produce the same number of bytes (because of the same cost)
                 // ... but every now and then it doesn't !
                 // that's why: too many consecutive literals require an extra length byte
                 // (which we took into consideration a few lines above)
                 // but we only looked at literals beyond the current position
                 // if there are many literal in front of the current position
                 // then it may be better to emit a match with the same cost as the literals at the current position
                 // => it "breaks" the long chain of literals and removes the extra length byte
                 minCost    = currentCost;
                 bestLength = length;
                 // performance-wise, a long match is usually faster during decoding than multiple short matches
                 // on the other hand, literals are faster than short matches as well (assuming same cost)
               }
     
               // very long matches need extra bytes for encoding match length
               if (length == nextCostIncrease)
               {
                 extraCost++;
                 nextCostIncrease += MaxLengthCode;
               }
             }
           }
     
           // store lowest cost so far
           cost[i] = minCost;
     
           // and adjust best match
           matches[i].length = bestLength;
     
           // reset number of literals if a match was chosen
           if (bestLength != JustLiteral)
             numLiterals = 0;
     
           // note: if bestLength is smaller than the previous matches[i].length then there might be a closer match
           //       which could be more cache-friendly (=> faster decoding)
         }
       }
     
     
       /// compress everything in input stream (accessed via getByte) and write to output stream (via send), improve compression with a predefined dictionary
       void compress(GET_BYTES getBytes, SEND_BYTES sendBytes, const std::vector<unsigned char>& dictionary, bool useLegacyFormat, void* userPtr) const
       {
         // ==================== write header ====================
         if (useLegacyFormat)
         {
           // magic bytes
           const unsigned char header[] = { 0x02, 0x21, 0x4C, 0x18 };
           sendBytes(header, sizeof(header), userPtr);
         }
         else
         {
           // frame header
           const unsigned char header[] =
           {
             0x04, 0x22, 0x4D, 0x18, // magic bytes
             1 << 6,                 // flags: no checksums, blocks depend on each other and no dictionary ID
             MaxBlockSizeId << 4,    // max blocksize
             0xDF                    // header checksum (precomputed)
           };
           sendBytes(header, sizeof(header), userPtr);
         }
     
         // ==================== declarations ====================
         // change read buffer size as you like
         unsigned char buffer[BufferSize];
     
         // read the file in chunks/blocks, data will contain only bytes which are relevant for the current block
         std::vector<unsigned char> data;
     
         // file position corresponding to data[0]
         size_t dataZero = 0;
         // last already read position
         size_t numRead  = 0;
     
         // passthru data ? (but still wrap it in LZ4 format)
         const bool uncompressed = (maxChainLength == 0);
     
         // last time we saw a hash
         const uint64_t NoLastHash = ~0; // = -1
         std::vector<uint64_t> lastHash(HashSize, NoLastHash);
     
         // previous position which starts with the same bytes
         std::vector<Distance> previousHash (MaxDistance + 1, Distance(EndOfChain)); // long chains based on my simple hash
         std::vector<Distance> previousExact(MaxDistance + 1, Distance(EndOfChain)); // shorter chains based on exact matching of the first four bytes
         // these two containers are essential for match finding:
         // 1. I compute a hash of four byte
         // 2. in lastHash is the location of the most recent block of four byte with that same hash
         // 3. due to hash collisions, several groups of four bytes may yield the same hash
         // 4. so for each location I can look up the previous location of the same hash in previousHash
         // 5. basically it's a chain of memory locations where potential matches start
         // 5. I follow this hash chain until I find exactly the same four bytes I was looking for
         // 6. then I switch to a sparser chain: previousExact
         // 7. it's basically the same idea as previousHash but this time not the hash but the first four bytes must be identical
         // 8. previousExact will be used by findLongestMatch: it compare all such strings a figures out which is the longest match
     
         // And why do I have to do it in such a complicated way ?
         // - well, there are 2^32 combinations of four bytes
         // - so that there are 2^32 potential chains
         // - most combinations just don't occur and occupy no space but I still have to keep their "entry point" (which are empty/invalid)
         // - that would be at least 16 GBytes RAM (2^32 x 4 bytes)
         // - my hashing algorithm reduces the 2^32 combinations to 2^20 hashes (see hashBits), that's about 8 MBytes RAM
         // - thus only 2^20 entry points and at most 2^20 hash chains which is easily manageable
         // ... in the end it's all about conserving memory !
         // (total memory consumption of smallz4 is about 64 MBytes)
     
         // first and last offset of a block (nextBlock is end-of-block plus 1)
         uint64_t lastBlock = 0;
         uint64_t nextBlock = 0;
         bool parseDictionary = !dictionary.empty();
     
         // main loop, processes one block per iteration
         while (true)
         {
           // ==================== start new block ====================
           // first byte of the currently processed block (std::vector data may contain the last 64k of the previous block, too)
           const unsigned char* dataBlock = NULL;
     
           // prepend dictionary
           if (parseDictionary)
           {
             // resize dictionary to 64k (minus 1 because we can only match the last 65535 bytes of the dictionary => MaxDistance)
             if (dictionary.size() < MaxDistance)
             {
               // dictionary is smaller than 64k, prepend garbage data
               size_t unused = MaxDistance - dictionary.size();
               data.resize(unused, 0);
               data.insert(data.end(), dictionary.begin(), dictionary.end());
             }
             else
               // copy only the most recent 64k of the dictionary
               data.insert(data.end(), dictionary.begin() + dictionary.size() - MaxDistance, dictionary.end());
     
             nextBlock = data.size();
             numRead   = data.size();
           }
     
           // read more bytes from input
           size_t maxBlockSize = useLegacyFormat ? MaxBlockSizeLegacy : MaxBlockSize;
           while (numRead - nextBlock < maxBlockSize)
           {
             // buffer can be significantly smaller than MaxBlockSize, that's the only reason for this while-block
             size_t incoming = getBytes(buffer, BufferSize, userPtr);
             // no more data ?
             if (incoming == 0)
               break;
     
             // add bytes to buffer
             numRead += incoming;
             data.insert(data.end(), buffer, buffer + incoming);
           }
     
           // no more data ? => WE'RE DONE !
           if (nextBlock == numRead)
             break;
     
           // determine block borders
           lastBlock  = nextBlock;
           nextBlock += maxBlockSize;
           // not beyond end-of-file
           if (nextBlock > numRead)
             nextBlock = numRead;
     
           // pointer to first byte of the currently processed block (the std::vector container named data may contain the last 64k of the previous block, too)
           dataBlock = &data[lastBlock - dataZero];
     
           const uint64_t blockSize = nextBlock - lastBlock;
     
           // ==================== full match finder ====================
     
           // greedy mode is much faster but produces larger output
           const bool isGreedy =              (maxChainLength <= ShortChainsGreedy);
           // lazy evaluation: if there is a match, then try running match finder on next position, too, but not after that
           const bool isLazy   = !isGreedy && (maxChainLength <= ShortChainsLazy);
           // skip match finding on the next x bytes in greedy mode
           Length skipMatches = 0;
           // allow match finding on the next byte but skip afterwards (in lazy mode)
           bool   lazyEvaluation = false;
     
           // the last literals of the previous block skipped matching, so they are missing from the hash chains
           int64_t lookback = int64_t(dataZero);
           if (lookback > BlockEndNoMatch && !parseDictionary)
             lookback = BlockEndNoMatch;
           if (parseDictionary)
             lookback = int64_t(dictionary.size());
           // so let's go back a few bytes
           lookback = -lookback;
           // ... but not in legacy mode
           if (useLegacyFormat || uncompressed)
             lookback = 0;
     
           std::vector<Match> matches(uncompressed ? 0 : blockSize);
           // find longest matches for each position (skip if level=0 which means "uncompressed")
           int64_t i;
           for (i = lookback; i + BlockEndNoMatch <= int64_t(blockSize) && !uncompressed; i++)
           {
             // detect self-matching
             if (i > 0 && dataBlock[i] == dataBlock[i - 1])
             {
               Match prevMatch = matches[i - 1];
               // predecessor had the same match ?
               if (prevMatch.distance == 1 && prevMatch.length > MaxSameLetter) // TODO: handle very long self-referencing matches
               {
                 // just copy predecessor without further (expensive) optimizations
                 matches[i].distance = 1;
                 matches[i].length   = prevMatch.length - 1;
                 continue;
               }
             }
     
             // read next four bytes
             const uint32_t four = *(uint32_t*)(dataBlock + i);
             // convert to a shorter hash
             const uint32_t hash = getHash32(four);
     
             // get most recent position of this hash
             uint64_t lastHashMatch = lastHash[hash];
             // and store current position
             lastHash[hash] = i + lastBlock;
     
             // remember: i could be negative, too
             Distance prevIndex = (i + MaxDistance + 1) & MaxDistance; // actually the same as i & MaxDistance
     
             // no predecessor / no hash chain available ?
             if (lastHashMatch == NoLastHash)
             {
               previousHash [prevIndex] = EndOfChain;
               previousExact[prevIndex] = EndOfChain;
               continue;
             }
     
             // most recent hash match too far away ?
             uint64_t distance = lastHash[hash] - lastHashMatch;
             if (distance > MaxDistance)
             {
               previousHash [prevIndex] = EndOfChain;
               previousExact[prevIndex] = EndOfChain;
               continue;
             }
     
             // build hash chain, i.e. store distance to last pseudo-match
             previousHash[prevIndex] = (Distance)distance;
     
             // skip pseudo-matches (hash collisions) and build a second chain where the first four bytes must match exactly
             uint32_t currentFour;
             // check the hash chain
             while (true)
             {
               // read four bytes
               currentFour = *(uint32_t*)(&data[lastHashMatch - dataZero]); // match may be found in the previous block, too
               // match chain found, first 4 bytes are identical
               if (currentFour == four)
                 break;
     
               // prevent from accidently hopping on an old, wrong hash chain
               if (hash != getHash32(currentFour))
                 break;
     
               // try next pseudo-match
               Distance next = previousHash[lastHashMatch & MaxDistance];
               // end of the hash chain ?
               if (next == EndOfChain)
                 break;
     
               // too far away ?
               distance += next;
               if (distance > MaxDistance)
                 break;
     
               // take another step along the hash chain ...
               lastHashMatch -= next;
               // closest match is out of range ?
               if (lastHashMatch < dataZero)
                 break;
             }
     
             // search aborted / failed ?
             if (four != currentFour)
             {
               // no matches for the first four bytes
               previousExact[prevIndex] = EndOfChain;
               continue;
             }
     
             // store distance to previous match
             previousExact[prevIndex] = (Distance)distance;
     
             // no matching if crossing block boundary, just update hash tables
             if (i < 0)
               continue;
     
             // skip match finding if in greedy mode
             if (skipMatches > 0)
             {
               skipMatches--;
               if (!lazyEvaluation)
                 continue;
               lazyEvaluation = false;
             }
     
             // and after all that preparation ... finally look for the longest match
             matches[i] = findLongestMatch(data.data(), i + lastBlock, dataZero, nextBlock - BlockEndLiterals, previousExact.data());
     
             // no match finding needed for the next few bytes in greedy/lazy mode
             if ((isLazy || isGreedy) && matches[i].length != JustLiteral)
             {
               lazyEvaluation = (skipMatches == 0);
               skipMatches = matches[i].length;
             }
           }
           // last bytes are always literals
           while (i < int(matches.size()))
             matches[i++].length = JustLiteral;
     
           // dictionary is valid only to the first block
           parseDictionary = false;
     
           // ==================== estimate costs (number of compressed bytes) ====================
     
           // not needed in greedy mode and/or very short blocks
           if (matches.size() > BlockEndNoMatch && maxChainLength > ShortChainsGreedy)
             estimateCosts(matches);
     
           // ==================== select best matches ====================
     
           std::vector<unsigned char> compressed = selectBestMatches(matches, &data[lastBlock - dataZero]);
     
           // ==================== output ====================
     
           // did compression do harm ?
           bool useCompression = compressed.size() < blockSize && !uncompressed;
           // legacy format is always compressed
           useCompression |= useLegacyFormat;
     
           // block size
           uint32_t numBytes = uint32_t(useCompression ? compressed.size() : blockSize);
           uint32_t numBytesTagged = numBytes | (useCompression ? 0 : 0x80000000);
           unsigned char num1 =  numBytesTagged         & 0xFF; sendBytes(&num1, 1, userPtr);
           unsigned char num2 = (numBytesTagged >>  8)  & 0xFF; sendBytes(&num2, 1, userPtr);
           unsigned char num3 = (numBytesTagged >> 16)  & 0xFF; sendBytes(&num3, 1, userPtr);
           unsigned char num4 = (numBytesTagged >> 24)  & 0xFF; sendBytes(&num4, 1, userPtr);
     
           if (useCompression)
             sendBytes(compressed.data(),           numBytes, userPtr);
           else // uncompressed ? => copy input data
             sendBytes(&data[lastBlock - dataZero], numBytes, userPtr);
     
           // legacy format: no matching across blocks
           if (useLegacyFormat)
           {
             dataZero += data.size();
             data.clear();
     
             // clear hash tables
             for (size_t i = 0; i < previousHash .size(); i++)
               previousHash [i] = EndOfChain;
             for (size_t i = 0; i < previousExact.size(); i++)
               previousExact[i] = EndOfChain;
             for (size_t i = 0; i < lastHash.size(); i++)
               lastHash[i] = NoLastHash;
           }
           else
           {
             // remove already processed data except for the last 64kb which could be used for intra-block matches
             if (data.size() > MaxDistance)
             {
               size_t remove = data.size() - MaxDistance;
               dataZero += remove;
               data.erase(data.begin(), data.begin() + remove);
             }
           }
         }
     
         // add an empty block
         if (!useLegacyFormat)
         {
           static const uint32_t zero = 0;
           sendBytes(&zero, 4, userPtr);
         }
       }
     };

Click on "show" to display the code of a small test program:

show Test program


     // //////////////////////////////////////////////////////////
     // smallz4.cpp
     // Copyright (c) 2016-2019 Stephan Brumme. All rights reserved.
     // see https://create.stephan-brumme.com/smallz4/
     //
     // "MIT License":
     // Permission is hereby granted, free of charge, to any person obtaining a copy
     // of this software and associated documentation files (the "Software"),
     // to deal in the Software without restriction, including without limitation
     // the rights to use, copy, modify, merge, publish, distribute, sublicense,
     // and/or sell copies of the Software, and to permit persons to whom the Software
     // is furnished to do so, subject to the following conditions:
     //
     // The above copyright notice and this permission notice shall be included
     // in all copies or substantial portions of the Software.
     //
     // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
     // INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
     // PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
     // HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
     // OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
     // SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
     
     // suppress warnings when compiled by Visual C++
     #define _CRT_SECURE_NO_WARNINGS
     
     #include "smallz4.h"
     
     #include <cstdio>     // stdin/stdout/stderr, fopen, ...
     #include <ctime>      // time (verbose output)
     
     #ifdef _WIN32
       #include <io.h>     // isatty()
     #else
       #include <cstdlib>  // exit
       #include <unistd.h> // isatty()
       #define _fileno fileno
       #define _isatty isatty
     #endif
     
     
     /// error handler
     static void error(const char* msg, int code = 1)
     {
       fprintf(stderr, "ERROR: %s\n", msg);
       exit(code);
     }
     
     
     // ==================== user-specific I/O INTERFACE ====================
     
     struct UserPtr
     {
       // file handles
       FILE* in;
       FILE* out;
       // the attributes below are just needed for verbose output
       bool  verbose;
       uint64_t numBytesIn;
       uint64_t numBytesOut;
       uint64_t totalSize;
       time_t   starttime;
     };
     
     /// read several bytes and store at "data", return number of actually read bytes (return only zero if end of data reached)
     size_t getBytesFromIn(void* data, size_t numBytes, void* userPtr)
     {
       /// cast user-specific data
       UserPtr* user = (UserPtr*)userPtr;
     
       if (data && numBytes > 0)
       {
         size_t actual = fread(data, 1, numBytes, user->in);
         user->numBytesIn += actual;
     
         return actual;
       }
       return 0;
     }
     
     /// show verbose info on STDERR
     void verbose(const UserPtr& user)
     {
       if (!user.verbose)
         return;
       if (user.numBytesIn == 0)
         return;
     
       // elapsed and estimated time in seconds
       int duration  = int(time(NULL) - user.starttime);
       if (duration == 0)
         return;
       int estimated = int(duration * user.totalSize / user.numBytesIn);
     
       // display on STDERR
       fprintf(stderr, "\r%lld bytes => %lld bytes (%d%%", user.numBytesIn, user.numBytesOut, 100 * user.numBytesOut / user.numBytesIn);
       if (estimated > 0)
         fprintf(stderr, ", %d%% done", 100 * duration / estimated);
       fprintf(stderr, "), after %d seconds @ %d kByte/s", duration, duration > 0 ? (user.numBytesIn / duration) / 1024 : 0);
       if (estimated > 0)
         fprintf(stderr, ", about %d seconds left  ", estimated - duration);
     }
     
     /// write a block of bytes
     void sendBytesToOut(const void* data, size_t numBytes, void* userPtr)
     {
       /// cast user-specific data
       UserPtr* user = (UserPtr*)userPtr;
       if (data && numBytes > 0)
       {
         fwrite(data, 1, numBytes, user->out);
         user->numBytesOut += numBytes;
     
         if (user->verbose)
           verbose(*user);
       }
     }
     
     
     // ==================== COMMAND-LINE HANDLING ====================
     
     
     // show simple help
     static void showHelp(const char* program)
     {
       printf("smalLZ4 %s%s: compressor with optimal parsing, fully compatible with LZ4 by Yann Collet (see https://lz4.org)\n"
         "\n"
         "Basic usage:\n"
         "  %s [flags] [input] [output]\n"
         "\n"
         "This program writes to STDOUT if output isn't specified\n"
         "and reads from STDIN if input isn't specified, either.\n"
         "\n"
         "Examples:\n"
         "  %s   < abc.txt > abc.txt.lz4    # use STDIN and STDOUT\n"
         "  %s     abc.txt > abc.txt.lz4    # read from file and write to STDOUT\n"
         "  %s     abc.txt   abc.txt.lz4    # read from and write to file\n"
         "  cat abc.txt | %s - abc.txt.lz4  # read from STDIN and write to file\n"
         "  %s -6  abc.txt   abc.txt.lz4    # compression level 6 (instead of default 9)\n"
         "  %s -f  abc.txt   abc.txt.lz4    # overwrite an existing file\n"
         "  %s -f7 abc.txt   abc.txt.lz4    # compression level 7 and overwrite an existing file\n"
         "\n"
         "Flags:\n"
         "  -0, -1 ... -9   Set compression level, default: 9 (see below)\n"
         "  -h              Display this help message\n"
         "  -f              Overwrite an existing file\n"
         "  -l              Use LZ4 legacy file format\n"
         "  -D [FILE]       Load dictionary\n"
         "  -v              Verbose\n"
         "\n"
         "Compression levels:\n"
         " -0               No compression\n"
         " -1 ... -%d        Greedy search, check 1 to %d matches\n"
         " -%d ... -8        Lazy matching with optimal parsing, check %d to 8 matches\n"
         " -9               Optimal parsing, check all possible matches (default)\n"
         "\n"
         "Written in 2016-2020 by Stephan Brumme https://create.stephan-brumme.com/smallz4/\n"
         , smallz4::getVersion(), ""
         , program, program, program, program, program, program, program, program
         , smallz4::ShortChainsGreedy,     smallz4::ShortChainsGreedy
         , smallz4::ShortChainsGreedy + 1, smallz4::ShortChainsGreedy + 1);
     }
     
     
     /// parse command-line
     int main(int argc, const char* argv[])
     {
       // show help if no parameters and stdin isn't a pipe
       if (argc == 1 && _isatty(_fileno(stdin)) != 0)
       {
         showHelp(argv[0]);
         return 0;
       }
     
       unsigned short maxChainLength = 65535; // "unlimited" because search window contains only 2^16 bytes
     
       // overwrite output ?
       bool overwrite = false;
       // legacy format ? (not recommended, but smaller files if input < 8 MB)
       bool useLegacy = false;
       // preload dictionary from disk
       const char* dictionary = NULL;
     
       // default input/output streams
       UserPtr user;
       user.in  = stdin;
       user.out = stdout;
       user.verbose     = false;
       user.numBytesIn  = 0;
       user.numBytesOut = 0;
       user.totalSize   = 0;
     
       // parse flags
       int nextArgument = 1;
       bool skipArgument = false;
       while (argc > nextArgument && argv[nextArgument][0] == '-')
       {
         int argPos = 1;
         while (argv[nextArgument][argPos] != '\0')
         {
           switch (argv[nextArgument][argPos++])
           {
             // show help
           case 'h':
             showHelp(argv[0]);
             return 0;
     
             // force overwrite
           case 'f':
             overwrite = true;
             break;
     
             // old LZ4 format
           case 'l':
             useLegacy = true;
             break;
     
             // use dictionary
           case 'D':
             if (nextArgument + 1 >= argc)
               error("no dictionary filename found");
             dictionary = argv[nextArgument + 1]; // TODO: any flag immediately after -D causes an error
             skipArgument = true;
             break;
     
             // display some info on STDERR while compressing
           case 'v':
             user.verbose = true;
             break;
     
             // set compression level
           case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8':
             maxChainLength = argv[nextArgument][1] - '0'; // "0" => 0, "1" => 1, ..., "8" => 8
             break;
     
             // unlimited hash chain length
           case '9':
             // default maxChainLength is already "unlimited"
             break;
     
           default:
             error("unknown flag");
           }
         }
     
         nextArgument++;
         if (skipArgument)
           nextArgument++;
       }
     
       // input file is given as first parameter or stdin if no parameter is given (or "-")
       if (argc > nextArgument && argv[nextArgument][0] != '-')
       {
         user.in = fopen(argv[nextArgument], "rb");
         if (!user.in)
           error("file not found");
         nextArgument++;
       }
     
       // output file is given as second parameter or stdout if no parameter is given (or "-")
       if (argc == nextArgument + 1 && argv[nextArgument][0] != '-')
       {
         // check if file already exists
         if (!overwrite && fopen(argv[nextArgument], "rb"))
           error("output file already exists");
     
         user.out = fopen(argv[nextArgument], "wb");
         if (!user.out)
           error("cannot create file");
       }
     
       // basic check of legacy format's restrictions
       if (useLegacy)
       {
         if (dictionary != 0)
           error("legacy format doesn't support dictionaries");
         if (maxChainLength == 0)
           error("legacy format doesn't support uncompressed files");
       }
     
       // load dictionary
       std::vector<unsigned char> preload;
       if (dictionary != NULL)
       {
         // open dictionary
         FILE* dict = fopen(dictionary, "rb");
         if (!dict)
           error("cannot open dictionary");
     
         // get dictionary's filesize
         fseek(dict, 0, SEEK_END);
         size_t dictSize = ftell(dict);
         // only the last 64k are relevant
         const size_t Last64k = 65536;
         size_t relevant = dictSize < Last64k ? 0 : dictSize - Last64k;
         fseek(dict, (long)relevant, SEEK_SET);
         if (dictSize > Last64k)
           dictSize = Last64k;
     
         // read those bytes
         preload.resize(dictSize);
         fread(&preload[0], 1, dictSize, dict);
         fclose(dict);
       }
     
       if (user.verbose)
       {
         if (user.in != stdin)
         {
           fseek(user.in, 0, SEEK_END);
           user.totalSize = ftell(user.in);
           fseek(user.in, 0, SEEK_SET);
         }
     
         user.starttime = time(NULL);
       }
     
       // and go !
       smallz4::lz4(getBytesFromIn, sendBytesToOut, maxChainLength, preload, useLegacy, &user);
     
       if (user.verbose && user.numBytesIn > 0)
         fprintf(stderr, "\r%lld bytes => %lld bytes (%d%%) after %d seconds                                                                      \n",
                 user.numBytesIn, user.numBytesOut, 100 * user.numBytesOut / user.numBytesIn, int(time(NULL) - user.starttime));
     
       return 0;
     }

LZ4 decompressor (C99)

Features:

minimal memory consumption: can be configured to use just 64kb (plus a few variables on the stack)
a single file without any macros, #ifdefs and compiler-specific stuff - just plain C
flexible I/O interface
support for fixed dictionaries
typically about half as fast as reference implementation (lz4.org)
... but in many cases 20% faster than reference implementation when piping output to another program, e.g. ./smallz4cat myfile.lz4 | md5sum
lots of code comments: you can actually understand what's going on ☺

What's missing:

XXHash checksum verification - see here for a basic implementation
skippable frames aren't supported but basic legacy frames can be decoded

hide Decompressor


     // //////////////////////////////////////////////////////////
     // smallz4cat.c
     // Copyright (c) 2016-2019 Stephan Brumme. All rights reserved.
     // see https://create.stephan-brumme.com/smallz4/
     //
     // "MIT License":
     // Permission is hereby granted, free of charge, to any person obtaining a copy
     // of this software and associated documentation files (the "Software"),
     // to deal in the Software without restriction, including without limitation
     // the rights to use, copy, modify, merge, publish, distribute, sublicense,
     // and/or sell copies of the Software, and to permit persons to whom the Software
     // is furnished to do so, subject to the following conditions:
     //
     // The above copyright notice and this permission notice shall be included
     // in all copies or substantial portions of the Software.
     //
     // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
     // INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
     // PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
     // HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
     // OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
     // SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
     
     // This program is a shorter, more readable, albeit slower re-implementation of lz4cat ( https://github.com/Cyan4973/xxHash )
     
     // compile: gcc smallz4cat.c -O3 -o smallz4cat -Wall -pedantic -std=c99 -s
     // The static 8k binary was compiled using Clang and dietlibc (see https://www.fefe.de/dietlibc/ )
     
     // Limitations:
     // - skippable frames and legacy frames are not implemented (and most likely never will)
     // - checksums are not verified (see https://create.stephan-brumme.com/xxhash/ for a simple implementation)
     
     // Replace getByteFromIn() and sendToOut() by your own code if you need in-memory LZ4 decompression.
     // Corrupted data causes a call to unlz4error().
     
     // suppress warnings when compiled by Visual C++
     #define _CRT_SECURE_NO_WARNINGS
     
     #include <stdio.h>  // stdin/stdout/stderr, fopen, ...
     #include <stdlib.h> // exit()
     #include <string.h> // memcpy
     
     #ifndef FALSE
     #define FALSE 0
     #define TRUE  1
     #endif
     
     /// error handler
     static void unlz4error(const char* msg)
     {
       // smaller static binary than fprintf(stderr, "ERROR: %s\n", msg);
       fputs("ERROR: ", stderr);
       fputs(msg,       stderr);
       fputc('\n',      stderr);
       exit(1);
     }
     
     
     // ==================== I/O INTERFACE ====================
     
     
     // read one byte from input, see getByteFromIn()  for a basic implementation
     typedef unsigned char (*GET_BYTE)  (void* userPtr);
     // write several bytes,      see sendBytesToOut() for a basic implementation
     typedef void          (*SEND_BYTES)(const unsigned char*, unsigned int, void* userPtr);
     
     struct UserPtr
     {
       // file handles
       FILE* in;
       FILE* out;
       // modify input buffer size as you like ... for most use cases, bigger buffer aren't faster anymore - and even reducing to 1 byte works !
     #define READ_BUFFER_SIZE 4*1024
       unsigned char readBuffer[READ_BUFFER_SIZE];
       unsigned int  pos;
       unsigned int  available;
     };
     
     /// read a single byte (with simple buffering)
     static unsigned char getByteFromIn(void* userPtr) // parameter "userPtr" not needed
     {
       /// cast user-specific data
       struct UserPtr* user = (struct UserPtr*)userPtr;
     
       // refill buffer
       if (user->pos == user->available)
       {
         user->pos = 0;
         user->available = fread(user->readBuffer, 1, READ_BUFFER_SIZE, user->in);
         if (user->available == 0)
           unlz4error("out of data");
       }
     
       // return a byte
       return user->readBuffer[user->pos++];
     }
     
     /// write a block of bytes
     static void sendBytesToOut(const unsigned char* data, unsigned int numBytes, void* userPtr)
     {
       /// cast user-specific data
       struct UserPtr* user = (struct UserPtr*)userPtr;
       if (data != NULL && numBytes > 0)
         fwrite(data, 1, numBytes, user->out);
     }
     
     
     // ==================== LZ4 DECOMPRESSOR ====================
     
     
     /// decompress everything in input stream (accessed via getByte) and write to output stream (via sendBytes)
     void unlz4_userPtr(GET_BYTE getByte, SEND_BYTES sendBytes, const char* dictionary, void* userPtr)
     {
       // signature
       unsigned char signature1 = getByte(userPtr);
       unsigned char signature2 = getByte(userPtr);
       unsigned char signature3 = getByte(userPtr);
       unsigned char signature4 = getByte(userPtr);
       unsigned int  signature  = (signature4 << 24) | (signature3 << 16) | (signature2 << 8) | signature1;
       unsigned char isModern   = (signature == 0x184D2204);
       unsigned char isLegacy   = (signature == 0x184C2102);
       if (!isModern && !isLegacy)
         unlz4error("invalid signature");
     
       unsigned char hasBlockChecksum   = FALSE;
       unsigned char hasContentSize     = FALSE;
       unsigned char hasContentChecksum = FALSE;
       unsigned char hasDictionaryID    = FALSE;
       if (isModern)
       {
         // flags
         unsigned char flags = getByte(userPtr);
         hasBlockChecksum   = flags & 16;
         hasContentSize     = flags &  8;
         hasContentChecksum = flags &  4;
         hasDictionaryID    = flags &  1;
     
         // only version 1 file format
         unsigned char version = flags >> 6;
         if (version != 1)
           unlz4error("only LZ4 file format version 1 supported");
     
         // ignore blocksize
         char numIgnore = 1;
     
         // ignore, skip 8 bytes
         if (hasContentSize)
           numIgnore += 8;
         // ignore, skip 4 bytes
         if (hasDictionaryID)
           numIgnore += 4;
     
         // ignore header checksum (xxhash32 of everything up this point & 0xFF)
         numIgnore++;
     
         // skip all those ignored bytes
         while (numIgnore--)
           getByte(userPtr);
       }
     
       // don't lower this value, backreferences can be 64kb far away
     #define HISTORY_SIZE 64*1024
       // contains the latest decoded data
       unsigned char history[HISTORY_SIZE];
       // next free position in history[]
       unsigned int  pos = 0;
     
       // dictionary compression is a recently introduced feature, just move its contents to the buffer
       if (dictionary != NULL)
       {
         // open dictionary
         FILE* dict = fopen(dictionary, "rb");
         if (!dict)
           unlz4error("cannot open dictionary");
     
         // get dictionary's filesize
         fseek(dict, 0, SEEK_END);
         long dictSize = ftell(dict);
         // only the last 64k are relevant
         long relevant = dictSize < 65536 ? 0 : dictSize - 65536;
         fseek(dict, relevant, SEEK_SET);
         if (dictSize > 65536)
           dictSize = 65536;
         // read it and store it at the end of the buffer
         fread(history + HISTORY_SIZE - dictSize, 1, dictSize, dict);
         fclose(dict);
       }
     
       // parse all blocks until blockSize == 0
       while (1)
       {
         // block size
         unsigned int blockSize = getByte(userPtr);
         blockSize |= (unsigned int)getByte(userPtr) <<  8;
         blockSize |= (unsigned int)getByte(userPtr) << 16;
         blockSize |= (unsigned int)getByte(userPtr) << 24;
     
         // highest bit set ?
         unsigned char isCompressed = isLegacy || (blockSize & 0x80000000) == 0;
         if (isModern)
           blockSize &= 0x7FFFFFFF;
     
         // stop after last block
         if (blockSize == 0)
           break;
     
         if (isCompressed)
         {
           // decompress block
           unsigned int blockOffset = 0;
           unsigned int numWritten  = 0;
           while (blockOffset < blockSize)
           {
             // get a token
             unsigned char token = getByte(userPtr);
             blockOffset++;
     
             // determine number of literals
             unsigned int numLiterals = token >> 4;
             if (numLiterals == 15)
             {
               // number of literals length encoded in more than 1 byte
               unsigned char current;
               do
               {
                 current = getByte(userPtr);
                 numLiterals += current;
                 blockOffset++;
               } while (current == 255);
             }
     
             blockOffset += numLiterals;
     
             // copy all those literals
             if (pos + numLiterals < HISTORY_SIZE)
             {
               // fast loop
               while (numLiterals-- > 0)
                 history[pos++] = getByte(userPtr);
             }
             else
             {
               // slow loop
               while (numLiterals-- > 0)
               {
                 history[pos++] = getByte(userPtr);
     
                 // flush output buffer
                 if (pos == HISTORY_SIZE)
                 {
                   sendBytes(history, HISTORY_SIZE, userPtr);
                   numWritten += HISTORY_SIZE;
                   pos = 0;
                 }
               }
             }
     
             // last token has only literals
             if (blockOffset == blockSize)
               break;
     
             // match distance is encoded in two bytes (little endian)
             unsigned int delta = getByte(userPtr);
             delta |= (unsigned int)getByte(userPtr) << 8;
             // zero isn't allowed
             if (delta == 0)
               unlz4error("invalid offset");
             blockOffset += 2;
     
             // match length (always >= 4, therefore length is stored minus 4)
             unsigned int matchLength = 4 + (token & 0x0F);
             if (matchLength == 4 + 0x0F)
             {
               unsigned char current;
               do // match length encoded in more than 1 byte
               {
                 current = getByte(userPtr);
                 matchLength += current;
                 blockOffset++;
               } while (current == 255);
             }
     
             // copy match
             unsigned int referencePos = (pos >= delta) ? (pos - delta) : (HISTORY_SIZE + pos - delta);
             // start and end within the current 64k block ?
             if (pos + matchLength < HISTORY_SIZE && referencePos + matchLength < HISTORY_SIZE)
             {
               // read/write continuous block (no wrap-around at the end of history[])
               // fast copy
               if (pos >= referencePos + matchLength || referencePos >= pos + matchLength)
               {
                 // non-overlapping
                 memcpy(history + pos, history + referencePos, matchLength);
                 pos += matchLength;
               }
               else
               {
                 // overlapping, slower byte-wise copy
                 while (matchLength-- > 0)
                   history[pos++] = history[referencePos++];
               }
             }
             else
             {
               // either read or write wraps around at the end of history[]
               while (matchLength-- > 0)
               {
                 // copy single byte
                 history[pos++] = history[referencePos++];
     
                 // cannot write anymore ? => wrap around
                 if (pos == HISTORY_SIZE)
                 {
                   // flush output buffer
                   sendBytes(history, HISTORY_SIZE, userPtr);
                   numWritten += HISTORY_SIZE;
                   pos = 0;
                 }
                 // wrap-around of read location
                 referencePos %= HISTORY_SIZE;
               }
             }
           }
     
           // all legacy blocks must be completely filled - except for the last one
           if (isLegacy && numWritten + pos < 8*1024*1024)
             break;
         }
         else
         {
           // copy uncompressed data and add to history, too (if next block is compressed and some matches refer to this block)
           while (blockSize-- > 0)
           {
             // copy a byte ...
             history[pos++] = getByte(userPtr);
             // ... until buffer is full => send to output
             if (pos == HISTORY_SIZE)
             {
               sendBytes(history, HISTORY_SIZE, userPtr);
               pos = 0;
             }
           }
         }
     
         if (hasBlockChecksum)
         {
           // ignore checksum, skip 4 bytes
           getByte(userPtr); getByte(userPtr); getByte(userPtr); getByte(userPtr);
         }
       }
     
       if (hasContentChecksum)
       {
         // ignore checksum, skip 4 bytes
         getByte(userPtr); getByte(userPtr); getByte(userPtr); getByte(userPtr);
       }
     
       // flush output buffer
       sendBytes(history, pos, userPtr);
     }
     
     /// old interface where getByte and sendBytes use global file handles
     void unlz4(GET_BYTE getByte, SEND_BYTES sendBytes, const char* dictionary)
     {
       unlz4_userPtr(getByte, sendBytes, dictionary, NULL);
     }
     
     
     // ==================== COMMAND-LINE HANDLING ====================
     
     
     /// parse command-line
     int main(int argc, const char* argv[])
     {
       // default input/output streams
       struct UserPtr user =
       {
         .in        = stdin,
         .out       = stdout,
         .pos       = 0, // initial input buffer is empty
         .available = 0
       };
     
       const char* dictionary = NULL;
     
       // first command-line parameter is our input filename / but ignore "-" which stands for STDIN
       int parameter;
       for (parameter = 1; parameter < argc; parameter++)
       {
         const char* current = argv[parameter];
         // dictionary
         if (current[0] == '-' && current[1] == 'D')
         {
           if (parameter + 1 >= argc)
             unlz4error("no dictionary filename found");
           dictionary = argv[++parameter];
           continue;
         }
     
         // filename
         // read from STDIN, default behavior
         if (current[0] != '-' && current[1] != '\0')
         {
           // already have a filename - at most one filename is allowed (except for dictionary) ?
           if (user.in != stdin)
             unlz4error("can only decompress one file at a time");
           // get handle
           user.in = fopen(argv[1], "rb");
           if (!user.in)
             unlz4error("file not found");
         }
       }
     
       // and go !
       unlz4_userPtr(getByteFromIn, sendBytesToOut, dictionary, &user);
       return 0;
     }

License

This code is licensed under the MIT License:

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Source Code

Git users: scroll down to the repository link

Download smallz4cat.c

Latest release: December 18, 2019, size: 12.7 kBytes, 417 lines

CRC32: a2c2c3bb
MD5: f234acd0df3b9adde3b598495e25b73c
SHA1: 0d56c9d388a748054ba3cf9fee861be7c2fec057
SHA256:464a0c0f4b9962df73d863fab080c1a6f354444ef9f40ba2edeebe1d852cf44e

Download smallz4.h

Latest release: April 13, 2020, size: 31.0 kBytes, 815 lines

CRC32: 3bd397d3
MD5: 91a2eac62e522b20d44e02e085665edf
SHA1: 2c7ad990da6853ceda975fb64632a7c6bfba586a
SHA256:36edbb641328b05e28e599f57b662b7cee1808be784a79393dafd435fd8e7daf

Download smallz4.cpp

Latest release: April 13, 2020, size: 9.8 kBytes, 326 lines

CRC32: 04409f4b
MD5: 34e2b605aed3c7c3ecfe80cce5761f54
SHA1: a6bfd064cbd9d62d65a4f26ea3d0a6460880ecb2
SHA256:060563d0074c369cd421cf20714f97d1e26f62c4d23ead21c24c709fdf5f27a9

Download Makefile

Latest release: December 18, 2019, size: 1729 bytes, 47 lines

CRC32: 5ab0e836
MD5: 38734bb058a0aa452f6ecf0dccbfdf94
SHA1: 14508594855bdf0b5ca55444177784b9493c6aaf
SHA256:7f0f9b9f0ec89deb38c4767d3993f1ec6af321cf87c71a8727a3d21fb0574a46

Stay up-to-date:git clone https://create.stephan-brumme.com/smallz4/git

GitHub mirror:https://github.com/stbrumme/smallz4

If you encounter any bugs/problems or have ideas for improving future versions, please write me an email: create@stephan-brumme.com

Windows executables (x64 only)

(compiled with Visual Studio 2019 Community, no dependencies except kernel32.dll)

Download smallz4-v1.5.exe

Latest release: April 13, 2020, size: 158.0 kBytes

CRC32: 49438780
MD5: e13ef144ea003bfd71c28a567870ddee
SHA1: 592f413e21082d032f3e98b60dc885d97d232513
SHA256:9d527a1e229b00e52c7219c04ded8846cd8ddd2de4064f954c1d706aacd870c1

previous versions:
smallz4-v1.5.exe (April 13, 2020)
smallz4-v1.3.1.exe (April 25, 2019)
smallz4-v1.3.exe (November 14, 2018)
smallz4-v1.2.exe (August 3, 2018)
smallz4-v1.1.exe (August 3, 2018)
smallz4-v1.0.exe (November 2, 2016)
smallz4-v0.6.exe (September 9, 2016)
smallz4-v0.5.exe (September 1, 2016)
smallz4-v0.4.exe (August 17, 2016)

Download smallz4cat-v1.3.1.exe

Latest release: April 25, 2019, size: 110.5 kBytes

CRC32: b87861be
MD5: 3328e6142f9d42d45798f7e49a42c9b9
SHA1: e17f39810a9aece82c341645a472368fffb22868
SHA256:6bd522b5824e3933af9afa530cbcbb7dac2ac4d2a99284420a5e5ba6ed795737

previous versions:
smallz4cat-v1.3.1.exe (April 25, 2019)
smallz4cat-v1.3.exe (November 14, 2018)
smallz4cat-v1.2.1.exe (August 21, 2018)
smallz4cat-v1.2.exe (August 3, 2018)
smallz4cat-v1.1.exe (August 3, 2018)
smallz4cat-v1.0.exe (October 25, 2016)
smallz4cat-v0.4.exe (August 17, 2016)

Linux executables (x32 and x64, portable / statically linked)

smallz4 was build with g++ -O2 -s -Wall -pedantic -static (non-static builds of smallz4 are much smaller, only about 20k ... but not portable anymore)
smallz4cat was compiled with CLang / dietc / strip

Download smallz4-v1.5b

Latest release: February 18, 2021, size: 182.0 kBytes

CRC32: 30e35b25
MD5: a20cb6fcf873d2590491f04fa0a22bb6
SHA1: 64661cb132bdde8e1144c010cab772c493609be1
SHA256:786a0c87e4f6316c88d8c304569f3af9af42ecccdf76803f968a3b3f249a8eb8

previous versions:
smallz4-v1.5 (April 13, 2020)
smallz4-v1.4 (December 18, 2019)
smallz4-v1.3 (November 14, 2018)
smallz4-v1.2 (August 3, 2018)
smallz4-v1.1 (August 2, 2018)
smallz4-v1.0 (October 25, 2016)
smallz4-v0.6 (September 9, 2016)

Download smallz4-x32-v1.5

Latest release: April 13, 2020, size: 655.5 kBytes

CRC32: eaf23629
MD5: 2d3b5a115a151743f9962090a3c6e612
SHA1: b296df65a69fd3df98e588858657f2da144eb559
SHA256:ac0c8bbf9ccb4f11728cc0ec4bf67d96f38bf18cf09bdcf07b2a23bc812d5c3b

previous versions:
smallz4-x32-v1.5 (April 13, 2020)
smallz4-x32-v1.4 (December 18, 2019)
smallz4-x32-v1.3 (November 14, 2018)
smallz4-x32-v1.2 (August 3, 2018)
smallz4-x32-v1.1 (August 2, 2018)
smallz4-x32-v1.0 (October 25, 2016)
smallz4-x32-v0.6 (September 9, 2016)

Download smallz4cat-v1.4

Latest release: December 18, 2019, size: 7.3 kBytes

CRC32: c77c307b
MD5: 0828484b7d9cb2d1bc31d5a88f558254
SHA1: a3f2c4409a28947cea0b35ac87eb67a08d1e41f5
SHA256:e9e5ea9fda87edaf6268ffca00119e85aa9af77ace6cccd8228af053c1b1cdbc

Download smallz4cat-x32-v1.4

Latest release: December 18, 2019, size: 7.0 kBytes

CRC32: b1cd337c
MD5: a3389764049ab3273fdee0f5e0541d82
SHA1: fa5ab54e7cb47d18d9d931f45a4826fdc0b80b81
SHA256:13d293ddeedb469f51da41167f79b2cbdb904e681716f6e6191b233dbb162438

Changelog

version 1.5
- April 13, 2020
- bugfix for files over 2GB
- Git tag smallz4_v1.5
version 1.4
- latest and greatest
- December 18, 2019
- user-defined pointer can be supplied for read and write functions
- Git tag smallz4_v1.4
version 1.3.1
- April 25, 2019
- slightly faster and more code comments
- Git tag smallz4_v1.3.1
version 1.3
- November 14, 2018
- support LZ4 legacy format, improved cost estimation
- Git tag smallz4_v1.3
version 1.2.1
- August 21, 2018
- optimized building to reduce binaries' size
version 1.2
- August 3, 2018
- experimental support for dictionary compression
- Git tag smallz4_v1.2
version 1.1
- August 2, 2018
- bugfix: uncompressed blocks has wrong flag
- improved compression edge on block boundaries
- Git tag smallz4_v1.1
version 1.0
- October 25, 2016
- MIT license, added Makefile
- code remained unchanged
- Git tag smallz4_v1.0
version 0.6
- September 9, 2016
- refactored compression part to be a header-only library
- Git tag smallz4_v0.6
version 0.5
- September 1, 2016
- faster match finder, about 20% speed-up (thanks to Stefan Atev for inspiration)
- Git tag smallz4_v0.5
version 0.4
- August 17, 2016
- fixed bug in cost estimation
- added lazy matching
- Git tag smallz4_v0.4
version 0.3
- August 12, 2016
- initial version
- Git tag smallz4_v0.3


June 30, 2021: Length-Limited Prefix Codes April 13, 2020: smalLZ4 - optimal LZ4 compression June 17, 2019: toojpeg - a JPEG encoder in a single C++ file ... and 30 more !	RSS	www.stephan-brumme.com bits.stephan-brumme.com photos.stephan-brumme.com minime.stephan-brumme.com
© 2011-2024 Stephan Brumme products and company names mentioned may be trademarks of their respective owners. see my disclaimer. dude, this is the end ... of this page.