TIFF 6.0 Specification
Final—June 3, 1992
• LZW is lossless. All information is preserved. But if noise or information is
removed from an image, perhaps by smoothing or zeroing some low-order
bitplanes, LZW compresses images to a smaller size. Thus, 5-bit, 6-bit, or 7-bit
data masquerading as 8-bit data compresses better than true 8-bit data. Smooth
images also compress better than noisy images, and simple images compress
better than complex images.
• LZW works quite well on bilevel images, too. On our test images, it almost
always beat PackBits and generally tied CCITT 1D (Modified Huffman) com-
pression. LZW also handles halftoned data better than most bilevel compres-
sion schemes.
The Algorithm
Each strip is compressed independently. We strongly recommend that
RowsPerStrip be chosen such that each strip contains about 8K bytes before com-
pression. We want to keep the strips small enough so that the compressed and
uncompressed versions of the strip can be kept entirely in memory, even on small
machines, but are large enough to maintain nearly optimal compression ratios.
The LZW algorithm is based on a translation table, or string table, that maps
strings of input characters into codes. The TIFF implementation uses variable-
length codes, with a maximum code length of 12 bits. This string table is different
for every strip and does not need to be reatained for the decompressor. The trick is
to make the decompressor automatically build the same table as is built when the
data is compressed. We use a C-like pseudocode to describe the coding scheme:
InitializeStringTable();
WriteCode(ClearCode);
Ω
= the empty string;
for each character in the strip {
K = GetNextCharacter();
if
Ω+K
is in the string table {
Ω
=
Ω+K;
/* string concatenation */
} else {
WriteCode (CodeFromString(Ω));
AddTableEntry(Ω+K);
Ω
= K;
}
} /* end of for loop */
WriteCode (CodeFromString(Ω));
WriteCode (EndOfInformation);
That’s it. The scheme is simple, although it is challenging to implement effi-
ciently. But we need a few explanations before we go on to decompression.
The “characters” that make up the LZW strings are bytes containing TIFF
uncompressed (Compression=1) image data, in our implementation. For example,
if BitsPerSample is 4, each 8-bit LZW character will contain two 4-bit pixels. If
BitsPerSample is 16, each 16-bit pixel will span two 8-bit LZW characters.
It is also possible to implement a version of LZW in which the LZW character
depth equals BitsPerSample, as described in Draft 2 of Revision 5.0. But there is a
major problem with this approach. If BitsPerSample is greater than 11, we can not
58