CHAPTER 3
72
Syntax
Details of LZW Encoding
Data encoded using the LZW compression method consists of a sequence of
codes that are 9 to 12 bits long. Each code represents a single character of input
data (0–255), a clear-table marker (256), an EOD marker (257), or a table entry
representing a multiple-character sequence that has been encountered previously
in the input (258 or greater).
Initially, the code length is 9 bits and the LZW table contains only entries for the
258 fixed codes. As encoding proceeds, entries are appended to the table, asso-
ciating new codes with longer and longer sequences of input characters. The
encoder and the decoder maintain identical copies of this table.
Whenever both the encoder and the decoder independently (but synchronously)
realize that the current code length is no longer sufficient to represent the
number of entries in the table, they increase the number of bits per code by 1. The
first output code that is 10 bits long is the one following the creation of table entry
511, and similarly for 11 (1023) and 12 (2047) bits. Codes are never longer than
12 bits; therefore, entry 4095 is the last entry of the LZW table.
The encoder executes the following sequence of steps to generate each output
code:
1. Accumulate a sequence of one or more input characters matching a sequence
already present in the table. For maximum compression, the encoder looks for
the longest such sequence.
2. Emit the code corresponding to that sequence.
3. Create a new table entry for the first unused code. Its value is the sequence
found in step 1 followed by the next input character.
For example, suppose the input consists of the following sequence of ASCII
character codes:
45 45 45 45 45 65 45 45 45 66
Starting with an empty table, the encoder proceeds as shown in Table 3.6.