TIFF 6.0 Specification Final—June 3, 1992
you can write it as a 9-bit code after the ClearCode . Decompression gives the
same result in either case.
To make things a little simpler for the decompressor, we will require that each
strip begins with a ClearCode and ends with an EndOfInformation code. Every
LZW-compressed strip must begin on a byte boundary. It need not begin on a
word boundary. LZW compression codes are stored into bytes in high-to-low-order
fashion, i.e., FillOrder is assumed to be 1. The compressed codes are written
as bytes (not words) so that the compressed data will be identical whether it is an
‘II’ or ‘MM’ file.
Note that the LZW string table is a continuously updated history of the strings that
have been encountered in the data. Thus, it reflects the characteristics of the data,
providing a high degree of adaptability.
LZW Decoding
The procedure for decompression is a little more complicated:
while ((Code = GetNextCode()) != EoiCode) {
    if (Code == ClearCode) {
        InitializeTable();
        Code = GetNextCode();
        if (Code == EoiCode)
            break;
        WriteString(StringFromCode(Code));
        OldCode = Code;
    } /* end of ClearCode case */
    else {
        if (IsInTable(Code)) {
            WriteString(StringFromCode(Code));
            AddStringToTable(StringFromCode(OldCode) +
                             FirstChar(StringFromCode(Code)));
            OldCode = Code;
        } else {
            OutString = StringFromCode(OldCode) +
                        FirstChar(StringFromCode(OldCode));
            WriteString(OutString);
            AddStringToTable(OutString);
            OldCode = Code;
        }
    } /* end of not-ClearCode case */
} /* end of while loop */
The function GetNextCode() retrieves the next code from the LZW-coded data. It
must keep track of bit boundaries. It knows that the first code that it gets will be a
9-bit code. We add a table entry each time we get a code. So, GetNextCode() must
switch over to 10-bit codes as soon as string #510 is stored into the table.
Similarly, the switch is made to 11-bit codes after #1022 and to 12-bit codes after
#2046.
The function StringFromCode() gets the string associated with a particular code
from the string table.
The function AddStringToTable() adds a string to the string table. The “+” sign
joining the two parts of the argument to AddStringToTable() indicates string
concatenation.
WriteString() adds a string to the output stream.
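The decoding loop above can be sketched in C roughly as follows. This is a minimal illustration, not the specification's reference code: the names BitReader, get_next_code and lzw_decode, the fixed 4096-entry arrays, and the error conventions (returning -1) are all assumptions made for this example. Each table entry is stored as a prefix code plus one suffix byte, so StringFromCode() becomes a walk along the prefix chain.

```c
#include <stddef.h>
#include <stdint.h>

#define CLEAR_CODE 256
#define EOI_CODE   257
#define TABLE_SIZE 4096   /* 12-bit codes at most */

/* Bit reader: codes are packed high-order bit first (FillOrder = 1). */
typedef struct {
    const uint8_t *src;
    size_t len, pos;
    uint32_t acc;   /* bit accumulator */
    int nbits;      /* number of unread bits in acc */
} BitReader;

/* Return the next code of `width` bits, or EOI_CODE if the input runs out. */
static int get_next_code(BitReader *br, int width)
{
    while (br->nbits < width) {
        if (br->pos >= br->len)
            return EOI_CODE;
        br->acc = (br->acc << 8) | br->src[br->pos++];
        br->nbits += 8;
    }
    br->nbits -= width;
    return (int)((br->acc >> br->nbits) & ((1u << width) - 1u));
}

/* Decode one strip. Returns the number of bytes written, or -1 on error. */
long lzw_decode(const uint8_t *src, size_t src_len, uint8_t *dst, size_t dst_cap)
{
    /* Entry c is the string of entry prefix[c] followed by suffix[c]. */
    static uint16_t prefix[TABLE_SIZE], length[TABLE_SIZE];
    static uint8_t suffix[TABLE_SIZE], first[TABLE_SIZE];
    BitReader br = { src, src_len, 0, 0, 0 };
    int next = 258, width = 9, old = -1, code, c;
    size_t out = 0, n, i;

    for (c = 0; c < 256; c++) {            /* single-byte root strings */
        prefix[c] = 0; length[c] = 1;
        suffix[c] = first[c] = (uint8_t)c;
    }
    while ((code = get_next_code(&br, width)) != EOI_CODE) {
        if (code == CLEAR_CODE) {          /* InitializeTable() */
            next = 258; width = 9; old = -1;
            continue;
        }
        if (old < 0) {                     /* first code after ClearCode */
            if (code > 255 || out >= dst_cap)
                return -1;
            dst[out++] = (uint8_t)code;
            old = code;
            continue;
        }
        if (code > next || next >= TABLE_SIZE)
            return -1;                     /* corrupt or overlong stream */
        /* AddStringToTable(): old string plus first char of the new one;
           when code == next the new string starts with old's first char. */
        prefix[next] = (uint16_t)old;
        suffix[next] = (code < next) ? first[code] : first[old];
        first[next] = first[old];
        length[next] = (uint16_t)(length[old] + 1);
        next++;
        if (next >= (1 << width) - 1 && width < 12)
            width++;                       /* after #510, #1022, #2046 */
        /* WriteString(): walk the prefix chain backwards. */
        n = length[code];
        if (n > dst_cap - out)
            return -1;
        for (c = code, i = n; i-- > 0; c = prefix[c])
            dst[out + i] = suffix[c];
        out += n;
        old = code;
    }
    return (long)out;
}
```

For instance, the five bytes 80 01 E0 50 10 (hex) hold the 9-bit codes 256, 7, 258, 257 and decode to the three bytes 7 7 7. The static tables keep the sketch short but make it non-reentrant; a production decoder would more likely keep them in a per-strip state structure.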
When SamplesPerPixel Is Greater Than 1
So far, we have described the compression scheme as if SamplesPerPixel were
always 1, as is the case with palette-color and grayscale images. But what do we
do with RGB image data?
Tests on our sample images indicate that the LZW compression ratio is nearly
identical whether PlanarConfiguration=1 or PlanarConfiguration=2, for RGB
images. So, use whichever configuration you prefer and simply compress the
bytes in the strip.
Note: Compression ratios on our test RGB images were disappointingly low:
between 1.1 to 1 and 1.5 to 1, depending on the image. Vendors are urged to do
what they can to remove as much noise as possible from their images. Preliminary
tests indicate that significantly better compression ratios are possible with
less-noisy images. Even something as simple as zeroing-out one or two
least-significant bitplanes can be effective, producing little or no perceptible image
degradation.
Implementation
The exact structure of the string table and the method used to determine if a string
is already in the table are probably the most significant design decisions in the
implementation of an LZW compressor and decompressor. Hashing has been
suggested as a useful technique for the compressor. We have chosen a tree-based
approach, with good results. The decompressor is more straightforward and faster
because no search is involved: strings can be accessed directly by code value.
LZW Extensions
Some images compress better using LZW coding if they are first subjected to a
process wherein each pixel value is replaced by the difference between the pixel
and the preceding pixel. See Section 14.
Acknowledgments
See the first page of this section for the LZW reference.
The use of ClearCode as a technique for handling overflow was borrowed from
the compression scheme used by the Graphics Interchange Format (GIF), a small-
color-paint-image-file format used by CompuServe that also uses an adaptation of
the LZW technique.
Section 14: Differencing Predictor
This section defines a Predictor that greatly improves compression ratios for some
images.
Predictor
Tag = 317 (13D.H)
Type = SHORT
N = 1
A predictor is a mathematical operator that is applied to the image data before an
encoding scheme is applied. Currently this field is used only with LZW
(Compression=5) encoding because LZW is probably the only TIFF encoding scheme
that benefits significantly from a predictor step. See Section 13.
The possible values are:
1 = No prediction scheme used before coding.
2 = Horizontal differencing.
Default is 1.
The algorithm
Horizontal differencing makes use of the fact that continuous-tone images rarely vary much in pixel
value from one pixel to the next. In such images, if we replace the pixel values by
differences between consecutive pixels, many of the differences should be 0, plus
or minus 1, and so on. This reduces the apparent information content and allows
LZW to encode the data more compactly.
Assuming 8-bit grayscale pixels for the moment, a basic C implementation might
look something like this:
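unsigned char image[nrows][ncols];  /* 8-bit samples, one per pixel */
int row, col;
/* Take horizontal differences. Work right to left so that each
   subtraction still sees the original value of the left neighbor. */
for (row = 0; row < nrows; row++)
    for (col = ncols - 1; col >= 1; col--)
        image[row][col] -= image[row][col-1];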
If we don’t have 8-bit components, we need to work a little harder to make better
use of the architecture of most CPUs. Suppose we have 4-bit components packed
two per byte in the normal TIFF uncompressed (i.e., Compression=1) fashion. To
find differences, we want to first expand each 4-bit component into an 8-bit byte,
so that we have one component per byte, low-order justified. We then perform the
horizontal differencing illustrated in the example above. Once the differencing
has been completed, we then repack the 4-bit differences two to a byte, in the
normal TIFF uncompressed fashion.
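The expand, difference, repack sequence for 4-bit components might be sketched as follows. The function name difference_row_4bit and the caller-supplied scratch buffer are assumptions made for this example, and the sketch assumes the first component of each byte occupies the high-order nibble. Differences are kept modulo 16 so that they still fit in 4 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Difference one row of n 4-bit components packed two per byte.
   packed holds (n + 1) / 2 bytes; scratch must hold at least n bytes. */
void difference_row_4bit(uint8_t *packed, size_t n, uint8_t *scratch)
{
    size_t i;

    /* Expand: one component per byte, low-order justified. */
    for (i = 0; i < n; i++)
        scratch[i] = (i & 1) ? (uint8_t)(packed[i / 2] & 0x0F)
                             : (uint8_t)(packed[i / 2] >> 4);

    /* Difference right to left, discarding the overflow bit. */
    for (i = n; i-- > 1; )
        scratch[i] = (uint8_t)((scratch[i] - scratch[i - 1]) & 0x0F);

    /* Repack two differences per byte; an odd tail is padded with 0. */
    for (i = 0; i < n; i += 2) {
        uint8_t lo = (i + 1 < n) ? scratch[i + 1] : 0;
        packed[i / 2] = (uint8_t)((scratch[i] << 4) | lo);
    }
}
```

With the components 1, 2, 3, 15 (bytes 12 3F hex) the row becomes 1, 1, 1, 12 (bytes 11 1C hex).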
If the components are greater than 8 bits deep, expanding the components into
16-bit words instead of 8-bit bytes seems like the best way to perform the subtraction
on most computers.
Note that we have not lost any data up to this point, nor will we lose any data later
on. It might seem at first that our differencing might turn 8-bit components into
9-bit differences, 4-bit components into 5-bit differences, and so on. But it turns out
that we can completely ignore the “overflow” bits caused by subtracting a larger
number from a smaller number and still reverse the process without error. Normal
two’s complement arithmetic does just what we want. Try an example by hand if
you need more convincing.
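As one such example, the following sketch applies the differencing to a row of 8-bit components and then reverses it. The function names difference_row and undifference_row are hypothetical; the only property relied on is that unsigned 8-bit arithmetic in C wraps modulo 256.

```c
#include <stddef.h>
#include <stdint.h>

/* Forward step: work right to left so each subtraction still sees the
   original left neighbor. The overflow bit is simply dropped. */
void difference_row(uint8_t *row, size_t ncols)
{
    for (size_t col = ncols; col-- > 1; )
        row[col] = (uint8_t)(row[col] - row[col - 1]);
}

/* Inverse step: work left to right, accumulating modulo 256. */
void undifference_row(uint8_t *row, size_t ncols)
{
    for (size_t col = 1; col < ncols; col++)
        row[col] = (uint8_t)(row[col] + row[col - 1]);
}
```

Take the row 10, 250, 5. Differencing gives 10, 240, 11, because 5 - 250 = -245, which is 11 modulo 256. Undoing it gives 10, then 240 + 10 = 250, then 11 + 250 = 261, which is 5 modulo 256, so the original row is recovered exactly.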
Up to this point we have implicitly assumed that we are compressing bilevel or
grayscale images. An additional consideration arises in the case of color images.
If PlanarConfiguration is 2, there is no problem. Differencing works the same as it
does for grayscale data.
If PlanarConfiguration is 1, however, things get a little trickier. If we didn’t do
anything special, we would subtract red component values from green component
values, green component values from blue component values, and blue
component values from red component values. This would not give the LZW coding
stage much redundancy to work with. So, we will do our horizontal differences
with an offset of SamplesPerPixel (3, in the RGB case). In other words, we will
subtract red from red, green from green, and blue from blue. The LZW coding
stage is identical to the SamplesPerPixel=1 case. We require that BitsPerSample
be the same for all 3 components.
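For PlanarConfiguration=1 with 8-bit components, differencing with an offset of SamplesPerPixel might look like the sketch below. The function name difference_row_chunky and its parameters are illustrative assumptions, not part of the specification.

```c
#include <stddef.h>
#include <stdint.h>

/* row holds npixels * spp interleaved 8-bit components (RGBRGB... for spp = 3).
   Each component is differenced against the same component of the pixel
   to its left, i.e. the value spp positions earlier. */
void difference_row_chunky(uint8_t *row, size_t npixels, size_t spp)
{
    size_t n = npixels * spp;

    /* Right to left, stopping before the first pixel, which is kept as is. */
    for (size_t i = n; i-- > spp; )
        row[i] = (uint8_t)(row[i] - row[i - spp]);
}
```

For the two RGB pixels (10, 20, 30) and (12, 19, 33), the row becomes 10, 20, 30, 2, 255, 3: red minus red, green minus green (-1, stored as 255), and blue minus blue.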
Results and Guidelines
LZW without differencing works well for 1-bit images, 4-bit grayscale images,
and many palette-color images. But natural 24-bit color images and some 8-bit
grayscale images do much better with differencing.
Although the combination of LZW coding with horizontal differencing does not
result in any loss of data, it may be worthwhile in some situations to give up some
information by removing as much noise as possible from the image data before
doing the differencing, especially with 8-bit components. The simplest way to get
rid of noise is to mask off one or two low-order bits of each 8-bit component. On
our 24-bit test images, LZW with horizontal differencing yielded an average
compression ratio of 1.4 to 1. When the low-order bit was masked from each
component, the compression ratio climbed to 1.8 to 1; the compression ratio was
2.4 to 1 when masking two bits, and 3.4 to 1 when masking three bits. Of course,
the more you mask, the more you risk losing useful information along with the
noise. We encourage you to experiment to find the best compromise for your
device. For some applications, it may be useful to let the user make the final
decision.
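Masking off low-order bits before differencing could be done with something like the following sketch. The function name mask_low_bits is hypothetical, and the bits argument is assumed to lie in the range 0 to 7. Unlike the differencing step itself, this step is lossy.

```c
#include <stddef.h>
#include <stdint.h>

/* Zero the `bits` low-order bits of each 8-bit component (0 <= bits <= 7). */
void mask_low_bits(uint8_t *samples, size_t n, unsigned bits)
{
    uint8_t mask = (uint8_t)(0xFFu << bits);   /* e.g. bits = 2 gives 0xFC */

    for (size_t i = 0; i < n; i++)
        samples[i] &= mask;
}
```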
Incidentally, we tried taking both horizontal and vertical differences, but the extra
complexity of two-dimensional differencing did not appear to pay off for most of
our test images. About one third of the images compressed slightly better with
two-dimensional differencing, about one third compressed slightly worse, and the
rest were about the same.
Section 15: Tiled Images
Introduction
Motivation
This section describes how to organize images into tiles instead of strips.
For low-resolution to medium-resolution images, the standard TIFF method of
breaking the image into strips is adequate. However, high-resolution images can
be accessed more efficiently, and compression tends to work better, if the
image is broken into roughly square tiles instead of horizontally-wide but
vertically-narrow strips.
Relationship to existing fields
When the tiling fields described below are used, they replace the
StripOffsets, StripByteCounts, and RowsPerStrip fields. Use of tiles will
therefore cause older TIFF readers to give up because they will have no way of
knowing where the image data is or how it is organized. Do not use both strip-
oriented and tile-oriented fields in the same TIFF file.
Padding
Tile size is defined by TileWidth and TileLength. All tiles in an image are the
same size; that is, they have the same pixel dimensions.
Boundary tiles are padded to the tile boundaries. For example, if TileWidth is 64
and ImageWidth is 129, then the image is 3 tiles wide and 63 pixels of padding
must be added to fill the rightmost column of tiles. The same holds for TileLength
and ImageLength. It doesn’t matter what value is used for padding, because good
TIFF readers display only the pixels defined by ImageWidth and ImageLength
and ignore any padded pixels. Some compression schemes work best if the
padding is accomplished by replicating the last column and last row instead of
padding with 0’s.
The price for padding the image out to tile boundaries is that some space is
wasted. But compression usually shrinks the padded areas to almost nothing.
Even if data is not compressed, remember that tiling is intended for large images.
Large images have lots of comparatively small tiles, so that the percentage of
wasted space will be very small, generally on the order of a few percent or less.
The advantages of padding an image to the tile boundaries are that
implementations can be simpler and faster and that it is more compatible with tile-oriented
compression schemes such as JPEG. See Section 22.
Tiles are compressed individually, just as strips are compressed. That is, each row
of data in a tile is treated as a separate “scanline” when compressing. Compression
includes any padded areas of the rightmost and bottom tiles so that all the
tiles in an image are the same size when uncompressed.
All of the following fields are required for tiled images:
Fields
TileWidth
Tag = 322 (142.H)
Type = SHORT or LONG
N = 1
The tile width in pixels. This is the number of columns in each tile.
Assuming integer arithmetic, three computed values that are useful in the
following field descriptions are:
TilesAcross = (ImageWidth + TileWidth - 1) / TileWidth
TilesDown = (ImageLength + TileLength - 1) / TileLength
TilesPerImage = TilesAcross * TilesDown
These computed values are not TIFF fields; they are simply values determined by
the ImageWidth, TileWidth, ImageLength, and TileLength fields.
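A small C sketch of these computations, together with the amount of padding they imply, is given below. The TileLayout structure and the tile_layout function are illustrative names only, and the sketch assumes all four inputs are nonzero.

```c
#include <stdint.h>

typedef struct {
    uint32_t tiles_across, tiles_down, tiles_per_image;
    uint32_t pad_right, pad_bottom;   /* padding in pixels on boundary tiles */
} TileLayout;

TileLayout tile_layout(uint32_t image_width, uint32_t image_length,
                       uint32_t tile_width, uint32_t tile_length)
{
    TileLayout t;

    /* Round up using integer division; 64-bit sums avoid overflow. */
    t.tiles_across = (uint32_t)(((uint64_t)image_width + tile_width - 1) / tile_width);
    t.tiles_down = (uint32_t)(((uint64_t)image_length + tile_length - 1) / tile_length);
    t.tiles_per_image = t.tiles_across * t.tiles_down;
    t.pad_right = t.tiles_across * tile_width - image_width;
    t.pad_bottom = t.tiles_down * tile_length - image_length;
    return t;
}
```

With ImageWidth = 129 and TileWidth = 64, as in the padding example earlier in this section, this yields 3 tiles across and 63 pixels of padding in the rightmost column of tiles.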
TileWidth and ImageWidth together determine the number of tiles that span the
width of the image (TilesAcross). TileLength and ImageLength together
determine the number of tiles that span the length of the image (TilesDown).
We recommend choosing TileWidth and TileLength such that the resulting tiles
are about 4K to 32K bytes before compression. This seems to be a reasonable
value for most applications and compression schemes.
TileWidth must be a multiple of 16. This restriction improves performance in
some graphics environments and enhances compatibility with compression
schemes such as JPEG.
Tiles need not be square.
Note that ImageWidth can be less than TileWidth, although this means that the
tiles are too large or that you are using tiling on really small images, neither of
which is recommended. The same observation holds for ImageLength and
TileLength.
No default. See also TileLength, TileOffsets, TileByteCounts.
TileLength
Tag = 323 (143.H)
Type = SHORT or LONG
N = 1
The tile length (height) in pixels. This is the number of rows in each tile.
TileLength must be a multiple of 16 for compatibility with compression schemes
such as JPEG.
Replaces RowsPerStrip in tiled TIFF files.
No default. See also TileWidth, TileOffsets, TileByteCounts.
TileOffsets
Tag = 324 (144.H)
Type = LONG
N = TilesPerImage for PlanarConfiguration = 1
= SamplesPerPixel * TilesPerImage for PlanarConfiguration = 2
For each tile, the byte offset of that tile, as compressed and stored on disk. The
offset is specified with respect to the beginning of the TIFF file. Note that this
implies that each tile has a location independent of the locations of other tiles.
Offsets are ordered left-to-right and top-to-bottom. For PlanarConfiguration = 2,
the offsets for the first component plane are stored first, followed by all the offsets
for the second component plane, and so on.
No default. See also TileWidth, TileLength, TileByteCounts.
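Given the ordering described above, the position in the TileOffsets (and TileByteCounts) array of the tile that contains a given pixel might be computed as in this sketch. The function name tile_index and its parameter list are assumptions made for illustration; for PlanarConfiguration = 1 the plane argument is passed as 0.

```c
#include <stdint.h>

/* Index of the tile containing pixel (x, y) of component plane `plane`.
   Tiles run left to right, then top to bottom; for PlanarConfiguration = 2
   all tiles of plane 0 come first, then all tiles of plane 1, and so on. */
uint32_t tile_index(uint32_t x, uint32_t y, uint32_t plane,
                    uint32_t image_width, uint32_t image_length,
                    uint32_t tile_width, uint32_t tile_length)
{
    uint32_t tiles_across = (image_width + tile_width - 1) / tile_width;
    uint32_t tiles_down = (image_length + tile_length - 1) / tile_length;
    uint32_t tiles_per_image = tiles_across * tiles_down;

    return plane * tiles_per_image
         + (y / tile_length) * tiles_across
         + (x / tile_width);
}
```

For a 129 by 100 image with 64 by 32 tiles, pixel (128, 33) lies in tile column 2 of tile row 1, which is index 5 in plane 0 and index 17 in plane 1.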
TileByteCounts
Tag = 325 (145.H)
Type = SHORT or LONG
N = TilesPerImage for PlanarConfiguration = 1
= SamplesPerPixel * TilesPerImage for PlanarConfiguration = 2
For each tile, the number of (compressed) bytes in that tile.
See TileOffsets for a description of how the byte counts are ordered.
No default. See also TileWidth, TileLength, TileOffsets.
Section 16: CMYK Images
Motivation
This section describes how to store separated (usually CMYK) image data in a
TIFF file.
In a separated image, each pixel consists of N components. Each component
represents the amount of a particular ink that is to be used to represent the image at
that location, typically using a halftoning technique.
For example, in a CMYK image, each pixel consists of 4 components. Each
component represents the amount of cyan, magenta, yellow, or black process ink that
is to be used to represent the image at that location.
The fields described in this section can be used for more than simple 4-color
process (CMYK) printing. They can also be used for describing an image made up of
more than 4 inks, such as an image made up of cyan, magenta, yellow, red, green,
blue, and black inks. Such an image is sometimes called a high-fidelity image and
has the advantage of slightly extending the printed color gamut.
Since separated images are quite device-specific and are restricted to color
prepress use, they should not be used for general image data interchange. Separated
images are to be used only for prepress applications in which the imagesetter,
paper, ink, and printing press characteristics are known by the creator of the
separated image.
Note: there is no single method of converting RGB data to CMYK data and back.
In a perfect world, something close to cyan = 255-red, magenta = 255-green, and
yellow = 255-blue should work; but characteristics of printing inks and printing
presses, economics, and the fact that the meaning of RGB itself depends on other
parameters combine to spoil this simplicity.
Requirements
In addition to satisfying the normal Baseline TIFF requirements, a separated TIFF
file must have the following characteristics:
• SamplesPerPixel = N. SHORT. The number of inks. (For example, N=4 for
CMYK, because we have one component each for cyan, magenta, yellow, and
black.)
• BitsPerSample = 8,8,8,8 (for CMYK). SHORT. For now, only 8-bit
components are recommended. The value “8” is repeated SamplesPerPixel times.
• PhotometricInterpretation = 5 (Separated - usually CMYK). SHORT.
The components represent the desired percent dot coverage of each ink, where
the larger component values represent a higher percentage of ink dot coverage
and smaller values represent less coverage.
Fields
In addition, there are some new fields, all of which are optional.
InkSet
Tag = 332 (14C.H)
Type = SHORT
N = 1
The set of inks used in a separated (PhotometricInterpretation=5) image.
1 = CMYK. The order of the components is cyan, magenta, yellow, black. Usually, a
value of 0 represents 0% ink coverage and a value of 255 represents 100% ink
coverage for that component, but see DotRange below. The InkNames field
should not exist when InkSet=1.
2 = not CMYK. See the InkNames field for a description of the inks to be used.
Default is 1 (CMYK).
NumberOfInks
Tag = 334 (14E.H)
Type = SHORT
N = 1
The number of inks. Usually equal to SamplesPerPixel, unless there are extra
samples.
See also ExtraSamples.
Default is 4.
InkNames
Tag = 333 (14D.H)
Type = ASCII
N = total number of characters in all the ink name strings, including the
NULs.
The name of each ink used in a separated (PhotometricInterpretation=5) image,
written as a list of concatenated, NUL-terminated ASCII strings. The number of
strings must be equal to NumberOfInks.
The samples are in the same order as the ink names.
See also InkSet, NumberOfInks.
No default.
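A reader might walk the InkNames value along the lines of the following sketch. The function names count_ink_names and ink_name are hypothetical; buf and n stand for the raw field value and its count N as read from the file.

```c
#include <stddef.h>

/* Number of NUL-terminated strings in the n-byte InkNames value. */
size_t count_ink_names(const char *buf, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (buf[i] == '\0')
            count++;
    return count;
}

/* Pointer to the index-th ink name (0-based), or NULL if there is none.
   Only strings that are properly NUL-terminated within buf are returned. */
const char *ink_name(const char *buf, size_t n, size_t index)
{
    size_t start = 0;

    for (size_t i = 0; i < n; i++) {
        if (buf[i] != '\0')
            continue;
        if (index == 0)
            return buf + start;
        index--;
        start = i + 1;
    }
    return NULL;
}
```

A reader could compare the result of count_ink_names() against NumberOfInks as a consistency check, since the number of strings must equal that field.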