SECTION 3.1 Lexical Conventions
encoding a specific set of 128 characters as binary numbers. However, a PDF file
is not restricted to the ASCII character set; it can contain arbitrary 8-bit bytes,
subject to the following considerations:
• The tokens that delimit objects and that describe the structure of a PDF file are
all written in the ASCII character set, as are all the reserved words and the
names used as keys in standard dictionaries.
• The data values of certain types of objects—strings and streams—can be but
need not be written entirely in ASCII. For the purpose of exposition (as in this
book), ASCII representation is preferred. However, in actual practice, data that
is naturally binary, such as sampled images, is represented directly in binary for
compactness and efficiency.
• A PDF file containing binary data must be transported and stored by means
that preserve all bytes of the file faithfully; that is, as a binary file rather than a
text file. Such a file is not portable to environments that impose reserved
character codes, maximum line lengths, end-of-line conventions, or other
restrictions.
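The byte-preservation requirement in the last consideration can be illustrated with a minimal Python sketch. The function names and the sample bytes are hypothetical and chosen only for illustration; `simulate_text_transfer` stands in for one possible text-mode channel (one that rewrites CR LF pairs to LF) and does not describe any particular tool.

```python
def has_non_ascii(data: bytes) -> bool:
    """Return True if any byte lies outside the 7-bit ASCII range."""
    return any(b > 0x7F for b in data)


def simulate_text_transfer(data: bytes) -> bytes:
    """Hypothetical text-mode transfer that rewrites CR LF pairs to LF.

    This models one kind of end-of-line convention an environment might
    impose; real channels may alter bytes in other ways.
    """
    return data.replace(b"\r\n", b"\n")


# Made-up sample: ASCII keywords around a few binary bytes, two of which
# (0x0D 0x0A) happen to look like an end-of-line marker.
sample = b"stream\r\n\x89\x0d\x0a\xff\r\nendstream"

damaged = simulate_text_transfer(sample)
print(has_non_ascii(sample))       # the sample contains bytes above 0x7F
print(len(sample), len(damaged))   # the lengths differ after the transfer
```

Because the binary payload itself contains the bytes 0x0D 0x0A, the simulated transfer alters the data as well as the line endings, which is why such a file has to be handled as a binary file.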
Note: In this chapter, the term character is synonymous with byte and merely refers
to a particular 8-bit value. This usage is entirely independent of any logical meaning
that the value may have when it is treated as data in specific contexts, such as
representing human-readable text or selecting a glyph from a font.
3.1.1 Character Set
The PDF character set is divided into three classes, called regular, delimiter, and
white-space characters. This classification determines the grouping of characters
into tokens, except within strings, streams, and comments; different rules apply
in those contexts.
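The three-way classification can be sketched in Python as follows. This is an illustration rather than a definitive implementation: the white-space and delimiter byte values are taken from the PDF specification's character tables, which are not reproduced in this excerpt, and the names `WHITE_SPACE`, `DELIMITERS`, and `classify` are hypothetical.

```python
# Assumed from the PDF specification's tables (not shown in this excerpt):
# white space = NUL, TAB, LF, FF, CR, SPACE; delimiters = ( ) < > [ ] { } / %
WHITE_SPACE = frozenset(b"\x00\t\n\x0c\r ")
DELIMITERS = frozenset(b"()<>[]{}/%")


def classify(byte: int) -> str:
    """Classify one 8-bit value as white-space, delimiter, or regular.

    Valid only outside strings, streams, and comments, where different
    rules apply.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    if byte in WHITE_SPACE:
        return "white-space"
    if byte in DELIMITERS:
        return "delimiter"
    return "regular"


for ch in b" /A":
    print(hex(ch), classify(ch))
```

Every byte that is neither white space nor a delimiter falls into the regular class, including values above 0x7F.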
White-space characters (see Table 3.1) separate syntactic constructs such as names
and numbers from each other. All white-space characters are equivalent, except
in comments, strings, and streams. In all other contexts, PDF treats any sequence
of consecutive white-space characters as one character.
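The equivalence of white-space runs can be illustrated with a short Python sketch. It assumes the six white-space byte values listed in the PDF specification (NUL, TAB, LF, FF, CR, SPACE) and deliberately ignores delimiters, comments, strings, and streams, so it is not a tokenizer; `split_on_white_space` is a hypothetical helper name.

```python
import re

# Any run of one or more white-space bytes (assumed values: NUL, TAB, LF,
# FF, CR, SPACE) acts as a single separator.
WHITE_SPACE_RUN = re.compile(rb"[\x00\t\n\x0c\r ]+")


def split_on_white_space(data: bytes) -> list:
    """Split bytes on white-space runs, dropping empty leading/trailing pieces.

    Simplification: delimiters, comments, strings, and streams are not
    handled here.
    """
    return [piece for piece in WHITE_SPACE_RUN.split(data) if piece]


compact = split_on_white_space(b"1 0 obj")
spread = split_on_white_space(b"1\r\n\t0   obj")
print(compact == spread)
```

Both inputs split into the same three pieces, which is the sense in which a sequence of consecutive white-space characters behaves as one character.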