CHAPTER 3
158
Syntax
The string types described in Table 3.32 specify increasingly specific encoding
schemes, as shown in Figure 3.7.
FIGURE 3.7  Relationship between string types
(Figure labels: string type; text string type; ASCII string type; byte string type; PDFDocEncoded string type; UTF-16BE encoded string with a leading byte order marker)
Text String Type
The text string type is used for character strings that contain information
intended to be human-readable, such as text annotations, bookmark names,
article names, document information, and so forth. The term character strings is
used to describe such strings independent of the encoding with which they are
represented in a PDF document.
Note: This type is not a true type. Rather, it is a string type that represents data
encoded using specific conventions.
The text string type is used for character strings that are encoded in either
PDFDocEncoding or the UTF-16BE Unicode character encoding scheme.
PDFDocEncoding can encode all of the ISO Latin 1 character set and is documented
elsewhere in this reference. UTF-16BE and the Unicode character encoding are
described in the Unicode Standard by the Unicode Consortium (see the Bibliography).
Note that PDFDocEncoding does not support all Unicode characters, whereas
UTF-16BE does.
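The choice between the two encodings can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the helper names encode_text_string and is_utf16be_text_string are hypothetical, and Python has no built-in PDFDocEncoding codec, so the sketch assumes that only printable ASCII (codes 32 to 126) is safe for the single-byte route and sends everything else through UTF-16BE. The two-byte marker it writes is the 254, 255 prefix that this section requires for Unicode-encoded text strings.

```python
# The byte order marker U+FEFF serialized big-endian: 254 followed by 255.
BOM_UTF16BE = bytes([254, 255])


def encode_text_string(text: str) -> bytes:
    """Hypothetical helper: choose an encoding for a PDF text string.

    Assumption: printable ASCII (codes 32-126) is used as a conservative
    stand-in for the single-byte PDFDocEncoding route. A real writer would
    consult the full PDFDocEncoding table, which covers more characters.
    """
    if all(32 <= ord(ch) <= 126 for ch in text):
        return text.encode("ascii")
    # Anything else is written as UTF-16BE, prefixed with the marker.
    # Python's "utf-16-be" codec does not emit a marker on its own.
    return BOM_UTF16BE + text.encode("utf-16-be")


def is_utf16be_text_string(data: bytes) -> bool:
    """Report whether the raw string bytes start with the 254, 255 marker."""
    return data[:2] == BOM_UTF16BE
```

With these definitions, encode_text_string("Intro") stays single-byte, while a string containing a character outside the assumed safe range, such as "é", is written as the marker followed by two bytes per UTF-16 code unit.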
For text strings encoded in Unicode, the first two bytes must be 254 followed by
255 (hexadecimal FE FF). These two bytes represent the Unicode byte order marker,
U+FEFF, indicating
that the string is encoded in the UTF-16BE (big-endian) encoding scheme
specified in the Unicode standard. (This mechanism precludes beginning a string