CHAPTER 3 Syntax
The string types described in Table 3.32 specify increasingly specific encoding
schemes, as shown in Figure 3.7.
[Figure 3.7: string type at the top; beneath it, text string type, ASCII string
type, and byte string type; beneath text string type, PDFDocEncoded string type
and UTF-16BE encoded string with a leading byte order marker.]
FIGURE 3.7 Relationship between string types
Text String Type
The text string type is used for character strings that contain information
intended to be human-readable, such as text annotations, bookmark names,
article names, document information, and so forth. The term character strings is
used to describe such strings independent of the encoding with which they are
represented in a PDF document.
Note: This type is not a true type. Rather, it is a string type that represents data
encoded using specific conventions.
Text strings are encoded in either PDFDocEncoding or the UTF-16BE Unicode
character encoding scheme. PDFDocEncoding can encode all of the ISO Latin 1
character set and is documented in Appendix D; it does not cover all Unicode
characters. UTF-16BE can encode all Unicode characters. UTF-16BE and Unicode
character encoding are described in the Unicode Standard by the Unicode
Consortium (see the Bibliography).
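As an illustration only, the following Python sketch shows one way a consumer
might tell the two encodings apart, using the leading bytes 254 and 255 (the
byte order marker described in the next paragraph) as the signal for UTF-16BE.
The function names are hypothetical. Python has no built-in PDFDocEncoding
codec, so the sketch substitutes Latin-1 for the fallback; that substitution is
an assumption and is not exact, because the table in Appendix D differs from
Latin-1 for some code points.

```python
# Minimal sketch, not a definitive implementation.
# decode_text_string and encode_utf16be_text_string are hypothetical names.

BOM_UTF16BE = bytes([254, 255])  # U+FEFF written in big-endian byte order


def decode_text_string(raw: bytes) -> str:
    """Decode the bytes of a PDF text string into a Python str."""
    if raw.startswith(BOM_UTF16BE):
        # Skip the two marker bytes; the rest is UTF-16BE.
        return raw[2:].decode("utf-16-be")
    # ASSUMPTION: Latin-1 stands in for PDFDocEncoding here. A real
    # implementation would use the mapping table from Appendix D.
    return raw.decode("latin-1")


def encode_utf16be_text_string(text: str) -> bytes:
    """Encode text as a UTF-16BE text string with the leading marker."""
    # The "utf-16-be" codec does not emit a marker itself, so add it.
    return BOM_UTF16BE + text.encode("utf-16-be")
```

For example, decode_text_string(bytes([254, 255, 0, 65])) yields the
one-character string "A", while a string without the marker falls through to
the single-byte branch.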
For text strings encoded in Unicode, the first two bytes must be 254 followed by
255. These two bytes represent the Unicode byte order marker, U+FEFF, indicating
that the string is encoded in the UTF-16BE (big-endian) encoding scheme
specified in the Unicode standard. (This mechanism precludes beginning a string