PDF Format Reference - Adobe Portable Document Format

SECTION 3.8

Common Data Structures

using PDFDocEncoding with the two characters thorn ydieresis, which is unlikely to be a meaningful beginning of a word or phrase).

Note: Applications that process PDF files containing Unicode text strings should be prepared to handle supplementary characters; that is, characters requiring more than two bytes to represent.
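As an illustration of the note above, the following Python sketch decodes a Unicode text string that contains a supplementary character. It assumes the text string convention of a leading byte order marker (bytes 254 and 255) followed by UTF-16BE data; the function name decode_unicode_text_string is hypothetical and is not defined by the specification.

```python
# Minimal sketch: decoding a Unicode text string that contains a
# supplementary character (one that needs a surrogate pair in UTF-16BE).
# Assumption: the string starts with the byte order marker FE FF.

BOM = b"\xfe\xff"  # bytes 254 and 255


def decode_unicode_text_string(data: bytes) -> str:
    """Decode BOM-prefixed UTF-16BE bytes into a Python str."""
    if not data.startswith(BOM):
        raise ValueError("missing FE FF byte order marker")
    # Python's utf-16-be codec combines surrogate pairs into one code point.
    return data[len(BOM):].decode("utf-16-be")


# "A" takes two bytes; U+1D11E (musical symbol G clef) takes four.
raw = BOM + "A\U0001D11E".encode("utf-16-be")
text = decode_unicode_text_string(raw)

print(len(raw))   # total bytes, including the 2-byte marker
print(len(text))  # number of Unicode characters
```

A decoder that steps through the data two bytes at a time and treats each pair as a complete character would split U+1D11E into two surrogate halves, which is the situation the note warns about.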

An escape sequence may appear anywhere in a Unicode text string to indicate the language in which subsequent text is written, which is useful when the language cannot be determined from the character codes used in the text. The escape sequence consists of the following elements, in order:

1. The Unicode value U+001B (that is, the byte sequence 0 followed by 27).

2. A 2-character ISO 639 language code (for example, en for English or ja for Japanese). Character in this context means byte (as in ASCII character), not Unicode character.

3. (Optional) A 2-character ISO 3166 country code (for example, US for the United States or JP for Japan).

4. The Unicode value U+001B.

The complete list of codes defined by ISO 639 and ISO 3166 can be obtained from the International Organization for Standardization (see the Bibliography).
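The escape sequence described above can be sketched in Python as follows. This is an illustrative reading, not a definitive implementation: it assumes that the language and country codes are written as single ASCII bytes (following the statement that "character" here means byte), and that the text string begins with the byte order marker 254, 255. The names language_escape and encode_text_with_language are hypothetical.

```python
# Minimal sketch: building a Unicode text string with a language escape.
# Assumptions (not confirmed beyond the text above):
#   - U+001B is written as the two bytes 0 and 27,
#   - language and country codes are one ASCII byte per character,
#   - the text string starts with the byte order marker FE FF.
from typing import Optional

BOM = b"\xfe\xff"
ESC = b"\x00\x1b"  # U+001B as the byte sequence 0 followed by 27


def _check_code(code: str, kind: str) -> bytes:
    """Validate a 2-letter ASCII code and return it as bytes."""
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        raise ValueError(f"{kind} code must be two ASCII letters: {code!r}")
    return code.encode("ascii")


def language_escape(lang: str, country: Optional[str] = None) -> bytes:
    """Return ESC + language code + optional country code + ESC."""
    body = _check_code(lang, "language")
    if country is not None:
        body += _check_code(country, "country")
    return ESC + body + ESC


def encode_text_with_language(text: str, lang: str,
                              country: Optional[str] = None) -> bytes:
    """Prefix UTF-16BE text with the marker and a language escape."""
    return BOM + language_escape(lang, country) + text.encode("utf-16-be")


print(language_escape("en", "US").hex(" "))
print(encode_text_with_language("Hi", "ja").hex(" "))
```

Because an escape may appear anywhere in the string, a reader of such strings would need to scan for the 0, 27 byte pair and skip everything up to the closing pair before decoding the remaining bytes as UTF-16BE.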

PDFDocEncoded String Type

A PDFDocEncoded string is similar to a string object, but it is a character string where characters are represented in a single byte using PDFDocEncoding. Note that PDFDocEncoding does not support all Unicode characters whereas UTF-16BE does.

Note: This type is not a true type. Rather, it is a string type that represents data encoded using a specific convention.

Byte String Type

The byte string type is used for binary data represented as a series of 8-bit bytes, where each byte can be any value representable in 8 bits. The string may
