PDF Format Reference - Adobe Portable Document Format

SECTION 5.9

Extraction of Text Content

the Adobe standard Latin character set and the set of named characters in the Symbol font (see Appendix D):

1. Map the character code to a character name according to Table D.1 on page 996 and the font’s Differences array.

2. Look up the character name in the Adobe Glyph List (see the Bibliography) to obtain the corresponding Unicode value.
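The two steps above can be sketched in Python as follows. This is only an illustrative sketch, not a definitive implementation: the function names (`apply_differences`, `simple_font_to_unicode`) are hypothetical, and the two tables are tiny hand-picked excerpts standing in for the full Table D.1 encoding and the full Adobe Glyph List, which a real application would load in their entirety. The Differences array is assumed to be already parsed into a Python list of integers and name strings (without the leading slash).

```python
from typing import Dict, List, Optional, Union

# Small excerpts only (assumption: a real reader loads the complete tables).
BASE_ENCODING_EXCERPT: Dict[int, str] = {0x20: "space", 0x41: "A", 0x61: "a"}
GLYPH_LIST_EXCERPT: Dict[str, int] = {
    "space": 0x0020,
    "A": 0x0041,
    "a": 0x0061,
    "Aacute": 0x00C1,
    "bullet": 0x2022,
}


def apply_differences(
    base: Dict[int, str], differences: List[Union[int, str]]
) -> Dict[int, str]:
    """Overlay a Differences array on a base encoding.

    An integer sets the current character code; each name that follows is
    assigned to the current code, which is then incremented by one.
    """
    encoding = dict(base)
    code: Optional[int] = None
    for item in differences:
        if isinstance(item, int):
            code = item
        else:
            if code is None:
                raise ValueError("Differences array must start with a code")
            encoding[code] = item
            code += 1
    return encoding


def simple_font_to_unicode(
    char_code: int,
    base: Dict[int, str],
    differences: List[Union[int, str]],
    glyph_list: Dict[str, int],
) -> Optional[str]:
    """Return the Unicode character for char_code, or None if unknown."""
    # Step 1: character code -> character name.
    name = apply_differences(base, differences).get(char_code)
    if name is None:
        return None
    # Step 2: character name -> Unicode value via the glyph list.
    code_point = glyph_list.get(name)
    return chr(code_point) if code_point is not None else None


# Example: codes 0x80 and 0x81 are redefined by the Differences array.
example_differences: List[Union[int, str]] = [0x80, "Aacute", "bullet"]
bullet_char = simple_font_to_unicode(
    0x81, BASE_ENCODING_EXCERPT, example_differences, GLYPH_LIST_EXCERPT
)
```

Returning `None` rather than raising keeps the "no mapping found" case explicit, so a caller can try another method before giving up.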

• If the font is a composite font that uses one of the predefined CMaps listed in Table 5.15 on page 442 (except Identity-H and Identity-V) or whose descendant CIDFont uses the Adobe-GB1, Adobe-CNS1, Adobe-Japan1, or Adobe-Korea1 character collection:

1. Map the character code to a character identifier (CID) according to the font’s CMap.

2. Obtain the registry and ordering of the character collection used by the font’s CMap (for example, Adobe and Japan1) from its CIDSystemInfo dictionary.

3. Construct a second CMap name by concatenating the registry and ordering obtained in step 2 in the format registry-ordering-UCS2 (for example, Adobe-Japan1-UCS2).

4. Obtain the CMap with the name constructed in step 3 (available from the ASN Web site; see the Bibliography).

5. Map the CID obtained in step 1 according to the CMap obtained in step 4, producing a Unicode value.
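The five steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, both CMaps are modeled as plain dictionaries rather than parsed CMap files, and the sample mapping values are invented placeholders that are not taken from any real CMap. Only the CIDSystemInfo keys Registry and Ordering and the registry-ordering-UCS2 naming format come from the text.

```python
from typing import Dict, Optional


def ucs2_cmap_name(registry: str, ordering: str) -> str:
    """Step 3: build the name of the CID-to-Unicode CMap."""
    return f"{registry}-{ordering}-UCS2"


def composite_font_to_unicode(
    char_code: int,
    font_cmap: Dict[int, int],
    cid_system_info: Dict[str, str],
    ucs2_cmaps: Dict[str, Dict[int, int]],
) -> Optional[str]:
    """Return the Unicode character for char_code, or None if unknown.

    font_cmap maps character codes to CIDs; ucs2_cmaps maps a CMap name
    to a CID-to-Unicode table. Both are simplified stand-ins (assumption).
    """
    # Step 1: character code -> CID via the font's CMap.
    cid = font_cmap.get(char_code)
    if cid is None:
        return None
    # Step 2: registry and ordering from the CIDSystemInfo dictionary.
    registry = cid_system_info["Registry"]
    ordering = cid_system_info["Ordering"]
    # Step 3: construct the second CMap name.
    name = ucs2_cmap_name(registry, ordering)
    # Step 4: obtain that CMap (here, from an in-memory collection).
    cid_to_unicode = ucs2_cmaps.get(name)
    if cid_to_unicode is None:
        return None
    # Step 5: CID -> Unicode value.
    code_point = cid_to_unicode.get(cid)
    return chr(code_point) if code_point is not None else None


# Placeholder data for illustration only (not real CMap contents).
sample_font_cmap = {0x0001: 100}
sample_system_info = {"Registry": "Adobe", "Ordering": "Japan1"}
sample_ucs2_cmaps = {"Adobe-Japan1-UCS2": {100: 0x3042}}

sample_result = composite_font_to_unicode(
    0x0001, sample_font_cmap, sample_system_info, sample_ucs2_cmaps
)
```

A real CMap maps variable-length byte sequences and code ranges, so a production reader would need a proper CMap parser in place of the dictionaries used here.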

Note: Type 0 fonts whose descendant CIDFonts use the Adobe-GB1, Adobe-CNS1, Adobe-Japan1, or Adobe-Korea1 character collection (as specified in the CIDSystemInfo dictionary) must have a supplement number corresponding to the version of PDF supported by the application. See Table 5.16 on page 446 for a list of the character collections corresponding to a given PDF version. (Other supplements of these character collections can be used, but if the supplement is higher-numbered than the one corresponding to the supported PDF version, only the CIDs in the latter supplement are considered to be standard CIDs.)

If these methods fail to produce a Unicode value, there is no way to determine what the character code represents.
