SECTION 5.9
471
Extraction of Text Content
the Adobe standard Latin character set and the set of named characters in the Symbol font (see Appendix D):

1. Map the character code to a character name according to Table D.1 on page 996 and the font’s Differences array.
2. Look up the character name in the Adobe Glyph List (see the Bibliography) to obtain the corresponding Unicode value.
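
The two steps above can be sketched roughly as follows. This is a minimal illustration, not a conforming implementation. It assumes that the base encoding from Table D.1 and the Adobe Glyph List have already been loaded into Python dictionaries; the tiny `BASE_ENCODING` and `GLYPH_LIST` excerpts and the helper names `apply_differences` and `simple_font_to_unicode` are hypothetical. The Differences array is modeled as a flat list in which an integer starts a run of consecutive codes and each following name is assigned to the next code.

```python
# Hypothetical excerpts: a real reader would load the full Table D.1
# encoding and the complete Adobe Glyph List instead.
BASE_ENCODING = {0x41: "A", 0x42: "B", 0x61: "a", 0x27: "quoteright"}
GLYPH_LIST = {"A": 0x0041, "B": 0x0042, "a": 0x0061,
              "quoteright": 0x2019, "Euro": 0x20AC, "bullet": 0x2022}


def apply_differences(base, differences):
    """Overlay a Differences array [code, name, name, ..., code, name, ...]."""
    encoding = dict(base)
    code = None
    for item in differences:
        if isinstance(item, int):
            code = item              # start of a new run of consecutive codes
        else:
            if code is None:
                raise ValueError("Differences array must start with a code")
            encoding[code] = item    # assign this name to the current code
            code += 1                # the next name goes to the next code
    return encoding


def simple_font_to_unicode(code, differences=()):
    """Return the Unicode character for `code`, or None if unknown."""
    encoding = apply_differences(BASE_ENCODING, differences)
    name = encoding.get(code)        # step 1: character code -> character name
    if name is None:
        return None
    value = GLYPH_LIST.get(name)     # step 2: character name -> Unicode value
    return None if value is None else chr(value)
```

For example, with a Differences array of `[0x80, "Euro", "bullet"]`, code 0x80 maps to "Euro" and then to U+20AC, and code 0x81 maps to "bullet" and then to U+2022.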
If the font is a composite font that uses one of the predefined CMaps listed in Table 5.15 on page 442 (except Identity-H and Identity-V) or whose descendant CIDFont uses the Adobe-GB1, Adobe-CNS1, Adobe-Japan1, or Adobe-Korea1 character collection:

1. Map the character code to a character identifier (CID) according to the font’s CMap.
2. Obtain the registry and ordering of the character collection used by the font’s CMap (for example, Adobe and Japan1) from its CIDSystemInfo dictionary.
3. Construct a second CMap name by concatenating the registry and ordering obtained in step 2 in the format registry-ordering-UCS2 (for example, Adobe-Japan1-UCS2).
4. Obtain the CMap with the name constructed in step 3 (available from the ASN Web site; see the Bibliography).
5. Map the CID obtained in step 1 according to the CMap obtained in step 4, producing a Unicode value.
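
The eligibility test and the five steps above might be sketched as follows. This is only an illustration under stated assumptions: the font’s CMap is modeled as a dictionary holding its CIDSystemInfo entries and a code-to-CID table, the downloaded registry-ordering-UCS2 CMaps are modeled as a dictionary from CMap name to a CID-to-code-point table, and every name here (`uses_cid_method`, `ucs2_cmap_name`, `composite_font_to_unicode`, `FONT_CMAP`, `UCS2_CMAPS`) is hypothetical. The sample codes and CIDs are made up and are not real Adobe-Japan1 assignments.

```python
STANDARD_COLLECTIONS = {("Adobe", "GB1"), ("Adobe", "CNS1"),
                        ("Adobe", "Japan1"), ("Adobe", "Korea1")}


def uses_cid_method(cmap_name, cmap_is_predefined, registry, ordering):
    """True if the CID-based lookup applies to this composite font."""
    if cmap_is_predefined and cmap_name not in ("Identity-H", "Identity-V"):
        return True
    return (registry, ordering) in STANDARD_COLLECTIONS


def ucs2_cmap_name(cid_system_info):
    """Step 3: build the registry-ordering-UCS2 CMap name."""
    return f"{cid_system_info['Registry']}-{cid_system_info['Ordering']}-UCS2"


def composite_font_to_unicode(code, font_cmap, ucs2_cmaps):
    """Return the Unicode character for `code`, or None if unknown."""
    cid = font_cmap["code_to_cid"].get(code)   # step 1: code -> CID
    if cid is None:
        return None
    info = font_cmap["cid_system_info"]        # step 2: registry and ordering
    name = ucs2_cmap_name(info)                # step 3: second CMap name
    ucs2 = ucs2_cmaps.get(name)                # step 4: obtain that CMap
    if ucs2 is None:
        return None
    value = ucs2.get(cid)                      # step 5: CID -> Unicode value
    return None if value is None else chr(value)


# Illustrative data only: NOT real Adobe-Japan1 codes or CIDs.
FONT_CMAP = {
    "cid_system_info": {"Registry": "Adobe", "Ordering": "Japan1", "Supplement": 2},
    "code_to_cid": {0x0101: 500, 0x0102: 501},
}
UCS2_CMAPS = {"Adobe-Japan1-UCS2": {500: 0x3042, 501: 0x3044}}
```

Here `composite_font_to_unicode(0x0101, FONT_CMAP, UCS2_CMAPS)` follows code 0x0101 to CID 500 and then, through the Adobe-Japan1-UCS2 table, to U+3042. Each UCS2 entry is modeled as a single code point for simplicity.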

Note: Type 0 fonts whose descendant CIDFonts use the Adobe-GB1, Adobe-CNS1, Adobe-Japan1, or Adobe-Korea1 character collection (as specified in the CIDSystemInfo dictionary) must have a supplement number corresponding to the version of PDF supported by the application. See Table 5.16 on page 446 for a list of the character collections corresponding to a given PDF version. (Other supplements of these character collections can be used, but if the supplement is higher-numbered than the one corresponding to the supported PDF version, only the CIDs defined in the supplement corresponding to that PDF version are considered to be standard CIDs.)
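
The supplement rule in the note above can be expressed as a small check. This sketch assumes that supplements are cumulative (each one extends the CID range of the previous one), so a CID is standard only if it falls within the range of the lower of the font’s supplement and the supplement for the supported PDF version. The function name `is_standard_cid` and the `LAST_CID` table are hypothetical; the CID limits are placeholders, not the real figures from Table 5.16 or any Adobe collection.

```python
def is_standard_cid(cid, font_supplement, supported_supplement, last_cid_by_supplement):
    """Return True if `cid` counts as a standard CID.

    Assumes last_cid_by_supplement maps each supplement number to the
    highest CID it defines, and that supplements are cumulative.
    """
    # A higher-numbered font supplement is capped at the supported one.
    effective = min(font_supplement, supported_supplement)
    return 0 <= cid <= last_cid_by_supplement[effective]


# Placeholder CID limits, NOT real values for any character collection.
LAST_CID = {0: 100, 1: 200, 2: 300}
```

With these placeholders, a font using supplement 2 in an application that supports only supplement 1 treats CID 150 as standard but CID 250 as nonstandard.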

If these methods fail to produce a Unicode value, there is no way to determine what the character code represents.