the glyph descriptions. These tables, as well as the “cmap” table, are required to be
present when embedding fonts. In addition, for OpenType fonts based on TrueType,
the “head,” “hhea,” “loca,” “maxp,” “cvt ,” “prep,” “hmtx,” and “fpgm” tables
are required.
Note: Other tables, such as those used for advanced line layout, need not be present;
however, their absence may prevent editing of text containing the font.
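
The table requirement above can be checked mechanically before a font is embedded. The following is a minimal sketch, not a definitive validator: it reads only the table directory at the start of an OpenType font file and reports which required tags are absent. The names REQUIRED_TRUETYPE_TABLES, list_table_tags, and missing_truetype_tables are hypothetical, and including “glyf” in the required set assumes that “these tables” in the text refers to the glyph description table of a TrueType-based font.

```python
import struct

# Tables the text lists as required for TrueType-based OpenType fonts, plus
# "glyf" and "cmap" (assumption: "these tables" covers "glyf" here).
# Tags are four bytes long, so "cvt " keeps its trailing space.
REQUIRED_TRUETYPE_TABLES = {
    b"glyf", b"cmap", b"head", b"hhea", b"loca",
    b"maxp", b"cvt ", b"prep", b"hmtx", b"fpgm",
}


def list_table_tags(font_data: bytes) -> set:
    """Return the set of table tags found in the font's table directory."""
    # The file starts with a 12-byte header: a 4-byte version, a 2-byte
    # table count, and three 2-byte binary-search fields.
    _version, num_tables = struct.unpack_from(">4sH", font_data, 0)
    tags = set()
    for i in range(num_tables):
        # Each table record is 16 bytes: tag, checksum, offset, length.
        (tag,) = struct.unpack_from(">4s", font_data, 12 + 16 * i)
        tags.add(tag)
    return tags


def missing_truetype_tables(font_data: bytes) -> set:
    """Return the required tags that the font does not contain."""
    return REQUIRED_TRUETYPE_TABLES - list_table_tags(font_data)
```

An empty result means every listed table is present; it says nothing about whether the tables themselves are well formed.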
The process of finding glyph descriptions in OpenType fonts is the following:
• For Type 1 fonts using “CFF” tables, the process is as described in “Encodings
for Type 1 Fonts” on page 428.
• For TrueType fonts using “glyf” tables, the process is as described in “Encodings
for TrueType Fonts” on page 429. Since this process sometimes produces
ambiguous results, it is strongly recommended that PDF creators, instead of using
a simple font, use a Type 0 font with an Identity-H encoding and use the
glyph indices as character codes, as described following Table 5.15 on page 442.
• For CIDFontType0 fonts using “CFF” tables, the process is as described in the
discussion of embedded Type 0 CIDFonts in “Glyph Selection in CIDFonts” on
page 437.
• For CIDFontType2 fonts using “glyf” tables, the process is as described in the
discussion of embedded Type 2 CIDFonts in “Glyph Selection in CIDFonts” on
page 437.
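
The recommendation for TrueType fonts (a Type 0 font with Identity-H encoding and glyph indices as character codes) can be illustrated with a short sketch. Under Identity-H, each character code is 2 bytes, high-order byte first. The sketch assumes the descendant CIDFontType2 maps each CID to the same glyph index (an identity mapping), so the code written into the string is the glyph index itself. The function name identity_h_hex_string is hypothetical.

```python
def identity_h_hex_string(glyph_ids):
    """Encode glyph indices as a PDF hexadecimal string of 2-byte codes.

    Assumes Identity-H encoding and an identity CID-to-glyph mapping,
    so each character code equals the glyph index it selects.
    """
    for gid in glyph_ids:
        if not 0 <= gid <= 0xFFFF:
            raise ValueError(f"glyph index {gid} does not fit in 2 bytes")
    # Four hexadecimal digits per glyph index, high-order byte first.
    return "<" + "".join(f"{gid:04X}" for gid in glyph_ids) + ">"


# Glyph indices 36 and 43 become the string operand <0024002B>.
print(identity_h_hex_string([36, 43]))
```

Because the codes carry no character meaning of their own, a font written this way depends on other information in the file for text extraction.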
As discussed in Section 5.5.3, “Font Subsets,” an embedded font program may
contain only the subset of glyphs that are used in the PDF document. This may be
indicated by the presence of a CharSet or CIDSet entry in the font descriptor that
refers to the font file, although subset fonts are not always so identified.
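
Since subset fonts are not always identified by CharSet or CIDSet, a consumer may want to combine more than one hint. The sketch below assumes the font descriptor has already been parsed into a Python dict keyed by entry name, which is a hypothetical representation. It also checks for the subset tag on the font name (six uppercase letters followed by a plus sign), a convention taken from Section 5.5.3 rather than from the paragraph above. The name looks_like_subset is hypothetical.

```python
import re

# Subset tag convention from Section 5.5.3: six uppercase letters and "+".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def looks_like_subset(descriptor: dict) -> bool:
    """Heuristically decide whether a font descriptor describes a subset.

    `descriptor` is a hypothetical dict form of a parsed font descriptor.
    """
    # Either entry indicates that the font file holds a subset.
    if "CharSet" in descriptor or "CIDSet" in descriptor:
        return True
    # Otherwise fall back to the subset tag on the font name.
    font_name = descriptor.get("FontName", "").lstrip("/")
    return bool(SUBSET_TAG.match(font_name))
```

A False result does not prove that the embedded font is complete; it only means that neither hint was found.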
5.9 Extraction of Text Content
The preceding sections describe all the facilities for showing text and causing
glyphs to be painted on the page. In addition to displaying text, consumer applications
sometimes need to determine the information content of text, that is, its
meaning according to some standard character identification as opposed to its
rendered appearance. This need arises during operations such as searching,
indexing, and exporting of text to other applications.