404 0 obj % ID tree leaf node
<< /Limits [ (Chap1) (Sec1.3) ] % Least and greatest keys in tree
   /Names [ (Chap1) 301 0 R % Mapping from element identifiers
            (Sec1.1) 302 0 R % to structure elements
            (Sec1.2) 303 0 R
            (Sec1.3) 304 0 R
          ]
>>
endobj
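As a rough illustration of how a reader might use a leaf node like object 404 above, the following Python sketch resolves an element identifier to a structure element reference. It is a sketch under stated assumptions, not a reference implementation: the dictionary layout, the name LEAF_404, and the function lookup_in_leaf are hypothetical, object references are modeled as (object number, generation) tuples, and the keys in /Names are assumed to be stored in sorted order.

```python
from bisect import bisect_left

# Hypothetical in-memory model of object 404; a real reader would parse
# this from the file. References are modeled as (object number, generation).
LEAF_404 = {
    "Limits": (b"Chap1", b"Sec1.3"),
    "Names": [
        (b"Chap1", (301, 0)),
        (b"Sec1.1", (302, 0)),
        (b"Sec1.2", (303, 0)),
        (b"Sec1.3", (304, 0)),
    ],
}


def lookup_in_leaf(leaf, key):
    """Return the reference mapped to key in this leaf, or None if absent."""
    least, greatest = leaf["Limits"]
    # /Limits lets a reader reject a leaf without scanning its /Names array.
    if not (least <= key <= greatest):
        return None
    keys = [k for k, _ in leaf["Names"]]
    # Assumption: keys are sorted, so a binary search is sufficient.
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return leaf["Names"][i][1]
    return None
```

For example, lookup_in_leaf(LEAF_404, b"Sec1.2") yields the tuple standing in for 303 0 R, while a key outside the /Limits range, such as b"Sec1.4", is rejected before the /Names array is searched.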
10.7 Tagged PDF
Tagged PDF (PDF 1.4) is a stylized use of PDF that builds on the logical structure
framework described in Section 10.6, “Logical Structure.” It defines a set of standard
structure types and attributes that allow page content (text, graphics, and
images) to be extracted and reused for other purposes. It is intended for use by
tools that perform the following types of operations:
• Simple extraction of text and graphics for pasting into other applications
• Automatic reflow of text and associated graphics to fit a page of a different size
than was assumed for the original layout
• Processing text for such purposes as searching, indexing, and spell-checking
• Conversion to other common file formats (such as HTML, XML, and RTF)
with document structure and basic styling information preserved
• Making content accessible to users with visual impairments (see Section 10.8,
“Accessibility Support”)
A tagged PDF document conforms to the following conventions:
• Page content (Section 10.7.1, “Tagged PDF and Page Content”). Tagged PDF
defines a set of rules for representing text in the page content so that characters,
words, and text order can be determined reliably. All text is represented in a
form that can be converted to Unicode. Word breaks are represented explicitly.
Actual content is distinguished from artifacts of layout and pagination. Content
is given in an order related to its appearance on the page, as determined by the
authoring application.
• A basic layout model (Section 10.7.2, “Basic Layout Model”). A set of rules for
describing the arrangement of structure elements on the page.