SECTION 10.7 Tagged PDF
Ordinarily, structure elements having standard structure types are processed the
same way whether the type is expressed directly or is determined indirectly from
the role map. However, some consumer applications may ascribe additional se-
mantics to nonstandard structure types, even though the role map associates
them with standard ones. For instance, the actual values of the S entries may be
used when exporting to a tagged representation such as XML, whereas the
corresponding role-mapped values are used when converting to presentation formats
such as HTML or RTF, or for purposes such as reflow or accessibility to users
with disabilities.
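The distinction between the actual S value and its role-mapped equivalent can be sketched as follows. This is only an illustrative model, not a PDF parser: structure elements and the role map are represented as plain Python dictionaries, the names resolve_type and type_for_export are hypothetical, STANDARD_TYPES lists only a few of the standard structure types, and following a chain of role map entries (with a guard against cycles) is an assumption made for robustness.

```python
# Partial list of standard structure types, for illustration only.
STANDARD_TYPES = {"Document", "Part", "Art", "Sect", "Div", "P", "H", "Span"}


def resolve_type(s_value, role_map):
    """Return the standard structure type that s_value maps to, or None.

    A standard type resolves to itself. A nonstandard type is looked up in
    the role map; chains are followed (an assumption of this sketch) and a
    cycle or a missing entry yields None.
    """
    seen = set()
    current = s_value
    while current not in STANDARD_TYPES:
        if current in seen or current not in role_map:
            return None  # circular mapping or no mapping at all
        seen.add(current)
        current = role_map[current]
    return current


def type_for_export(element, role_map, target):
    """Choose which type name a consumer might use for a given target.

    For a tagged representation ("xml") the actual S value is kept; for any
    other target (HTML, RTF, reflow, accessibility) the role-mapped standard
    type is used.
    """
    actual = element["S"]
    if target == "xml":
        return actual
    return resolve_type(actual, role_map)


role_map = {"Chapter": "Sect", "Para": "P"}
chapter = {"S": "Chapter"}
xml_type = type_for_export(chapter, role_map, "xml")
html_type = type_for_export(chapter, role_map, "html")
```

Here xml_type keeps the author's nonstandard name, while html_type is the standard type that the role map associates with it.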
Note: Most of the standard element types are designed primarily for laying out text;
the terminology reflects this usage. However, a layout can in fact include any type of
content, such as path or image objects. The content items associated with a structure
element are laid out on the page as if they were blocks of text (for a BLSE) or charac-
ters within a line of text (for an ILSE).
Grouping Elements
Grouping elements are used solely to group other structure elements; they are not
directly associated with content items. Table 10.20 describes the standard structure
types for elements in this category. Section G.7, “Structured Elements That
Describe Hierarchical Lists,” provides an example of nested table-of-contents items.
For most content extraction formats, the document must be a tree with a single
top-level element; the structure tree root (identified by the StructTreeRoot entry
in the document catalog) must have only one child in its K (kids) array. If the PDF
file contains a complete document, the structure type Document is recommended
for this top-level element in the logical structure hierarchy. If the file contains a
well-formed document fragment, one of the structure types Part, Art, Sect, or Div
may be used instead.
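The single-child requirement and the recommended top-level types can be checked along these lines. This is a minimal sketch under stated assumptions: the structure tree root is modeled as a plain dictionary whose K entry is either a list of child dictionaries or a single child dictionary, and the function names top_level_element and has_recommended_top_type are hypothetical. The type check is advisory, since the text only recommends these types.

```python
FRAGMENT_TYPES = {"Part", "Art", "Sect", "Div"}


def top_level_element(struct_tree_root):
    """Return the only child of the structure tree root.

    Raises ValueError if the K entry does not hold exactly one child.
    """
    kids = struct_tree_root.get("K", [])
    if not isinstance(kids, list):
        kids = [kids]  # a lone child is treated as a one-element list here
    if len(kids) != 1:
        raise ValueError(
            "structure tree root must have exactly one child, found %d" % len(kids)
        )
    return kids[0]


def has_recommended_top_type(struct_tree_root, complete_document=True):
    """Report whether the top-level element uses a recommended type.

    Document is recommended for a complete document; Part, Art, Sect, or
    Div may be used for a well-formed fragment.
    """
    top_type = top_level_element(struct_tree_root)["S"]
    if complete_document:
        return top_type == "Document"
    return top_type in FRAGMENT_TYPES


whole = {"K": [{"S": "Document"}]}
fragment = {"K": [{"S": "Sect"}]}
whole_ok = has_recommended_top_type(whole)
fragment_ok = has_recommended_top_type(fragment, complete_document=False)
```

A root with two or more children in K raises ValueError, which signals that the tree is not suitable for formats that expect a single top-level element.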
TABLE 10.20 Standard structure types for grouping elements
STRUCTURE TYPE    DESCRIPTION
Document          (Document) A complete document. This is the root element of any
                  structure tree containing multiple parts or multiple articles.
Part              (Part) A large-scale division of a document. This type of element
                  is appropriate for grouping articles or sections.