CHAPTER 10
Document Interchange
PDF logical structure shares basic features with standard document markup
languages such as HTML, SGML, and XML. A document’s logical structure is
expressed as a hierarchy of structure elements, each represented by a dictionary
object. Like their counterparts in other markup languages, PDF structure
elements can have content and attributes. In PDF, rendered document content
takes over the role occupied by text in HTML, SGML, and XML.
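To make the analogy with markup concrete, the following minimal Python sketch models structure elements as nested objects that each have a type, attributes, and children. It prints the hierarchy much as one might indent an XML tree. The class `StructElem` and the helper `outline` are hypothetical illustrations, not part of any PDF library. Their fields only loosely mirror the dictionary keys that PDF uses for these roles: S (structure type), A (attributes), and K (children). Plain strings stand in for rendered page content. The attribute name `TextAlign` is borrowed from PDF's standard layout attributes purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StructElem:
    """Hypothetical in-memory stand-in for a PDF structure element dictionary.

    Field names only loosely mirror PDF keys: `s` ~ /S (structure type),
    `attrs` ~ /A (attributes), `kids` ~ /K (children).
    """
    s: str
    attrs: dict = field(default_factory=dict)
    kids: list = field(default_factory=list)  # StructElem children or content placeholders (str)


def outline(elem, depth=0):
    """Return the hierarchy as indented lines, one per element or content item."""
    attrs = f" {elem.attrs}" if elem.attrs else ""
    lines = ["  " * depth + elem.s + attrs]
    for kid in elem.kids:
        if isinstance(kid, StructElem):
            lines.extend(outline(kid, depth + 1))
        else:
            # Leaf: stands in for rendered page content, the PDF analogue of markup text
            lines.append("  " * (depth + 1) + repr(kid))
    return lines


doc = StructElem("Document", kids=[
    StructElem("H1", attrs={"TextAlign": "Center"}, kids=["Introduction"]),
    StructElem("P", kids=["Some body text."]),
])
print("\n".join(outline(doc)))
```

The leaves play the part that character data plays in HTML or XML. Attributes hang off the element itself, as they do in those languages.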
A PDF document’s logical structure is stored separately from its visible content,
with pointers from each to the other. This separation allows the ordering and
nesting of logical elements to be entirely independent of the order and location of
graphics objects on the document’s pages.
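The independence of logical order from drawing order can be illustrated with a small Python sketch. It is built on a simplified, hypothetical model: each page is a list of `(content_id, text)` items in the order they are drawn, and structure leaves point at content through `(page_index, content_id)` tuples. In actual PDF files these links are made with marked-content identifiers and related structures described elsewhere in this chapter. The tuples, the `pages` and `structure` data, and the helpers `logical_order` and `content_to_structure` are illustrative assumptions only.

```python
# Hypothetical model: each page lists (content_id, text) in drawing (stream) order.
pages = [
    # Page 0 happens to draw the sidebar note before the main heading.
    [(0, "Sidebar note"), (1, "Main heading"), (2, "Main body")],
]

# Structure tree: leaves refer to content as (page_index, content_id).
structure = {
    "S": "Document",
    "K": [
        {"S": "H1", "K": [(0, 1)]},
        {"S": "P", "K": [(0, 2)]},
        {"S": "Note", "K": [(0, 0)]},
    ],
}


def logical_order(elem, out=None):
    """Depth-first walk collecting (content_ref, owning structure type) in logical order."""
    if out is None:
        out = []
    for kid in elem["K"]:
        if isinstance(kid, dict):
            logical_order(kid, out)
        else:
            out.append((kid, elem["S"]))
    return out


def content_to_structure(root):
    """Build the reverse pointer: (page, content_id) -> type of the owning element."""
    return {ref: s for ref, s in logical_order(root)}


text = {(p, cid): t for p, items in enumerate(pages) for cid, t in items}
for ref, s in logical_order(structure):
    print(s, text[ref])
```

Reading through the structure tree yields heading, body, then note, even though the page draws the note first. The reverse map gives the second direction of the two-way link.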
The MarkInfo entry in the document catalog (see Section 3.6.1, “Document Catalog”)
specifies a mark information dictionary, whose entries are shown in Table 10.8. It
provides additional information relevant to specialized uses of structured PDF documents.
TABLE 10.8 Entries in the mark information dictionary

KEY              TYPE      VALUE
Marked           boolean   (Optional) A flag indicating whether the document conforms to
                           Tagged PDF conventions. Default value: false.
                           Note: If Suspects is true, the document may not completely
                           conform to Tagged PDF conventions.
UserProperties   boolean   (Optional; PDF 1.6) A flag indicating the presence of structure
                           elements that contain user properties attributes (see “User
                           Properties” on page 876). Default value: false.
Suspects         boolean   (Optional; PDF 1.6) A flag indicating the presence of tag
                           suspects (see “Page …”). Default value: false.
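As a rough illustration of how a consumer might apply the defaults in Table 10.8, here is a minimal Python sketch. It assumes the catalog has already been parsed into a plain `dict` with indirect references resolved and PDF booleans mapped to Python `bool`. It also assumes that an absent MarkInfo entry behaves like a dictionary containing only defaults. The names `MarkInfo`, `read_mark_info`, and `claims_full_tagged_pdf` are hypothetical. The last function encodes one plausible reading of the note on Marked: the document claims Tagged PDF conformance and reports no tag suspects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkInfo:
    """Hypothetical view of a mark information dictionary; defaults follow Table 10.8."""
    marked: bool = False
    user_properties: bool = False
    suspects: bool = False


def read_mark_info(catalog: dict) -> MarkInfo:
    """Read the catalog's MarkInfo entry, falling back to the table defaults (all false)."""
    # Assumption: a missing MarkInfo entry is treated like an empty dictionary.
    info = catalog.get("MarkInfo", {})
    return MarkInfo(
        marked=bool(info.get("Marked", False)),
        user_properties=bool(info.get("UserProperties", False)),
        suspects=bool(info.get("Suspects", False)),
    )


def claims_full_tagged_pdf(mi: MarkInfo) -> bool:
    """One plausible policy: Marked is true and no tag suspects are reported."""
    return mi.marked and not mi.suspects
```

For example, `read_mark_info({"MarkInfo": {"Marked": True, "Suspects": True}})` reports a marked document that, per the note on Marked, may not completely conform, so `claims_full_tagged_pdf` returns `False` for it.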
10.6.1 Structure Hierarchy
The logical structure of a document is described by a hierarchy of objects called
the structure hierarchy or structure tree. At the root of the hierarchy is a dictionary
object called the structure tree root, located by means of the StructTreeRoot entry
in the document catalog (see Section 3.6.1, “Document Catalog”). Table 10.9
shows the entries in the structure tree root dictionary. The K entry specifies the
immediate children of the structure tree root, which are structure elements.
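Locating and walking the hierarchy from the catalog can be sketched in Python as follows. This assumes the catalog and structure dictionaries have already been parsed into nested `dict`s with indirect references resolved. It also assumes that a K value may be absent, a single object, or an array, which is why the `kids` helper normalizes it. Any child that is not a dictionary with an S (structure type) entry, such as an integer standing for page content, is skipped. The helper names `kids` and `walk` are hypothetical.

```python
def kids(node):
    """Normalize a K value: it may be absent, a single object, or an array."""
    k = node.get("K")
    if k is None:
        return []
    return k if isinstance(k, list) else [k]


def walk(catalog):
    """Yield (depth, structure type) for each structure element, depth-first in logical order."""
    root = catalog.get("StructTreeRoot")
    if root is None:
        return  # no logical structure in this document
    # Explicit stack; children are pushed in reverse so they pop in document order.
    stack = [(kid, 0) for kid in reversed(kids(root))]
    while stack:
        node, depth = stack.pop()
        if isinstance(node, dict) and "S" in node:  # only structure elements are reported
            yield depth, node["S"]
            stack.extend((kid, depth + 1) for kid in reversed(kids(node)))


catalog = {
    "Type": "Catalog",
    "StructTreeRoot": {
        "Type": "StructTreeRoot",
        "K": {"S": "Document", "K": [{"S": "H1", "K": 0}, {"S": "P", "K": [1, 2]}]},
    },
}
for depth, s in walk(catalog):
    print("  " * depth + s)
```

The walk follows only K entries downward. In real files, structure elements also point back to their parents, so following every reference blindly would revisit nodes.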