SECTION 10.7
Tagged PDF
page is reflowed, or a text-to-speech engine may choose to speak text that is not
visible to a sighted reader. For the purposes of Tagged PDF, page content is
considered to include all text and illustrations in their entirety, regardless of
whether they are visible when the document is displayed or printed.
Page Content Order
When dealing with material on a page-by-page basis, some Tagged PDF consumer
applications may wish to process elements in page content order, determined by
the sequencing of graphics objects within a page’s content stream and of
characters within a text object, rather than in the logical structure order
defined by a depth-first traversal of the page’s logical structure hierarchy. The
two orderings are logically distinct and may or may not coincide. In particular,
any artifacts the page may contain are included in the page content order but not
in the logical structure order, since they are not considered part of the
document’s logical structure. The creator of a Tagged PDF document is responsible
for establishing both an appropriate page content order for each page and an
appropriate logical structure hierarchy for the entire document.
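The difference between the two orderings can be illustrated with a small sketch. The Python below is not part of the PDF specification; it uses a deliberately simplified, hypothetical model (the names ContentItem, StructElem, page_content_order, and logical_structure_order are assumptions made for illustration) in which each marked-content sequence carries either a marked-content identifier or none, the latter standing in for an artifact.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class ContentItem:
    """One marked-content sequence in a page's content stream (hypothetical model)."""
    text: str
    mcid: Optional[int] = None  # None stands in for an artifact


@dataclass
class StructElem:
    """A node of the logical structure hierarchy (hypothetical model)."""
    tag: str
    # Kids are either nested structure elements or integer MCIDs.
    kids: List[Union["StructElem", int]] = field(default_factory=list)


def page_content_order(stream: List[ContentItem]) -> List[str]:
    """Content in the sequence it appears in the stream, artifacts included."""
    return [item.text for item in stream]


def logical_structure_order(root: StructElem, stream: List[ContentItem]) -> List[str]:
    """Content in depth-first order of the structure hierarchy, artifacts excluded."""
    by_mcid = {item.mcid: item.text for item in stream if item.mcid is not None}
    ordered: List[str] = []

    def visit(node: StructElem) -> None:
        for kid in node.kids:
            if isinstance(kid, StructElem):
                visit(kid)
            else:
                ordered.append(by_mcid[kid])

    visit(root)
    return ordered


# A page whose stream draws a running header (artifact) and a sidebar note first.
stream = [
    ContentItem("Running header"),
    ContentItem("Sidebar note", mcid=2),
    ContentItem("Heading", mcid=0),
    ContentItem("Body paragraph", mcid=1),
]
root = StructElem("Document", [
    StructElem("H1", [0]),
    StructElem("P", [1]),
    StructElem("Note", [2]),
])

print(page_content_order(stream))
print(logical_structure_order(root, stream))
```

In this toy page the two orderings do not coincide: the artifact appears only in the page content order, and the sidebar note moves from second position in the stream to last position in the structure traversal.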
Because the primary requirement for page content order is to enable reflow to
maintain elements in proper reading sequence, it should normally (for Western
writing systems) proceed from top to bottom (and, in a multiple-column layout,
from column to column), with artifacts in their correct relative places. In
general, all parts of an article that appear on a given page should be kept
together, even if the article flows to scattered locations on the page.
Illustrations or footnotes may be interspersed with the text of the associated
article or may appear at the end of its content (or, in the case of footnotes, at
the end of the entire page’s logical content).
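As a rough illustration of the top-to-bottom, column-to-column rule, a producer might order layout blocks as in the Python sketch below. This is only a sketch under stated assumptions: the Block type and reading_order function are hypothetical, column assignment is assumed to have been done already, and coordinates are assumed to be in default user space, where y increases upward. It does not attempt to keep articles together or to place artifacts.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    """A laid-out block of page content (hypothetical model)."""
    label: str
    column: int  # 0 is the leftmost column; assumed to be known in advance
    top: float   # y coordinate of the block's top edge; y grows upward


def reading_order(blocks: List[Block]) -> List[Block]:
    """Sort blocks column by column, and top to bottom within each column."""
    # Negating `top` puts larger y values (higher on the page) first.
    return sorted(blocks, key=lambda b: (b.column, -b.top))


blocks = [
    Block("column 2, top", 1, 700.0),
    Block("column 1, bottom", 0, 300.0),
    Block("column 1, top", 0, 700.0),
    Block("column 2, bottom", 1, 300.0),
]

for block in reading_order(blocks):
    print(block.label)
```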
In some situations, a producer that intends to generate Tagged PDF may be unable
to generate correct page content order for part of a document’s contents. This
can occur, for example, if content was extracted from another application, or if
there are ambiguities or missing information in text output. In such cases, tag
suspects (PDF 1.6) can be used. The producer can identify suspect content by using
marked content (see Section 10.5, “Marked Content”) with a tag of TagSuspect, as
shown in Example 10.16. The marked content must have a properties dictionary
with an entry whose name is TagSuspect and whose value is Ordering, which
indicates that the ordering of the enclosed marked content does not meet Tagged
PDF specifications.
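The marking described above could be emitted by a producer along the following lines. This Python sketch is illustrative only: the function name wrap_tag_suspect is hypothetical, the properties dictionary is written inline as an operand of the BDC operator, and the sample text operators (including the font resource name F1) are placeholders rather than content taken from any real document.

```python
def wrap_tag_suspect(content_ops: str) -> str:
    """Enclose content stream operators in a TagSuspect marked-content sequence.

    The properties dictionary contains the single entry /TagSuspect /Ordering,
    flagging the enclosed content as having unreliable page content order.
    """
    body = content_ops.rstrip("\n")
    return (
        "/TagSuspect <</TagSuspect /Ordering>> BDC\n"
        + body
        + "\nEMC\n"
    )


# Placeholder page content whose ordering the producer cannot vouch for.
suspect_ops = "BT /F1 12 Tf 72 700 Td (Extracted text) Tj ET\n"
print(wrap_tag_suspect(suspect_ops), end="")
```

The function only brackets the operators it is given; deciding which content is suspect remains the producer's responsibility.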