CHAPTER 10
Document Interchange
Note: To support consumer applications in providing accessibility to users with
disabilities, Tagged PDF documents should use the natural language specification
(Lang), alternate description (Alt), replacement text (ActualText), and
abbreviation expansion text (E) facilities described in Section 10.8,
“Accessibility Support.”
Incidental Artifacts
In addition to objects that are explicitly marked as artifacts and excluded from
the document’s logical structure, the running text of a page may contain other
elements and relationships that are not logically part of the document’s real
content, but merely incidental results of the process of laying out that content
into a document. They may include the following elements:
• Hyphenation. Among the artifacts introduced by text layout is the hyphen
marking the incidental division of a word at the end of a line. In Tagged PDF,
such an incidental word division must be represented by a soft hyphen character,
which the Unicode mapping algorithm (see “Unicode Mapping in Tagged PDF”)
translates to the Unicode value U+00AD. (This character is distinct from an
ordinary hard hyphen, whose Unicode value is U+002D.) The producer of a Tagged
PDF document must distinguish explicitly between soft and hard hyphens so that
the consumer does not have to guess which type a given character represents.
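As an illustration of why this distinction matters to a consumer, the following
minimal Python sketch rejoins lines of text that are assumed to have already been
extracted from the page and mapped to Unicode. The function name join_lines and
the list-of-lines input are hypothetical and are not part of any PDF API; the
sketch handles only hyphens that fall at the end of a line.

```python
SOFT_HYPHEN = "\u00ad"  # incidental word division introduced by layout
HARD_HYPHEN = "\u002d"  # ordinary hyphen that belongs to the text itself


def join_lines(lines):
    """Join extracted lines into running text (hypothetical helper).

    A trailing soft hyphen is dropped and the word is rejoined.
    A trailing hard hyphen is kept and the next line follows directly.
    Any other line end is treated as a word boundary.
    """
    parts = []
    for line in lines:
        line = line.strip()
        if line.endswith(SOFT_HYPHEN):
            # Incidental division: remove the hyphen, add no space.
            parts.append(line[:-1])
        elif line.endswith(HARD_HYPHEN):
            # Real hyphen: keep it, add no space.
            parts.append(line)
        else:
            # Ordinary line end: separate from the next line with a space.
            parts.append(line + " ")
    return "".join(parts).strip()


if __name__ == "__main__":
    sample = ["charac\u00ad", "ter and well-", "known text"]
    print(join_lines(sample))
```

Because the producer has already marked each hyphen as soft or hard, the
consumer in this sketch needs no dictionary lookup or heuristic to decide
whether a line-final hyphen belongs to the word.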
Note: In some languages, the situation is more complicated: there may be multiple
hyphen characters, and hyphenation may change the spelling of words. See Exam-
• Text discontinuities. The running text of a page, as expressed in page content
order (see “Page Content Order,” below), may contain places where the normal
progression of text suffers a discontinuity. For example, the page may contain
the beginnings of two separate articles (see Section 8.3.2, “Articles”), each of
which is continued onto a later page of the document. The last words of the
first article appearing on the page should not be run together with the first
words of the second article. Consumer applications can recognize such
discontinuities by examining the document’s logical structure.
• Hidden page elements. For a variety of reasons, elements of a document’s logical
content may be invisible on the page: they may be clipped, their color may
match the background, or they may be obscured by other, overlapping objects.
Consumer applications must still be able to recognize and process such hidden
elements. For example, formerly invisible elements may become visible when a