SECTION 10.7
Tagged PDF
Real Content and Artifacts
The graphics objects in a document can be divided into two classes:
• The real content of a document comprises objects representing material originally introduced by the document’s author.
• Artifacts are graphics objects that are typically not part of the author’s original content but rather are generated by the PDF producer application in the course of pagination, layout, or other strictly mechanical processes. Artifacts may also be used to describe areas of the document where the author uses a graphical background, with the goal of enhancing the visual experience. In such a case, the background is not required for understanding the content.
The document’s logical structure encompasses all graphics objects making up the real content and describes how those objects relate to one another. It does not include graphics objects that are mere artifacts of the layout and production process.
A document’s real content includes not only the page content stream and subsidiary form XObjects but also associated annotations that meet all of the following conditions:
• The annotation has an appearance stream (see Section 8.4.4, “Appearance Streams”) containing a normal (N) appearance.
• The annotation’s Hidden flag (see Section 8.4.2, “Annotation Flags”) is not set.
• The annotation is included in the document’s logical structure (see Section
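The three conditions above can be sketched as a single predicate. This is only an illustration, not part of the specification: it assumes an annotation dictionary has already been parsed into a plain Python dict with string keys, and the function name is hypothetical. Treating the presence of a StructParent entry as evidence that the annotation is included in the logical structure is likewise a simplifying assumption.

```python
# Hidden is bit position 2 of the annotation's F (flags) entry, i.e. value 2.
HIDDEN_FLAG = 1 << 1


def is_real_content_annotation(annot: dict) -> bool:
    """Return True if a parsed annotation dictionary meets all three conditions.

    `annot` is a hypothetical plain-dict model of an annotation dictionary.
    """
    # Condition 1: an appearance dictionary (AP) with a normal (N) appearance.
    appearance = annot.get("AP") or {}
    has_normal_appearance = "N" in appearance

    # Condition 2: the Hidden flag is not set (F defaults to 0 when absent).
    is_hidden = bool(annot.get("F", 0) & HIDDEN_FLAG)

    # Condition 3 (assumption): a StructParent entry is taken to mean the
    # annotation is referenced from the document's logical structure.
    in_logical_structure = "StructParent" in annot

    return has_normal_appearance and not is_hidden and in_logical_structure
```

An annotation that fails any one of the checks would be treated as outside the document's real content under this sketch.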
Specification of Artifacts
An artifact can be explicitly distinguished from real content by enclosing it in a marked-content sequence with the tag Artifact:
    /Artifact BMC
    …
    EMC
or
    /Artifact propertyList BDC
    …
    EMC
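As a rough sketch of how a producer might emit either form, the helper below wraps a fragment of content-stream text in an Artifact marked-content sequence. The function name and the string-based handling of the content stream are assumptions made for illustration. The property list is passed through as opaque text, and the example value in the usage note is illustrative only.

```python
from typing import Optional


def wrap_as_artifact(content: str, property_list: Optional[str] = None) -> str:
    """Enclose content-stream operators in an Artifact marked-content sequence.

    `wrap_as_artifact` is a hypothetical helper. With no property list it
    emits the BMC form; otherwise it emits the BDC form, inserting the given
    property list text unchanged.
    """
    if property_list is None:
        opener = "/Artifact BMC"
    else:
        opener = f"/Artifact {property_list} BDC"
    # EMC closes the marked-content sequence in both forms.
    return f"{opener}\n{content}\nEMC"
```

For example, `wrap_as_artifact("0 0 612 20 re f")` yields the BMC form, while `wrap_as_artifact("0 0 612 20 re f", "<< /Type /Pagination >>")` yields the BDC form with an inline property list.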