SECTION 3.6 Document Structure
3.6.2 Page Tree
The pages of a document are accessed through a structure known as the page tree, which defines the ordering of pages in the document. The tree structure allows PDF consumer applications, using only limited memory, to quickly open a document containing thousands of pages. The tree contains nodes of two types: intermediate nodes, called page tree nodes, and leaf nodes, called page objects, whose form is described in the sections below. Applications should be prepared to handle any form of tree structure built of such nodes. The simplest structure would consist of a single page tree node that references all of the document’s page objects directly. However, to optimize application performance, the Acrobat Distiller program constructs trees of a particular form, known as balanced trees. Further information on this form of tree can be found in Data Structures and Algorithms, by Aho, Hopcroft, and Ullman (see the Bibliography).
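As a rough illustration of what a balanced page tree looks like, the sketch below groups page objects bottom-up so that every leaf ends up at the same depth. It is not Distiller's algorithm, which the text does not describe. The function names (make_page, make_node, build_balanced_tree), the fanout parameter, and the "Index" field are hypothetical, and Python dictionaries stand in for PDF dictionaries and indirect references.

```python
# Illustrative sketch only: the text does not specify how Distiller builds its trees.

def make_page(index):
    """Return a minimal stand-in for a page object (leaf node)."""
    return {"Type": "Page", "Index": index}  # "Index" is a hypothetical helper field


def make_node(kids):
    """Wrap children in a page tree node, setting Count and each child's Parent."""
    node = {"Type": "Pages", "Kids": list(kids)}
    # Count is the number of leaf page objects below this node, not the number of kids.
    node["Count"] = sum(k["Count"] if k["Type"] == "Pages" else 1 for k in node["Kids"])
    for kid in node["Kids"]:
        kid["Parent"] = node  # a plain reference stands in for an indirect reference
    return node


def build_balanced_tree(pages, fanout=8):
    """Group nodes level by level, at most `fanout` children each, until one root remains."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = list(pages)
    if not level:
        return make_node([])
    while True:
        level = [make_node(level[i:i + fanout]) for i in range(0, len(level), fanout)]
        if len(level) == 1:
            return level[0]  # the root node has no Parent entry


root = build_balanced_tree([make_page(i) for i in range(1000)], fanout=10)
```

With 1000 pages and a fan-out of 10, the root has 10 children and three levels of page tree nodes sit above the page objects, so a consumer needs to read only a few nodes to reach any page.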
Page Tree Nodes
TABLE 3.26 Required entries in a page tree node

KEY      TYPE         VALUE
Type     name         (Required) The type of PDF object that this dictionary describes; must be Pages for a page tree node.
Parent   dictionary   (Required except in root node; must be an indirect reference) The page tree node that is the immediate parent of this one.
Kids     array        (Required) An array of indirect references to the immediate children of this node. The children may be page objects or other page tree nodes.
Count    integer      (Required) The number of leaf nodes (page objects) that are descendants of this node within the page tree.
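The Kids and Count entries together let a consumer locate a page without visiting every node: any subtree whose Count is too small to contain the target can be skipped whole. The following is a minimal sketch of that lookup, assuming the tree has already been loaded into Python dictionaries. The names find_page and count_leaves and the "Label" field are hypothetical, and page indices are zero-based here.

```python
def find_page(node, index):
    """Return the page object at zero-based `index` among the descendants of `node`."""
    if not 0 <= index < node["Count"]:
        raise IndexError("page index out of range")
    for kid in node["Kids"]:
        # A page object covers one page; a page tree node covers Count pages.
        span = kid["Count"] if kid["Type"] == "Pages" else 1
        if index < span:
            return find_page(kid, index) if kid["Type"] == "Pages" else kid
        index -= span  # skip this whole subtree
    raise ValueError("Count does not match the node's descendants")


def count_leaves(node):
    """Recount leaf page objects, e.g. to check a node's Count entry."""
    return sum(count_leaves(k) if k["Type"] == "Pages" else 1 for k in node["Kids"])


# A small hand-built tree: a root with two intermediate nodes and five pages.
pages = [{"Type": "Page", "Label": name} for name in "ABCDE"]  # "Label" is hypothetical
left = {"Type": "Pages", "Kids": pages[:3], "Count": 3}
right = {"Type": "Pages", "Kids": pages[3:], "Count": 2}
root = {"Type": "Pages", "Kids": [left, right], "Count": 5}  # root has no Parent
for parent in (left, right):
    parent["Parent"] = root
    for page in parent["Kids"]:
        page["Parent"] = parent

fourth_page = find_page(root, 3)  # skips `left` entirely and lands on page "D"
```

Because the lookup trusts Count, a defensive reader might compare it against count_leaves before relying on it. That check walks the whole subtree, so it trades away the speed the Count entry provides.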
Note: The structure of the page tree is not necessarily related to the logical structure of the document; that is, page tree nodes do not represent chapters, sections, and so forth. (Other data structures are defined for that purpose; see Section 10.6, “Logical Structure.”) Applications that consume or produce PDF files are not required to preserve the existing structure of the page tree.