PDF Format Reference - Adobe Portable Document Format

SECTION 10.9 Web Capture

TABLE 10.37 Entries in the Web Capture information dictionary

KEY     TYPE      VALUE

        number    (Required) The Web Capture version number. For PDF 1.3, the
                  version number is 1.0.
                  Note: This value is a single real number, not a major and
                  minor version number. Thus, for example, a version number of
                  1.2 would be considered greater than 1.15.

        array     (Optional) An array of indirect references to Web Capture
                  command dictionaries (see “Command Dictionaries” on page 957)
                  describing commands that were used in building the PDF file.
                  The commands appear in the array in the order in which they
                  were executed in building the file.
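
The note on the version number can be illustrated with a short Python sketch. The function names are hypothetical and are not part of the PDF format. The first function compares versions as single real numbers, as the note requires; the second shows the major/minor reading that the note rules out, for contrast only.

```python
def is_newer(version_a: float, version_b: float) -> bool:
    """Compare two Web Capture versions as single real numbers."""
    return version_a > version_b


def is_newer_major_minor(version_a: str, version_b: str) -> bool:
    """The major.minor interpretation, which does NOT apply to this value."""
    parts_a = tuple(int(p) for p in version_a.split("."))
    parts_b = tuple(int(p) for p in version_b.split("."))
    return parts_a > parts_b


print(is_newer(1.2, 1.15))                   # True: 1.2 > 1.15 as real numbers
print(is_newer_major_minor("1.2", "1.15"))   # False: (1, 2) < (1, 15)
```

Under the real-number reading, 1.2 is the later version, whereas a major/minor comparison would rank 1.15 higher.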

10.9.2 Content Database

Web Capture retrieves HTML files from URLs and converts them to PDF. The
resulting PDF file may contain the contents of multiple HTML pages. Conversely,
since HTML pages do not have a fixed size, a single HTML page may give rise to
multiple PDF pages. To keep track of the correspondences, Web Capture maintains
a content database that maps URLs and digital identifiers to PDF objects such
as pages and XObjects. By looking up digital identifiers in the database, Web
Capture can determine whether newly downloaded content is identical to content
already retrieved from a different URL. Thus, it can perform optimizations such
as storing only one copy of an image that is referenced by multiple HTML pages.
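
The deduplication described above can be sketched in Python. This is a minimal illustration, not Web Capture's actual implementation: the class and attribute names are hypothetical, and an MD5 digest of the downloaded bytes is assumed here as a stand-in for the digital identifier.

```python
import hashlib


class ContentDatabase:
    """Hypothetical in-memory model of a content database."""

    def __init__(self):
        self.by_url = {}  # URL -> digital identifier
        self.by_id = {}   # digital identifier -> stored entry

    def add(self, url: str, data: bytes):
        """Record downloaded data; return (entry, True if newly stored)."""
        # Assumption: the identifier is a digest of the source bytes.
        identifier = hashlib.md5(data).hexdigest()
        is_new = identifier not in self.by_id
        if is_new:
            # First time this content is seen: store a single copy.
            self.by_id[identifier] = {"data": data, "urls": []}
        entry = self.by_id[identifier]
        if url not in entry["urls"]:
            entry["urls"].append(url)
        self.by_url[url] = identifier
        return entry, is_new


db = ContentDatabase()
logo = b"GIF89a example image bytes"
first, stored_first = db.add("http://example.com/logo.gif", logo)
second, stored_second = db.add("http://example.org/img/logo.gif", logo)
print(stored_first, stored_second)  # True False
print(first is second)              # True
```

The second download arrives from a different URL, but its identifier is already in the database, so both URLs end up pointing at the same stored copy.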

Web Capture’s content database is organized into content sets. Each content set
is a dictionary holding information about a group of related PDF objects
generated from the same source data. Content sets are of two subtypes: page sets
and image sets. When Web Capture converts an HTML file to PDF pages, for
example, it creates a page set to hold information about the pages. Similarly,
when it converts a GIF image to one or more image XObjects, it creates an image
set describing those XObjects.

The content set corresponding to a given data source can be accessed in either
of two ways:

• By the URLs from which it was retrieved

• By a digital identifier generated from the source data itself (see “Digital
  Identifiers” on page 950)
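
The two access paths listed above can be modeled as two indexes that lead to the same content set. The following Python sketch rests on stated assumptions: the `ContentSet` class, its `kind` labels, the `register` helper, and the identifier string are all hypothetical, and plain dictionaries stand in for whatever lookup structure the file actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class ContentSet:
    kind: str  # "page" or "image" (hypothetical labels for the two subtypes)
    objects: list = field(default_factory=list)  # stand-ins for PDF objects


urls_index = {}  # URL -> content set
ids_index = {}   # digital identifier -> content set


def register(content_set: ContentSet, urls: list, identifier: str) -> None:
    """Make one content set reachable by each of its URLs and by its identifier."""
    for url in urls:
        urls_index[url] = content_set
    ids_index[identifier] = content_set


page_set = ContentSet("page", ["page 1", "page 2"])
register(
    page_set,
    ["http://example.com/", "http://example.com/index.html"],
    "identifier-of-index-html",  # placeholder, not a real digest
)

# Either lookup path reaches the same content set.
print(urls_index["http://example.com/"] is ids_index["identifier-of-index-html"])  # True
```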

The URLS and IDS entries in a PDF document’s name dictionary (see Section 3.6.3,
“Name Dictionary”) contain name trees mapping URLs and digital identifiers,
respectively, to Web Capture content sets. Figure 10.1 shows a simple example. An
