CHAPTER 10 Document Interchange
10.9 Web Capture
Web Capture is a PDF 1.3 feature that allows information from Internet-based or
locally resident HTML, PDF, GIF, JPEG, and ASCII text files to be imported into
a PDF file. This feature is implemented in Acrobat 4.0 and later viewers by a Web
Capture plug-in extension (sometimes called AcroSpider). The information in
the Web Capture data structures enables viewer applications to perform the
following operations:
• Save locally and preserve the visual appearance of material from the Web
• Retrieve additional material from the Web and add it to an existing PDF file
• Update or modify existing material previously captured from the Web
• Find source information for material captured from the Web, such as the URL
(if any) from which it was captured
• Find all material in a PDF file that was generated from a given URL
• Find all material in a PDF file that matches a given digital identifier (MD5
hash)
The information needed to perform these operations is recorded in two data
structures in the PDF file:
• The Web Capture information dictionary holds document-level information
related to Web Capture.
• The Web Capture content database keeps track of the material retrieved by Web
Capture and where it came from, enabling Web Capture to avoid downloading
material that is already present in the file.
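The role of the content database can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the actual PDF data structure: the class name ContentDatabase, its methods, and the example URL are hypothetical, and the digest is taken over the raw bytes of the retrieved file purely for illustration. It shows how an index keyed by URL and by MD5 digest lets a capture tool skip material that is already present.

```python
import hashlib


class ContentDatabase:
    """Toy in-memory stand-in for a Web Capture content database (hypothetical)."""

    def __init__(self):
        self.by_url = {}     # source URL -> MD5 digest of the captured data
        self.by_digest = {}  # MD5 digest -> captured data

    def add(self, url, data):
        """Record captured data under its URL and its MD5 digest."""
        digest = hashlib.md5(data).digest()  # 16-byte digital identifier
        self.by_url[url] = digest
        # Identical content reached from several URLs is stored only once.
        self.by_digest.setdefault(digest, data)
        return digest

    def needs_download(self, url):
        """Return True if nothing has been captured from this URL yet."""
        return url not in self.by_url

    def find_by_digest(self, digest):
        """Return captured data matching a digest, or None if absent."""
        return self.by_digest.get(digest)


db = ContentDatabase()
page = b"<html><body>Hello</body></html>"
digest = db.add("http://www.example.com/index.html", page)
```

With this model, db.needs_download("http://www.example.com/index.html") is false after the call to add, so a second retrieval of the same URL can be avoided, and find_by_digest locates the material from its identifier alone.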
The following sections describe these structures in detail. See
Appendix C for information about implementation limits in Web Capture.
Note: The following discussion centers on HTML and GIF files, although Web
Capture handles other file types as well.
10.9.1 Web Capture Information Dictionary
The optional SpiderInfo entry in the document catalog (see Section 3.6.1,
“Document Catalog”) holds a Web Capture information dictionary containing
document-level information related to Web Capture. Table 10.37 shows the
contents of this dictionary.
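Because the SpiderInfo entry is optional, a consumer has to allow for its absence. The following minimal sketch assumes that the document catalog has already been parsed into a plain Python dict by some PDF library; the function name get_web_capture_info and the sample catalogs are hypothetical, and the entries of the dictionary itself are not modeled here.

```python
def get_web_capture_info(catalog):
    """Return the Web Capture information dictionary, or None if absent.

    `catalog` is assumed to be the document catalog already parsed into
    a plain dict keyed by PDF name (without the leading slash).
    """
    info = catalog.get("SpiderInfo")
    # Treat a missing or malformed (non-dictionary) entry as "not captured".
    return info if isinstance(info, dict) else None


# Hypothetical parsed catalogs: one with the entry, one without.
captured = {"Type": "Catalog", "SpiderInfo": {}}
plain = {"Type": "Catalog"}

has_info = get_web_capture_info(captured) is not None
no_info = get_web_capture_info(plain) is None
```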