PDF Reference, version 1.7


CHAPTER 10 Document Interchange

10.9 Web Capture

Web Capture is a PDF 1.3 feature that allows information from Internet-based or locally resident HTML, PDF, GIF, JPEG, and ASCII text files to be imported into a PDF file. This feature is implemented in Acrobat 4.0 and later viewers by a Web Capture plug-in extension (sometimes called AcroSpider).

The information in the Web Capture data structures enables viewer applications to perform the following operations:

• Save locally and preserve the visual appearance of material from the Web
• Retrieve additional material from the Web and add it to an existing PDF file
• Update or modify existing material previously captured from the Web
• Find source information for material captured from the Web, such as the URL (if any) from which it was captured
• Find all material in a PDF file that was generated from a given URL
• Find all material in a PDF file that matches a given digital identifier (MD5 hash)

The information needed to perform these operations is recorded in two data structures in the PDF file:

• The Web Capture information dictionary holds document-level information related to Web Capture.
• The Web Capture content database keeps track of the material retrieved by Web Capture and where it came from, enabling Web Capture to avoid downloading material that is already present in the file.

The following sections provide a detailed overview of these structures. See Appendix C for information about implementation limits in Web Capture.

Note: The following discussion centers on HTML and GIF files, although Web Capture handles other file types as well.

10.9.1 Web Capture Information Dictionary

The optional SpiderInfo entry in the document catalog (see Section 3.6.1, “Document Catalog”) holds an optional Web Capture information dictionary containing document-level information related to Web Capture. Table 10.37 shows the contents of this dictionary.
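
Example (illustrative only): The role of the content database introduced in Section 10.9 can be sketched outside PDF syntax. The following Python fragment is a minimal model, not the PDF data structures themselves, which are defined in the sections that follow. It assumes that captured material can be treated as a byte string and that its digital identifier is the MD5 hash of those bytes. The class name ContentDatabase and its methods are hypothetical names chosen for this illustration. The sketch shows the two lookups listed above (by URL and by digital identifier) and how a digest check might allow a capture tool to recognize material that is already present.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ContentDatabase:
    """Toy in-memory stand-in for a Web Capture content database (hypothetical)."""

    # URL -> set of MD5 hex digests captured from that URL
    by_url: dict = field(default_factory=dict)
    # MD5 hex digest -> set of URLs that produced that content
    by_digest: dict = field(default_factory=dict)

    @staticmethod
    def digest(data: bytes) -> str:
        # MD5 is used here only as a content identifier, not for security.
        return hashlib.md5(data).hexdigest()

    def add(self, url: str, data: bytes) -> bool:
        """Record captured data; return True if this content was not yet stored."""
        ident = self.digest(data)
        is_new = ident not in self.by_digest
        self.by_digest.setdefault(ident, set()).add(url)
        self.by_url.setdefault(url, set()).add(ident)
        return is_new

    def find_by_url(self, url: str) -> set:
        """Return the identifiers of all material captured from the given URL."""
        return set(self.by_url.get(url, ()))

    def find_by_digest(self, ident: str) -> set:
        """Return all URLs whose captured content matches the given identifier."""
        return set(self.by_digest.get(ident, ()))
```

In this model, a second call to add with identical bytes returns False even when the URL differs, which corresponds to the case in which the same image is referenced from several pages and needs to be stored only once.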
