SECTION 10.9 Web Capture
If the name is used for an interactive form field, there is an additional encoding to ensure uniqueness and compatibility with interactive forms. Each byte in the source string, encoded as described above, is replaced by two bytes in the destination string. The first byte in each pair is 65 (corresponding to the ASCII character A) plus the high-order 4 bits of the source byte; the second byte is 65 plus the low-order 4 bits of the source byte.
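The byte-pair encoding described above can be sketched as follows. This is a minimal illustration, not code from the specification: the function names encode_field_name and decode_field_name are hypothetical, the input is assumed to be the source string already encoded as described earlier, and the decoder is added only to show that the mapping is reversible.

```python
def encode_field_name(source: bytes) -> bytes:
    """Replace each source byte with two bytes in the range 65..80 ('A'..'P')."""
    out = bytearray()
    for byte in source:
        out.append(65 + (byte >> 4))    # 65 plus the high-order 4 bits
        out.append(65 + (byte & 0x0F))  # 65 plus the low-order 4 bits
    return bytes(out)


def decode_field_name(encoded: bytes) -> bytes:
    """Inverse mapping (illustrative assumption; the text describes only encoding)."""
    if len(encoded) % 2 != 0:
        raise ValueError("encoded name must contain an even number of bytes")
    out = bytearray()
    for i in range(0, len(encoded), 2):
        high = encoded[i] - 65
        low = encoded[i + 1] - 65
        if not (0 <= high <= 15 and 0 <= low <= 15):
            raise ValueError("byte outside the range 'A'..'P'")
        out.append((high << 4) | low)
    return bytes(out)
```

For example, the source byte 0x41 has high-order bits 4 and low-order bits 1, so it becomes the pair 69, 66 (the characters E and B). Because every output byte is an uppercase letter from A to P, the result contains no characters that would be special in a field name.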
10.9.3 Content Sets
A Web Capture content set is a dictionary describing a set of PDF objects generated from the same source data. It may include information common to all the objects in the set as well as about the set itself. Table 10.38 shows the contents of this type of dictionary.
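As a rough sketch of how a consumer might check the Type and S entries listed in Table 10.38, the following assumes the content set dictionary has already been parsed into a Python dict whose keys and name values are plain strings. The function name check_content_set is hypothetical, and only the two subtypes named in the table (SPS and SIS) are considered; the other entries of the dictionary are not examined.

```python
# Subtype names taken from the S entry of Table 10.38.
VALID_SUBTYPES = {"SPS", "SIS"}


def check_content_set(content_set: dict) -> list:
    """Return a list of problems found in the Type and S entries."""
    problems = []

    # Type is optional, but if present it must be SpiderContentSet.
    if "Type" in content_set and content_set["Type"] != "SpiderContentSet":
        problems.append("Type must be SpiderContentSet when present")

    # S is required and selects the subtype of content set.
    if "S" not in content_set:
        problems.append("required entry S is missing")
    elif content_set["S"] not in VALID_SUBTYPES:
        problems.append("S must be SPS or SIS")

    return problems
```

An empty result means that these two entries are consistent with the table; it says nothing about the rest of the dictionary.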
Page Sets
A page set is a content set containing a group of PDF page objects generated from a common source, such as an HTML file. The pages are listed in the O array (see ...). A single page object may not belong to more than one page set. Table 10.39 shows the content set dictionary entries specific to this type of content set.
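The rule that a single page object may not belong to more than one page set can be checked with a sketch like the one below. It assumes each page set's O array has already been read into a list of page identifiers (for instance, object numbers); the function name pages_in_multiple_sets and the use of string labels for page sets are hypothetical choices made for illustration.

```python
def pages_in_multiple_sets(page_sets: dict) -> dict:
    """Map each page identifier that appears in more than one page set
    to the labels of the page sets that list it.

    page_sets maps a page set label to the list of page identifiers
    taken from that page set's O array.
    """
    owners = {}
    for label, pages in page_sets.items():
        for page in pages:
            sets_for_page = owners.setdefault(page, [])
            if label not in sets_for_page:  # ignore repeats within one set
                sets_for_page.append(label)
    return {page: labels for page, labels in owners.items() if len(labels) > 1}
```

An empty result means that no page is shared between page sets. A non-empty result identifies each offending page together with the page sets that claim it.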
The optional TID (text identifier) entry may be used to store an identifier generated from the text of the pages belonging to the page set (see “Digital Identifiers” ...), for example to determine whether the text of a document has changed. A text identifier may not be appropriate for some page sets (such as those with no text) and should be omitted in these cases.
TABLE 10.38 Entries common to all Web Capture content sets

KEY     TYPE    VALUE
Type    name    (Optional) The type of PDF object that this dictionary describes; if present, must be SpiderContentSet for a Web Capture content set.
S       name    (Required) The subtype of content set that this dictionary describes:
                SPS    (“Spider page set”) A page set
                SIS    (“Spider image set”) An image set