CHAPTER 5
474
Text
The begincodespacerange and endcodespacerange operators in Example 5.16 define the source character code range to be the 2-byte character codes from <00 00> to <FF FF>. The specific mappings for several of the character codes are shown. For example, <00 00> to <00 5E> are mapped to the Unicode values U+0020 to U+007E (where Unicode values are conventionally written as U+ followed by four to six hexadecimal digits). This is followed by the definition of a mapping where each character code represents more than one Unicode value:

    <005F> <0061> [<00660066> <00660069> <00660066006C>]

In this case, the original character codes are the glyph indices for the ligatures ff, fi, and ffl. The entry defines the mapping from the character codes <00 5F>, <00 60>, and <00 61> to the strings of Unicode values with a Unicode scalar value for each character in the ligature: U+0066 U+0066 are the Unicode values for the character sequence f f, U+0066 U+0069 for f i, and U+0066 U+0066 U+006C for f f l.
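The array form of a bfrange entry can be illustrated with a short Python sketch. It assumes each destination is a hexadecimal UTF-16BE string with the angle brackets already stripped, and that the array holds exactly one element per source code; the helper names `decode_dst` and `expand_array_range` are hypothetical and not part of any PDF library.

```python
def decode_dst(hex_string: str) -> str:
    """Decode a hex destination string (UTF-16BE) into a Python str."""
    return bytes.fromhex(hex_string).decode("utf-16-be")


def expand_array_range(src_lo: int, src_hi: int, dsts: list[str]) -> dict[int, str]:
    """Map each source code in [src_lo, src_hi] to the matching array element."""
    count = src_hi - src_lo + 1
    # Assumption of this sketch: one destination string per source code.
    if len(dsts) != count:
        raise ValueError(f"expected {count} destination strings, got {len(dsts)}")
    return {src_lo + i: decode_dst(d) for i, d in enumerate(dsts)}


# <005F> <0061> [<00660066> <00660069> <00660066006C>]
mapping = expand_array_range(0x005F, 0x0061, ["00660066", "00660069", "00660066006C"])
for code, text in mapping.items():
    units = " ".join(f"U+{ord(ch):04X}" for ch in text)
    print(f"<{code:04X}> -> {units} ({text})")
```

Running it prints one line per ligature glyph, for example `<0061> -> U+0066 U+0066 U+006C (ffl)`.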
Finally, the character code <3A 51> is mapped to the Unicode value U+2003E, which is expressed by the byte sequence <D840DC3E> in UTF-16BE encoding.
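The byte sequence <D840DC3E> comes from the UTF-16 surrogate-pair rule for code points above U+FFFF: subtract 0x10000, then add the top 10 bits of the result to 0xD800 and the bottom 10 bits to 0xDC00. For U+2003E this gives 0x1003E, so the high surrogate is 0xD800 + 0x40 = 0xD840 and the low surrogate is 0xDC00 + 0x3E = 0xDC3E. A minimal Python sketch of that arithmetic follows; the function name `utf16be_hex` is hypothetical, and the result is cross-checked against Python's built-in `utf-16-be` codec.

```python
def utf16be_hex(code_point: int) -> str:
    """Return the UTF-16BE encoding of a Unicode code point as uppercase hex."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point < 0x10000:
        return f"{code_point:04X}"  # BMP: a single 16-bit code unit
    offset = code_point - 0x10000        # 20-bit value (0x1003E for U+2003E)
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return f"{high:04X}{low:04X}"


print(utf16be_hex(0x2003E))                            # D840DC3E
print(chr(0x2003E).encode("utf-16-be").hex().upper())  # D840DC3E (built-in codec)
```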
defined. To support mappings from a source code to a string of destination codes, the following extension has been made to the ranges defined after a beginbfchar operator:

    n beginbfchar
    srcCode dstString
    endbfchar

where dstString can be a string of up to 512 bytes. Likewise, mappings after the beginbfrange operator may be defined as

    n beginbfrange
    srcCode1 srcCode2 dstString
    endbfrange

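A minimal Python sketch of this single-string beginbfrange form follows, assuming hex strings without angle brackets. It increments only the last byte of dstString for each successive source code and rejects ranges whose last byte exceeds 255 − (srcCode2 − srcCode1), so the increment never has to carry into the preceding byte. Applying the 512-byte limit stated for beginbfchar to bfrange strings as well is an assumption of this sketch, and `expand_string_range` is a hypothetical name.

```python
def expand_string_range(src_lo: int, src_hi: int, dst_hex: str) -> dict[int, bytes]:
    """Expand a `srcCode1 srcCode2 dstString` entry into a code -> bytes table."""
    if src_hi < src_lo:
        raise ValueError("srcCode2 must not be less than srcCode1")
    dst = bytes.fromhex(dst_hex)
    # Assumption: the 512-byte limit stated for beginbfchar also applies here.
    if not 1 <= len(dst) <= 512:
        raise ValueError("dstString must be 1 to 512 bytes long")
    span = src_hi - src_lo
    # The last byte must satisfy last <= 255 - (srcCode2 - srcCode1);
    # otherwise incrementing it would run past 0xFF.
    if dst[-1] > 255 - span:
        raise ValueError("last byte of dstString would exceed 255")
    prefix, last = dst[:-1], dst[-1]
    # Only the final byte changes; the prefix is copied unchanged.
    return {src_lo + i: prefix + bytes([last + i]) for i in range(span + 1)}


# Codes <0000>..<005E> with destination <0020>, i.e. U+0020..U+007E.
table = expand_string_range(0x0000, 0x005E, "0020")
print(len(table), table[0x005E].decode("utf-16-be"))  # 95 ~

# A 256-code range starting at <0020> would push the last byte past 255.
try:
    expand_string_range(0x0000, 0x00FF, "0020")
except ValueError as err:
    print("rejected:", err)
```

The first call reproduces the <00 00> to <00 5E> mapping described for Example 5.16; the second is refused because 0x20 is greater than 255 − 255 = 0.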
In this case, the last byte of the string is incremented for each consecutive code in the source code range. When defining ranges of this type, care must be taken to ensure that the value of the last byte in the string is less than or equal to 255 − (srcCode2 − srcCode1). This ensures that the last byte of the string is not incre-