Each of these is a "flavor" of PDF with different capabilities and issues. PDF flavors are behind some oft-heard questions I receive such as:
- Why isn't this PDF searchable?
- Why is this PDF 50K and this one is 10K?
- Why does this PDF print slowly?
- Why does this PDF look funny on screen?
- Why can't I select text from this PDF?
Not all PDFs are created equal. Some PDFs are more usable or offer benefits that other types do not.
I'll examine the different flavors below and make some recommendations.
Why does this matter?
If you choose the wrong flavor of PDF or compression, you may run into the following problems:
- Wasted storage on local computers and networks from using the wrong type of compression
- Longer print times
- Excessively large PDFs that make it difficult to meet court rules when you eFile them
I've met with a great many law firms and have seen some pretty wacky methods of creating PDFs. It is not uncommon to see someone print out a Word document and then scan it back in to create a PDF! Ack!
Flavors of PDF
The table below discusses the four basic flavors of PDF:
| | PDF Normal | PDF Image Only | PDF Image+Text | Combination |
|---|---|---|---|---|
| What is it? | Often called an "electronic PDF", this type of PDF has never hit paper and was converted directly from an electronic source. | An image in a PDF wrapper. Could be an image of a page of text, a JPEG, etc. inside a PDF. | An image inside a PDF with an invisible layer of searchable text. | Any of the types at left. |
| Where does it come from? | Produced directly from a software application by "printing" to PDF or using the 1-button PDF creators supplied by Acrobat. | Scanners, digital copy machines, TIFFs converted to PDF. | An image-only file that has been OCR'd using Acrobat Standard or Professional. | Create from Multiple Files in Acrobat allows you to combine any kind of PDF together. |
| Is it searchable? | Yes. 100% accurate, since no OCR has taken place. | No. Does not contain any searchable text. | Yes. OCR is not a perfect process, so do not expect 100% accuracy. | Depends. If the combined PDFs are searchable, yes. |
| Notes | Prints fastest. Prints at best quality. Smallest file size. | Recommend no more than 300 dpi for scanning. A good format to use in discovery when you don't want to give the other side an advantage. | Best way to make paper documents searchable. | Can contain multiple document sizes. |
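If you want a rough, scriptable answer to "which flavor is this PDF?", the sketch below scans a file's raw bytes. It is a heuristic built on stated assumptions, not a real PDF parser: it assumes page content streams are Flate-compressed (common, but not guaranteed), treats the `Tj`/`TJ` text-showing operators as "has text", and treats text rendering mode 3 (`3 Tr`, invisible text) as the hidden OCR layer that image+text PDFs typically carry. The function names are hypothetical, and binary font or image data can occasionally produce false matches.

```python
import re
import zlib

# Heuristic sketch: guess a PDF's "flavor" from its raw bytes (not a real parser).
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\nendstream", re.S)
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")   # image XObject dictionaries
TEXT_SHOW_RE = re.compile(rb"\bT[jJ]\b")         # text-showing operators Tj / TJ
INVISIBLE_RE = re.compile(rb"\b3\s+Tr\b")        # text rendering mode 3 = invisible


def inflated_streams(pdf: bytes):
    """Yield the decoded bytes of every stream that inflates with zlib.

    Assumption: content streams are Flate-compressed. Streams using other
    filters (JBIG2, CCITT, JPEG) fail to inflate and are skipped.
    """
    for match in STREAM_RE.finditer(pdf):
        try:
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue


def classify_pdf_flavor(pdf: bytes) -> str:
    has_image = IMAGE_RE.search(pdf) is not None
    has_text = False
    has_invisible_text = False
    for data in inflated_streams(pdf):
        if TEXT_SHOW_RE.search(data):
            has_text = True
        if INVISIBLE_RE.search(data):
            has_invisible_text = True

    if has_text and not has_image:
        return "PDF Normal"
    if has_image and not has_text:
        return "PDF Image Only"
    if has_image and has_invisible_text:
        return "PDF Image+Text"
    if has_image:
        return "Combination (or PDF Normal with pictures)"
    return "Unknown"
```

To check a file on disk, pass its bytes, for example `classify_pdf_flavor(Path("exhibit.pdf").read_bytes())` with `from pathlib import Path` (the filename is a placeholder). For anything that matters, confirm the result in Acrobat.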
PDF Settings Affecting File Size
PDF Normal offers the best performance, smallest file size and best searchability. These fully electronic files contain all the fonts needed for printing. If you have an option to create PDF Normal, always use it!
When creating PDFs from paper, carefully choose your compression and scanning resolution.
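Resolution drives size before any compression is applied: a black & white scan stores one bit per pixel, and going from 200 to 300 dpi multiplies the pixel count by (300/200)^2 = 2.25. The sketch below computes the uncompressed bitmap size of an 8.5" by 11" page. The function name is hypothetical, and the numbers are raw bitmap sizes, not PDF file sizes.

```python
def raw_bilevel_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Uncompressed size of a 1-bit (black & white) page image.

    Each row is padded to a whole number of bytes, as in PDF 1-bit image
    data. This is the size *before* CCITT G4 or JBIG2 compression.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px + 7) // 8  # round each row up to whole bytes
    return row_bytes * height_px


for dpi in (200, 300):
    size = raw_bilevel_bytes(8.5, 11, dpi)
    print(f"{dpi} dpi: {size:,} bytes ({size / 1024:.1f} KiB) before compression")
```

This prints 468,600 bytes (457.6 KiB) at 200 dpi and 1,052,700 bytes (1028.0 KiB) at 300 dpi. Compression is what brings a page down to the tens of kilobytes reported in the file size tables.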
There are three common black & white compression algorithms used for scanned images:
| Compression | File Size |
|---|---|
| CCITT Group 4 | Largest |
| JBIG2 Lossless | Smaller |
| JBIG2 Lossy | Smallest |
If you choose Create PDF from Scanner in Acrobat, the default compression is JBIG2 Lossless. This offers a great balance between file size and quality.
Other hardware and software products that scan to PDF generally use CCITT Group 4 compression, which produces considerably larger files.
CCITT Group 4 compression was developed as a fax compression technology. The rudimentary processors of fax machines in the early 1980s had just enough power to decompress CCITT Group 4 files. Surprisingly, it is still widely used, but is an inefficient compression scheme.
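To find out whether a PDF you received uses CCITT Group 4 or JBIG2, you can look at the filter names in its image dictionaries. This minimal sketch, assuming only the standard filter names from the PDF specification, counts them in the raw file bytes; stream dictionaries are stored as plain text, so nothing needs to be decompressed. `count_filters` is a hypothetical helper, not part of any library. Note that `/CCITTFaxDecode` covers both Group 3 and Group 4 (a `/K -1` entry in `/DecodeParms` marks pure Group 4), and Flate is also used for page content and fonts, so its count is not images alone.

```python
import re
from collections import Counter

# Standard PDF stream filter names, mapped to readable labels.
FILTER_LABELS = {
    b"CCITTFaxDecode": "CCITT fax (Group 3 or 4)",
    b"JBIG2Decode": "JBIG2",
    b"DCTDecode": "JPEG",
    b"JPXDecode": "JPEG 2000",
    b"FlateDecode": "Flate (zlib)",
}
FILTER_RE = re.compile(rb"/(" + b"|".join(FILTER_LABELS) + rb")\b")


def count_filters(pdf: bytes) -> Counter:
    """Count how often each known filter name appears in the raw PDF bytes."""
    return Counter(FILTER_LABELS[name] for name in FILTER_RE.findall(pdf))
```

If an incoming file reports mostly CCITT, it is a good candidate for recompression with the PDF Optimizer.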
While rarely relevant in the legal market, Acrobat is intelligent enough to compress files selectively using Adaptive compression. A color brochure may have black text, a color image and line art, each of which can use a different compression scheme. If you need to scan color brochures and the like (perhaps in an Intellectual Property dispute), choose the Searchable Image-Compact option.
I've conducted several visual tests on JBIG2 Lossless versus Lossy. It is difficult to detect the differences between these two compression schemes on good quality scanned documents. If you have good originals, go ahead and use the Lossy JBIG2.
File Size Comparison
The table below compares the file sizes of a typical 8.5" by 11" legal document for various flavors of PDF:
Single Page Legal Document - 200 DPI

| | PDF Normal | PDF Image Only, 200 dpi | PDF Image Only, 200 dpi | PDF Image Only, 200 dpi | PDF Image+Text, 200 dpi |
|---|---|---|---|---|---|
| File size | 9.71K | 40.79K | 20.91K | 9.4K | 26.64K |
| Compression and notes | Fonts embedded, no tags | CCITT G4 | JBIG2 Lossless | JBIG2 Lossy | JBIG2 Lossy |

Single Page Legal Document - 300 DPI

| | PDF Normal | PDF Image Only, 300 dpi | PDF Image Only, 300 dpi | PDF Image Only, 300 dpi | PDF Image+Text, 300 dpi |
|---|---|---|---|---|---|
| File size | 9.71K | 53.77K | 31.02K | 10.7K | 34.34K |
| Compression and notes | Fonts embedded, no tags | CCITT G4 | JBIG2 Lossless | JBIG2 Lossy | JBIG2 Lossy |
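The relative savings can be checked directly against the two tables. This sketch simply copies the image-only sizes from the tables and computes how much smaller each scheme is; `pct_smaller` is a hypothetical helper.

```python
# Image-only file sizes (in K) copied from the two tables above.
IMAGE_ONLY_SIZES_K = {
    200: {"CCITT G4": 40.79, "JBIG2 Lossless": 20.91, "JBIG2 Lossy": 9.4},
    300: {"CCITT G4": 53.77, "JBIG2 Lossless": 31.02, "JBIG2 Lossy": 10.7},
}


def pct_smaller(size: float, baseline: float) -> float:
    """Percentage by which `size` is smaller than `baseline`."""
    return 100.0 * (1.0 - size / baseline)


for dpi, sizes in IMAGE_ONLY_SIZES_K.items():
    g4 = sizes["CCITT G4"]
    lossless = sizes["JBIG2 Lossless"]
    lossy = sizes["JBIG2 Lossy"]
    print(f"{dpi} dpi: lossless {pct_smaller(lossless, g4):.1f}% smaller than G4, "
          f"lossy {pct_smaller(lossy, g4):.1f}% smaller than G4, "
          f"lossy {pct_smaller(lossy, lossless):.1f}% smaller than lossless")
```

For these test pages, JBIG2 Lossy comes out 55.0% (200 dpi) and 65.5% (300 dpi) smaller than JBIG2 Lossless, and 77.0% and 80.1% smaller than CCITT G4.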
Testing Protocol
- The PDF Normal file was created by choosing the Adobe PDF print driver. [Note 1]
- The PDF Normal file was opened in Acrobat and saved as either 200 or 300 dpi uncompressed TIFFs.
- PDF Optimizer was used to target three types of compression: CCITT G4, JBIG2 Lossless and JBIG2 Lossy.
- All image and image+text PDFs were created using Acrobat 7 by choosing Recognize Text Using OCR.
Recommendations
Here are my tips for making the best choices when working with PDF files:
- Where did that PDF come from? You need to know...
  Unless you scanned it in yourself using the Create PDF from Scanner option in Acrobat, your PDF file can most likely be made a lot smaller using the PDF Optimizer in Acrobat Professional. Chances are the image-only and image+text PDFs you get from outside your firm use old, inefficient CCITT Group 4 compression.
- Keep Electronic Documents Electronic
  Always convert electronic documents directly to PDF using the 1-button PDF Creators installed by Acrobat into Office applications or using the Adobe PDF print driver. You'll have a considerably smaller file if you do so, and searchability is much better.
- Scan at 300 dpi, OCR and then Downsample if Necessary
  You'll get more accurate OCR at 300 dpi. Always downsample and compress using the PDF Optimizer in Acrobat Professional after performing OCR. Acrobat Professional can also downsample in batches.
- Try JBIG2 Lossy Compression
  Although the word "Lossy" is a bit scary, give this compression scheme a try. Documents still look good on screen, and file sizes can be 50% smaller than with JBIG2 Lossless.
Notes
1. Multiple-page PDF Normal files are considerably smaller than multiple-page image-only PDFs. A single-page PDF Normal file must contain all the fonts necessary to render the page, but this font information does not need to be duplicated for successive pages.
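Note 1 can be written as a simple size model. Assume, for illustration only, that a PDF Normal file consists of one shared set of embedded fonts of size F plus page content of size c per page, while an image-only PDF stores one page image of size i per page. For an n-page document:

```latex
\begin{aligned}
S_{\text{normal}}(n) &= F + n\,c, & S_{\text{image}}(n) &= n\,i,\\
\frac{S_{\text{normal}}(n)}{n} &= \frac{F}{n} + c \;\xrightarrow{\;n\to\infty\;}\; c, & \frac{S_{\text{image}}(n)}{n} &= i.
\end{aligned}
```

The font cost F is paid once, so the per-page cost of PDF Normal shrinks toward c as pages are added, while the per-page cost of an image-only PDF stays at i. F, c and i are symbols introduced for this model, not measured values.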