User guide for VeryPDF OCR to Any Converter Command Line - how to convert scanned PDF and TIFF files to editable Word DOC, Excel, HTML, and other documents

This article introduces how to use the VeryPDF OCR to Any Converter Command Line application. VeryPDF OCR to Any Converter Command Line is a powerful application that can batch convert scanned PDF, TIFF and various image formats to editable Office, TXT, HTML and other formats.

1. Requirements
2. Supported formats
3. Install and download
4. Introduce Command Line Options
5. Purchase OCR to Any Converter Command Line

1. Requirements

System requirements:

Windows 2000 / XP / Server 2003 / Vista / Server 2008 / 7 / 10 / 11 and later, both 32-bit and 64-bit systems.

2. Supported formats

Input formats: text-based PDF, scanned PDF, single-page TIFF, multi-page TIFF, JPEG, JPG, PNG, BMP, GIF, PCX, TGA, PBM, PNM, PPM.

Output formats: reading layout text, physical layout text, multi-column text, pure text-based PDF file, invisible text layer PDF file, RTF, Word DOC, tab text format, CSV, Excel XLS, HTML, image formats (TIFF, JPEG, JPG, PNG, BMP, GIF, PCX, TGA, PBM, PNM, PPM).

3. Install and download

Download: Please click the following link to download VeryPDF OCR to Any Converter Command Line.

Install: Unzip the .zip package to a folder, open a CMD window, and change the current folder to the unzipped folder. You can then run ocr2any.exe from that folder to convert scanned files to other formats.

4. Introduce Command Line Options

Main Command Line Interface

When you launch VeryPDF OCR to Any Converter Command Line, you will see the following interface:

Fig. Main Command Line Interface

Convert color TIFF file to searchable PDF file,

ocr2any.exe -ocrmode 4 test_color.tif _test_color.pdf

Convert color PDF file to grayscale PDF file,

ocr2any.exe -ocr -ocrmode 4 -res 72 _test_color.pdf _test-pdf2pdf-grayscale.pdf

Convert color PDF file to a new color PDF file,

ocr2any.exe -ocr -ocrmode 4 -res 72 -bitcount 24 _test_color.pdf _test-pdf2pdf-color.pdf

Deskew TIFF file, apply OCR, output to a new PDF file,

ocr2any.exe -imageopt -ocr -ocrmode 3 test_skew.tif _test_skew_tif.pdf

Fig. Skewed TIFF image and the deskewed, OCRed PDF output

Convert multiple columns PDF file to text file with reading order,

ocr2any.exe test_multi_columns.pdf _test_multi_columns.txt

Convert multiple columns TIFF file to text file with reading order,

ocr2any.exe test_multi_columns.tif _test_multi_columns.txt

Convert a multi-column PDF file to a multi-column text file with physical layout. The -layout parameter is only available for text-based PDF files; it is ignored if the input is a scanned PDF document,

ocr2any.exe -layout test_multi_columns.pdf _test_multi_columns.pdf.layout.txt

Convert scanned PDF file to searchable PDF file with more OCR options,

-ocrmode <int> : set OCR mode, the value can be selected from 0 to 4,
        -ocrmode 0: output to text file
        -ocrmode 1: OCR PDF pages and insert new text layer under original PDF pages
        -ocrmode 2: output to plain text based PDF file
        -ocrmode 3: output to OCRed PDF file (BW) with hidden text layer
        -ocrmode 4: output to OCRed PDF file (Color) with hidden text layer

ocr2any.exe -ocr -ocrmode 1 test_multi_columns.pdf _test\_test_multi_columns_mode1.pdf
ocr2any.exe -ocr -ocrmode 2 test_multi_columns.pdf _test\_test_multi_columns_mode2.pdf
ocr2any.exe -ocr -ocrmode 3 test_multi_columns.pdf _test\_test_multi_columns_mode3.pdf
ocr2any.exe -ocr -ocrmode 3 -ownerpwdout 123 -keylen 2 -encryption 3900 test_multi_columns.pdf _test\_test_multi_columns_mode3_encryption.pdf
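If you call the converter from a script, it can help to check the -ocrmode value before launching the program. The following Python code is a minimal sketch, not part of the product: the function names build_ocr_command and run_ocr are hypothetical, and only the options shown above (-ocr, -ocrmode) are used. This guide does not say how ocr2any.exe reports errors, so the sketch simply returns the exit code without interpreting it.

```python
import subprocess

# Mode descriptions copied from the -ocrmode option list in this guide.
OCR_MODES = {
    0: "output to text file",
    1: "OCR PDF pages and insert new text layer under original PDF pages",
    2: "output to plain text based PDF file",
    3: "output to OCRed PDF file (BW) with hidden text layer",
    4: "output to OCRed PDF file (Color) with hidden text layer",
}


def build_ocr_command(src, dst, mode, exe="ocr2any.exe"):
    """Return the argument list for one '-ocr -ocrmode <mode>' conversion."""
    if mode not in OCR_MODES:
        raise ValueError(f"-ocrmode must be one of {sorted(OCR_MODES)}, got {mode!r}")
    return [exe, "-ocr", "-ocrmode", str(mode), str(src), str(dst)]


def run_ocr(src, dst, mode, exe="ocr2any.exe"):
    """Run the conversion and return the process exit code (meaning not documented here)."""
    return subprocess.run(build_ocr_command(src, dst, mode, exe)).returncode
```

For example, build_ocr_command("test_multi_columns.pdf", "out_mode3.pdf", 3) produces the same arguments as the third command above, with a different output name.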

Extract [X, Y, Width, Height] information for each word from scanned PDF, TIFF, and image files,

ocr2any.exe -ocr -outboxfile test_multi_columns.pdf _test\_test_outboxfile.txt

You will get an output text file like the one below, with one word per line in the form "X,Y,Width,Height,Word". You can write a simple application to parse this text file and get the coordinates of each word easily,

328,234,60,34,US
404,234,284,34,2005/0118291
704,234,57,34,A1
328,452,163,30,throughout
501,452,44,30,the
556,452,160,30,cytoplasm.
729,452,193,30,Interestingly,
935,452,83,30,Golgi
1028,452,159,30,complexes
1197,452,28,30,in
327,495,224,30,placebo+CC14
571,495,84,30,group
676,495,108,30,contain
804,495,80,30,small
904,495,175,30,loW-density
1098,495,125,30,vesicles.
329,539,83,30,Golgi
426,539,158,30,complexes
598,539,28,30,in

......
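The "simple application" mentioned above could look like the following Python sketch. It assumes only what the sample shows: four integer fields followed by the word, separated by commas. Because a word can itself end with or contain a comma (for example "Interestingly,"), the line is split at the first four commas only. The names WordBox and parse_box_lines are hypothetical, and lines that do not match the pattern are skipped.

```python
from typing import Iterable, Iterator, NamedTuple


class WordBox(NamedTuple):
    x: int
    y: int
    width: int
    height: int
    word: str


def parse_box_lines(lines: Iterable[str]) -> Iterator[WordBox]:
    """Yield one WordBox per 'X,Y,Width,Height,Word' line, skipping other lines."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Split at most 4 times so commas inside the word are preserved.
        parts = line.split(",", 4)
        if len(parts) != 5:
            continue
        try:
            x, y, width, height = (int(p) for p in parts[:4])
        except ValueError:
            continue
        yield WordBox(x, y, width, height, parts[4])
```

To read a real output file, pass an open file object, for example parse_box_lines(open("_test_outboxfile.txt", encoding="utf-8")). The UTF-8 encoding is an assumption; adjust it if your output file uses a different encoding.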

Insert custom page footer to the bottom of each page in the output text file,

ocr2any.exe -text "======================== This is the page %PageNumber% of %PageCount% ========================" test_text.pdf _test\_test_text.pdf.txt

ocr2any.exe -layout -text "======================== This is the page %PageNumber% of %PageCount% ========================" test_text.pdf _test\_test_text.pdf.layout.txt
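The footer text uses the placeholders %PageNumber% and %PageCount%. The Python sketch below shows how such a template would be expected to expand for one page, and how to pass it to the program from a script. It is an illustration under assumptions: preview_footer and build_footer_command are hypothetical helper names, and the preview only imitates the substitution; the actual replacement is done by ocr2any.exe itself.

```python
FOOTER = "======== This is the page %PageNumber% of %PageCount% ========"


def preview_footer(template, page_number, page_count):
    """Imitate the placeholder substitution for one page (illustration only)."""
    return (template
            .replace("%PageNumber%", str(page_number))
            .replace("%PageCount%", str(page_count)))


def build_footer_command(src, dst, template, layout=False, exe="ocr2any.exe"):
    """Return the argument list for a text conversion with a custom page footer."""
    cmd = [exe]
    if layout:
        cmd.append("-layout")
    # The template is passed as one argument; no manual quoting is needed in a list.
    cmd += ["-text", template, str(src), str(dst)]
    return cmd
```

Passing the arguments as a list to subprocess.run avoids going through cmd.exe, so the percent signs reach the program unchanged. If you put the commands above into a .bat file instead, note that cmd.exe treats %PageNumber% as an environment variable there, so each percent sign usually has to be doubled (%%PageNumber%%).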

Convert color TIFF file to Black and White TIFF file without using halftone technology,

ocr2any.exe -bitcount 1 test_color.tif _test\_test_color.tif.bitcount1.tif

Convert color TIFF file to Black and White TIFF file with a threshold value,

ocr2any.exe -threshold 240 test_color.tif _test\_test_color.tif.threshold240.tif

Convert color TIFF file to Black and White TIFF file with an Optimized threshold value,

ocr2any.exe -threshold 0 test_color.tif _test\_test_color.tif.threshold.auto.tif

Fig. Halftone image

Convert color TIFF file to Black and White TIFF file with more dither options,

-dither <int> : convert the color image to B&W using the desired method:
        -dither 0: Floyd-Steinberg
        -dither 1: Ordered-Dithering (4x4)
        -dither 2: Burkes
        -dither 3: Stucki
        -dither 4: Jarvis-Judice-Ninke
        -dither 5: Sierra
        -dither 6: Stevenson-Arce
        -dither 7: Bayer (4x4 ordered dithering)

Fig. Dithered image

ocr2any.exe -dither 0 test_color.tif _test\_test_color.tif.dither0.tif
ocr2any.exe -dither 7 -resizewidth 200 -resizeheight 200 test_color.tif _test\_test_color.tif.dither7.tif
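To give an idea of what a dither method does, here is a minimal Python sketch of the textbook Floyd-Steinberg error diffusion algorithm (the method named for -dither 0). It works on a small grayscale image given as a list of rows with values from 0 to 255. This is the standard published algorithm, not the converter's own code; how ocr2any.exe implements it internally is not described in this guide, and the function name floyd_steinberg is hypothetical.

```python
def floyd_steinberg(gray, threshold=128):
    """Dither a 2-D list of 0..255 gray values to pure 0/255 values."""
    height = len(gray)
    width = len(gray[0]) if height else 0
    # Work on a float copy so the diffused error can accumulate.
    work = [[float(v) for v in row] for row in gray]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = 255 if old >= threshold else 0
            out[y][x] = new
            err = old - new
            # Standard weights: 7/16 right, 3/16 below-left, 5/16 below, 1/16 below-right.
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return out
```

A uniform mid-gray area comes out as a mix of black and white pixels, which is what makes a dithered image look gray from a distance, while pure black and pure white areas are left unchanged.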

Rotate a TIFF file by 45 degrees,

ocr2any.exe -rotate 45 test_color.tif _test\_test_color.tif.rotate45.tif

Fig. Original image and rotated image

Apply Deskew, Despeckle and Noise Removal, Black Border Removal, Flip, Mirror, etc. options to TIFF and image files automatically,

ocr2any.exe -imageopt test_despeckle.tif _test\_test_despeckle.tif
ocr2any.exe -imageopt test_skew.tif _test\_test_skew.tif
ocr2any.exe -flip -mirror test_color.tif _test\_test_color.tif.flip.mirror.tif

Fig. Despeckle examples #1, #2 and #3 (original speckled image and despeckled image)

Fig. Deskewing example (deskewed image)

Converting a color or grayscale image to a Black and White image is called "binarization". It is a very important step, because incorrect binarization will cause a lot of problems.
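The effect of the threshold on binarization can be shown with a few lines of Python. This is a minimal sketch of plain fixed-threshold binarization on gray values from 0 to 255, not the converter's own code. It assumes that values equal to or above the threshold become white; how ocr2any.exe treats the boundary value, and how it picks the optimized threshold for -threshold 0, is not described in this guide.

```python
def binarize(gray, threshold):
    """Map each 0..255 gray value to 0 (black) or 255 (white)."""
    return [[255 if v >= threshold else 0 for v in row] for row in gray]


# One row of sample pixels: dark text, faint text, paper background.
row = [[30, 200, 250]]

low = binarize(row, 128)   # faint text (200) turns white and is lost
high = binarize(row, 240)  # faint text (200) turns black and is kept
```

With the lower threshold the faint text pixel disappears into the background, which is the kind of problem an incorrect binarization causes for later steps.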

Fig. Binarization example

Fig. Original image file and flipped, mirrored image

OCR PDF, TIFF and Image files with German, French and more languages,

ocr2any.exe -ocr -lang deu test_german.tif _test\_test_german.tif.txt

Convert more languages from PDF file to UTF-8 text file,

ocr2any.exe test_german.pdf _test\_test_german.pdf.txt

Convert a scanned German TIFF file to RTF file with Enhanced OCR Engine,

ocr2any.exe -ocr2 -lang deu test_german.tif _test\_test_german.tif.rtf

Convert a scanned TIFF file to RTF file with Enhanced OCR Engine, without specifying a language,

ocr2any.exe -ocr2 test_color.tif _test\_test_color.rtf

Fig. Original PDF file and the editable Word DOC document converted from it

Use Enhanced OCR Engine to extract table objects from scanned PDF file and insert table objects into RTF file,

ocr2any.exe -ocr2 test_table_ocr.pdf _test\_test_table_ocr.pdf.rtf

Use Enhanced OCR Engine to extract table objects from scanned PDF file and insert table objects into Excel XLS file,

-ocr2excelmode <int> : set output Excel format when -ocr2 used
        -ocr2excelmode 0: One big sheet + All page sheets
        -ocr2excelmode 1: All page sheets
        -ocr2excelmode 2: One big sheet, default mode

ocr2any.exe -ocr2 -ocr2excelmode 0 test_table_ocr.pdf _test\_test_table_ocr.pdf.0.xls
ocr2any.exe -ocr2 -ocr2excelmode 1 test_table_ocr.pdf _test\_test_table_ocr.pdf.1.xls
ocr2any.exe -ocr2 -ocr2excelmode 2 test_table_ocr.pdf _test\_test_table_ocr.pdf.2.xls

Fig. OCR table (TIFF to Excel table conversion)

Use Enhanced OCR Engine to extract table objects from scanned PDF file and insert table objects into HTML document,

ocr2any.exe -ocr2 test_table_ocr.pdf _test\_test_table_ocr.pdf.html

Fig. OCR table

Use Enhanced OCR Engine to detect the text orientation of a scanned TIFF file and convert it to a Word DOC document,

ocr2any.exe -ocr2 -ocr2aor test_auto_rotate.tif _test\_test_auto_rotate.doc

Fig. Detecting text orientation when converting TIFF to Word

Batch convert all PDF files to DOC and HTML files with Enhanced OCR Engine,

ocr2any.exe -ocr2 *.pdf _test\*.doc
ocr2any.exe -ocr2 *.pdf _test\*.html
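If you prefer to drive a batch from a script, for example to log each file separately, you can build one command per input file instead of using wildcards. The Python code below is a sketch under assumptions: plan_batch is a hypothetical helper, and it names each output after its input with a new extension, which may differ from the names ocr2any.exe chooses for the wildcard form above.

```python
from pathlib import Path


def plan_batch(input_dir, output_dir, out_ext, exe="ocr2any.exe"):
    """Return one '-ocr2' argument list per PDF file found in input_dir."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    commands = []
    # sorted() makes the processing order predictable across runs.
    for pdf in sorted(input_dir.glob("*.pdf")):
        target = output_dir / (pdf.stem + out_ext)
        commands.append([exe, "-ocr2", str(pdf), str(target)])
    return commands
```

Each returned list can be passed to subprocess.run. For example, plan_batch(".", "_test", ".doc") prepares the DOC conversions and plan_batch(".", "_test", ".html") the HTML ones.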

5. Purchase VeryPDF OCR to Any Converter Command Line

The trial version of VeryPDF OCR to Any Converter Command Line can only process a few pages from input files, and it can only be run 300 times. To remove these limitations, please purchase the product.