This article explains how to use the VeryPDF OCR to Any Converter Command Line application. VeryPDF OCR to Any Converter Command Line is a powerful application for batch converting scanned PDF, TIFF, and various image files to editable Office, TXT, HTML, and other formats.
1. Requirements | 2. Supported formats |
---|---|
3. Download and install | 4. Command Line Options |
5. Purchase OCR to Any Converter Command Line | |
System requirements:
Windows 2000 / XP / Server 2003 / Vista / Server 2008 / 7 / 10 / 11 and later, both 32-bit and 64-bit systems.
Input formats: Text-based PDF, scanned PDF, single-page TIFF, multi-page TIFF, JPEG, JPG, PNG, BMP, GIF, PCX, TGA, PBM, PNM, PPM.
Output formats: Reading Layout Text, Physical Layout Text, multi-column Text, pure text-based PDF file, invisible text layer PDF file, RTF, Word DOC, Tab Text format, CSV, Excel XLS, HTML, image formats (TIFF, JPEG, JPG, PNG, BMP, GIF, PCX, TGA, PBM, PNM, PPM).
Download: Please click the following link to download VeryPDF OCR to Any Converter Command Line.
Install: Unzip the .zip package to a folder, open a CMD window, and change the current directory to the unzipped folder. You can then run ocr2any.exe from that folder to convert scanned files to other formats.
Main Command Line Interface
When you launch VeryPDF OCR to Any Converter Command Line, you will see the following interface:
Fig. Main Command Line Interface
Convert color TIFF file to searchable PDF file,
ocr2any.exe -ocrmode 4 test_color.tif _test_color.pdf
Convert color PDF file to grayscale PDF file,
ocr2any.exe -ocr -ocrmode 4 -res 72 _test_color.pdf _test-pdf2pdf-grayscale.pdf
Convert color PDF file to a new color PDF file,
ocr2any.exe -ocr -ocrmode 4 -res 72 -bitcount 24 _test_color.pdf _test-pdf2pdf-color.pdf
Deskew TIFF file, apply OCR, output to a new PDF file,
ocr2any.exe -imageopt -ocr -ocrmode 3 test_skew.tif _test_skew_tif.pdf
Convert multi-column PDF file to text file in reading order,
ocr2any.exe test_multi_columns.pdf _test_multi_columns.txt
Convert multi-column TIFF file to text file in reading order,
ocr2any.exe test_multi_columns.tif _test_multi_columns.txt
Convert multi-column PDF file to multi-column text file with physical layout,
The -layout parameter is only available for text-based PDF files; it is ignored if the input is a scanned PDF document,
ocr2any.exe -layout test_multi_columns.pdf _test_multi_columns.pdf.layout.txt
Convert scanned PDF file to searchable PDF file with more OCR options,
-ocrmode <int> : set OCR mode, the value can be selected from 0 to 4,
-ocrmode 0: output to text file
-ocrmode 1: OCR PDF pages and insert new text layer under original PDF pages
-ocrmode 2: output to plain text based PDF file
-ocrmode 3: output to OCRed PDF file (BW) with hidden text layer
-ocrmode 4: output to OCRed PDF file (Color) with hidden text layer
ocr2any.exe -ocr -ocrmode 1 test_multi_columns.pdf _test\_test_multi_columns_mode1.pdf
ocr2any.exe -ocr -ocrmode 2 test_multi_columns.pdf _test\_test_multi_columns_mode2.pdf
ocr2any.exe -ocr -ocrmode 3 test_multi_columns.pdf _test\_test_multi_columns_mode3.pdf
ocr2any.exe -ocr -ocrmode 3 -ownerpwdout 123 -keylen 2 -encryption 3900 test_multi_columns.pdf _test\_test_multi_columns_mode3_encryption.pdf
Extract [X, Y, Width, Height] information for each word from scanned PDF, TIFF, and image files,
ocr2any.exe -ocr -outboxfile test_multi_columns.pdf _test\_test_outboxfile.txt
You will get an output text file like the one below, with one word per line in the form "X,Y,Width,Height,Word". You can write a simple application to parse this text file and get the coordinates of each word:
328,234,60,34,US
404,234,284,34,2005/0118291
704,234,57,34,A1
328,452,163,30,throughout
501,452,44,30,the
556,452,160,30,cytoplasm.
729,452,193,30,Interestingly,
935,452,83,30,Golgi
1028,452,159,30,complexes
1197,452,28,30,in
327,495,224,30,placebo+CC14
571,495,84,30,group
676,495,108,30,contain
804,495,80,30,small
904,495,175,30,loW-density
1098,495,125,30,vesicles.
329,539,83,30,Golgi
426,539,158,30,complexes
598,539,28,30,in
......
......
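The parsing step mentioned above can be sketched in Python. This is a minimal illustration, assuming each line follows the "X,Y,Width,Height,Word" pattern shown in the sample; the names `WordBox`, `parse_box_line`, and `parse_box_text` are hypothetical and not part of the product.

```python
from typing import List, NamedTuple


class WordBox(NamedTuple):
    """One recognized word and its bounding box (hypothetical helper type)."""
    x: int
    y: int
    width: int
    height: int
    word: str


def parse_box_line(line: str) -> WordBox:
    # Split on the first four commas only, so a word that itself
    # contains a comma (for example "Interestingly,") stays intact.
    x, y, width, height, word = line.rstrip("\r\n").split(",", 4)
    return WordBox(int(x), int(y), int(width), int(height), word)


def parse_box_text(text: str) -> List[WordBox]:
    boxes = []
    for line in text.splitlines():
        parts = line.split(",", 4)
        # Skip blank lines and anything that is not "int,int,int,int,word".
        if len(parts) != 5 or not all(p.strip().isdigit() for p in parts[:4]):
            continue
        boxes.append(parse_box_line(line))
    return boxes


sample = """328,234,60,34,US
404,234,284,34,2005/0118291
729,452,193,30,Interestingly,
"""

for box in parse_box_text(sample):
    print(box.word, "at", (box.x, box.y), "size", (box.width, box.height))
```

To process a real result, read the file written by -outboxfile into a string first and pass it to `parse_box_text`. The file's text encoding is not stated in this article, so check it before decoding.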
Insert a custom page footer at the bottom of each page in the output text file,
ocr2any.exe -text "======================== This is the page %PageNumber% of %PageCount% ========================" test_text.pdf _test\_test_text.pdf.txt
ocr2any.exe -layout -text "======================== This is the page %PageNumber% of %PageCount% ========================" test_text.pdf _test\_test_text.pdf.layout.txt
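One practical use of such a footer is to split the output text file back into pages. The Python sketch below assumes the tool writes the footer on its own line with %PageNumber% and %PageCount% replaced by numbers, as the option names suggest; the function name `split_pages` and the regular expression are illustrative only.

```python
import re
from typing import List, Tuple

# Matches the footer text used in the example commands above, after the
# placeholders have (by assumption) been replaced with page numbers.
FOOTER_RE = re.compile(
    r"^=+ This is the page (\d+) of (\d+) =+[ \t]*\r?$",
    re.MULTILINE,
)


def split_pages(text: str) -> List[Tuple[int, str]]:
    """Return (page_number, page_text) pairs, using each footer as a page end."""
    pages = []
    start = 0
    for match in FOOTER_RE.finditer(text):
        page_text = text[start:match.start()].strip()
        pages.append((int(match.group(1)), page_text))
        start = match.end()
    return pages


sample = (
    "First page text\n"
    "==== This is the page 1 of 2 ====\n"
    "Second page text\n"
    "==== This is the page 2 of 2 ====\n"
)
for number, body in split_pages(sample):
    print(number, repr(body))
```

If you choose a different footer string with -text, adjust the pattern to match it.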
Convert color TIFF file to Black and White TIFF file without using halftoning,
ocr2any.exe -bitcount 1 test_color.tif _test\_test_color.tif.bitcount1.tif
Convert color TIFF file to Black and White TIFF file with a threshold value,
ocr2any.exe -threshold 240 test_color.tif _test\_test_color.tif.threshold240.tif
Convert color TIFF file to Black and White TIFF file with an automatically optimized threshold value (-threshold 0),
ocr2any.exe -threshold 0 test_color.tif _test\_test_color.tif.threshold.auto.tif
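To make the idea of a threshold concrete, here is a small Python sketch of threshold-based conversion on a grayscale image stored as a list of rows. It assumes that values above the threshold become white. This article does not say how the tool chooses its optimized threshold; Otsu's method is shown only as one widely used way to pick a threshold automatically, not as the tool's actual algorithm. The function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale values, 0 (black) to 255 (white)


def binarize(gray: Image, threshold: int) -> Image:
    """Map each pixel to 1 (white) if it is above the threshold, else 0 (black)."""
    return [[1 if value > threshold else 0 for value in row] for row in gray]


def otsu_threshold(gray: Image) -> int:
    """Pick the level that maximizes the between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_level, best_variance = 0, -1.0
    weight_dark = 0
    sum_dark = 0
    for level in range(256):
        weight_dark += hist[level]
        sum_dark += level * hist[level]
        weight_light = total - weight_dark
        if weight_dark == 0 or weight_light == 0:
            continue  # one class is empty, no valid split at this level
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_level, best_variance = level, variance
    return best_level


image = [[10, 10, 200],
         [12, 220, 210]]
level = otsu_threshold(image)
print(level, binarize(image, level))
```

A fixed value such as 240 treats only very light pixels as white, while an automatically chosen value adapts to each image.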
Convert color TIFF file to Black and White TIFF file with more dither options,
-dither <int> : convert the color image to B&W using the desired method:
-dither 0: Floyd-Steinberg
-dither 1: Ordered-Dithering (4x4)
-dither 2: Burkes
-dither 3: Stucki
-dither 4: Jarvis-Judice-Ninke
-dither 5: Sierra
-dither 6: Stevenson-Arce
-dither 7: Bayer (4x4 ordered dithering)
ocr2any.exe -dither 0 test_color.tif _test\_test_color.tif.dither0.tif
ocr2any.exe -dither 7 -resizewidth 200 -resizeheight 200 test_color.tif _test\_test_color.tif.dither7.tif
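For readers unfamiliar with the methods listed above, the following Python sketch shows the textbook Floyd-Steinberg error diffusion that -dither 0 is named after. It is a generic illustration on a grayscale image stored as a list of rows, not the tool's own implementation, and the function name is hypothetical.

```python
from typing import List


def floyd_steinberg(gray: List[List[int]]) -> List[List[int]]:
    """Dither a 0-255 grayscale image to 0 (black) / 1 (white) pixels."""
    height = len(gray)
    width = len(gray[0]) if height else 0
    # Work on a float copy so accumulated error does not alter the input.
    work = [[float(value) for value in row] for row in gray]
    out = [[0] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            old = work[y][x]
            white = old >= 128
            out[y][x] = 1 if white else 0
            error = old - (255.0 if white else 0.0)
            # Spread the quantization error to unvisited neighbors
            # with the classic 7/16, 3/16, 5/16, 1/16 weights.
            if x + 1 < width:
                work[y][x + 1] += error * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += error * 3 / 16
                work[y + 1][x] += error * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += error * 1 / 16
    return out


print(floyd_steinberg([[100, 100]]))
```

With a plain threshold of 128 both pixels in the example would turn black. With error diffusion, the darkness lost on the first pixel pushes the second one over the threshold, which is how dithering preserves the overall brightness.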
Rotate TIFF file by 45 degrees,
ocr2any.exe -rotate 45 test_color.tif _test\_test_color.tif.rotate45.tif
Apply Deskew, Despeckle and Noise Removal, Black Border Removal, Flip, Mirror, etc. options to TIFF and image files automatically,
ocr2any.exe -imageopt test_despeckle.tif _test\_test_despeckle.tif
ocr2any.exe -imageopt test_skew.tif _test\_test_skew.tif
ocr2any.exe -flip -mirror test_color.tif _test\_test_color.tif.flip.mirror.tif
Despeckle Example #1:
Despeckle Example #2:
Despeckle Example #3:
Deskewing:
Converting a color or grayscale image to a Black and White image is called "binarization". It is a very important step, because incorrect binarization will cause a lot of problems.
Flip and Mirror Image
OCR PDF, TIFF and Image files with German, French and more languages,
ocr2any.exe -ocr -lang deu test_german.tif _test\_test_german.tif.txt
Convert PDF file in German, French or other languages to UTF-8 text file,
ocr2any.exe test_german.pdf _test\_test_german.pdf.txt
Convert scanned TIFF file in German to RTF file with Enhanced OCR Engine,
ocr2any.exe -ocr2 -lang deu test_german.tif _test\_test_german.tif.rtf
Convert scanned TIFF file to RTF file with Enhanced OCR Engine,
ocr2any.exe -ocr2 test_color.tif _test\_test_color.rtf
Use Enhanced OCR Engine to extract table objects from scanned PDF file and insert table objects into RTF file,
ocr2any.exe -ocr2 test_table_ocr.pdf _test\_test_table_ocr.pdf.rtf
Use Enhanced OCR Engine to extract table objects from scanned PDF file and insert table objects into Excel XLS file,
-ocr2excelmode <int> : set output Excel format when -ocr2 used
-ocr2excelmode 0: One big sheet + All page sheets
-ocr2excelmode 1: All page sheets
-ocr2excelmode 2: One big sheet, default mode
ocr2any.exe -ocr2 -ocr2excelmode 0 test_table_ocr.pdf _test\_test_table_ocr.pdf.0.xls
ocr2any.exe -ocr2 -ocr2excelmode 1 test_table_ocr.pdf _test\_test_table_ocr.pdf.1.xls
ocr2any.exe -ocr2 -ocr2excelmode 2 test_table_ocr.pdf _test\_test_table_ocr.pdf.2.xls
Use Enhanced OCR Engine to extract table objects from scanned PDF file and insert table objects into HTML document,
ocr2any.exe -ocr2 test_table_ocr.pdf _test\_test_table_ocr.pdf.html
Use Enhanced OCR Engine to extract table objects from scanned PDF file and insert table objects into Word DOC document,
ocr2any.exe -ocr2 -ocr2aor test_auto_rotate.tif _test\_test_auto_rotate.doc
Batch convert all PDF files to DOC and HTML files with Enhanced OCR Engine,
ocr2any.exe -ocr2 *.pdf _test\*.doc
ocr2any.exe -ocr2 *.pdf _test\*.html
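If you need per-file control, such as logging or stopping at the first failure, the same batch job can be driven from a script. The Python sketch below uses only the -ocr2 option shown above and assumes ocr2any.exe is reachable from the current folder or PATH. The output naming (input name plus new extension) and the helper names are choices made for this example, and the article does not document the tool's exit codes, so the `check=True` error handling is an assumption.

```python
import subprocess
from pathlib import Path
from typing import List


def build_command(src: Path, out_dir: Path, ext: str,
                  exe: str = "ocr2any.exe") -> List[str]:
    """Build one 'ocr2any.exe -ocr2 <input> <output>' command line."""
    target = out_dir / (src.stem + ext)
    return [exe, "-ocr2", str(src), str(target)]


def convert_folder(in_dir: Path, out_dir: Path, ext: str = ".doc",
                   dry_run: bool = True) -> List[List[str]]:
    """Convert every PDF in in_dir; with dry_run=True only print the commands."""
    commands = [build_command(pdf, out_dir, ext)
                for pdf in sorted(in_dir.glob("*.pdf"))]
    if not dry_run:
        out_dir.mkdir(parents=True, exist_ok=True)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # Assumption: a non-zero exit code signals a failed conversion.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    convert_folder(Path("."), Path("_test"), ".doc", dry_run=True)
```

Run it with `dry_run=True` first to review the generated commands, then switch to `dry_run=False` to perform the conversions.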
The trial version of VeryPDF OCR to Any Converter Command Line can only process a few pages from each input file, and it can only be run 300 times. To remove these limitations, please purchase the product.