
PDF to HTML OCR Converter Command Line

Convert normal PDF files to HTML, and convert scanned PDF files to editable, searchable HTML with OCR technology

What is OCR technology?

Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic conversion of scanned images of handwritten, typewritten or printed text into machine-encoded text. OCR is widely used for data entry from original paper sources such as documents, sales receipts, mail and other printed records. It is crucial to the computerization of printed texts, because it lets them be searched electronically, stored more compactly, displayed online, and used in machine processes such as machine translation, text-to-speech and text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision.


VeryPDF's PDF to HTML OCR Converter Command Line converts PDF files to HTML, singly or in batches, with OCR technology. There are two types of PDF documents. Normal PDF files produced by PDF software contain real text and are searchable and editable. PDF files created by scanning books, pages or other physical documents contain only images of the text, so their text cannot be searched or edited. In the past, people had to retype such documents entirely. VeryPDF's PDF to HTML OCR Converter Command Line converts scanned PDF files to editable, searchable HTML in seconds.

Download and Purchase PDF to HTML OCR Converter Command Line

Product	License	Price (USD)
PDF to HTML OCR Converter Command Line	1 Server License	195 each
PDF to HTML OCR Converter Command Line	1 Developer License	1495 each
OCR Language Packs		Free

Note: PDF to HTML OCR Converter Command Line includes OCR support for English only. OCR language packs for other languages are available as a free download.

Features of PDF to HTML OCR Converter Command Line:

Supports 32-bit and 64-bit Windows systems: Win95/98/ME/NT/2000/XP/2003/Vista/7;
Runs from the command line, for manual use or inclusion in scripts;
Converts normal PDF to HTML and scanned PDF to HTML without requiring Adobe Acrobat or the free Acrobat Reader;
Lets you choose the OCR language, e.g., German, French, English, Spanish, Italian, etc.;
Converts scanned PDF to HTML in batches with OCR technology;
Produces editable and searchable HTML files with OCR technology;
Supports page selection: OCR a single page, a page range, or all pages at once;
Converts images to HTML with OCR technology, singly or in batches;
Converts encrypted PDF files to HTML in batches;
Supports inserting additional text at the end of each text page, e.g., "Page Information is %PageNumber% of %PageCount%";
Accurately retains the original styles of the PDF in the HTML output.

PDF to HTML OCR Converter Command Line Options:
-------------------------------------------------------
Usage: pdf2txtocr.exe [options] <PDF> <Text>

-firstpage <int>   : first PDF page to convert
-lastpage <int>    : last PDF page to convert
-res <int>         : set the rendering resolution in DPI (default: 300)
-ownerpwd <string> : set the owner password for an encrypted PDF file
-userpwd <string>  : set the user password for an encrypted PDF file
-layout            : maintain the original physical layout
-noc               : don't insert page-break characters (0x0C) between pages in the text file
-bitcount <int>    : set the color depth used when rendering PDF pages to images: 1, 8 or 24 (default: 8)
-ocr               : enable OCR for scanned PDF files
-lang <string>     : choose the OCR engine language (e.g. eng, deu)
-text <string>     : add text at the end of each text page; supports the following variables:
    %PageNumber%   : current page number
    %PageCount%    : total page count of the PDF file
-$ <string>        : specify your license key
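The options above can also be assembled from a script. The following is a minimal Python sketch, not part of VeryPDF's product: the helper name `build_pdf2txtocr_args` is hypothetical, and it only builds an argument list from the documented flags without running anything. The option order it emits follows the examples on this page; we assume, without having verified it, that the tool accepts options in any order before the two file arguments. The checks on `-bitcount` (1, 8 or 24) and on the page range mirror the option descriptions.

```python
from typing import List, Optional

# Color depths listed in the -bitcount description.
ALLOWED_BITCOUNTS = {1, 8, 24}


def build_pdf2txtocr_args(
    src: str,
    dst: str,
    *,
    exe: str = "pdf2txtocr.exe",
    ocr: bool = False,
    lang: Optional[str] = None,
    res: Optional[int] = None,
    bitcount: Optional[int] = None,
    firstpage: Optional[int] = None,
    lastpage: Optional[int] = None,
    ownerpwd: Optional[str] = None,
    userpwd: Optional[str] = None,
    layout: bool = False,
    noc: bool = False,
    text: Optional[str] = None,
    license_key: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: return [exe, options..., <PDF>, <Text>] using documented flags only."""
    if bitcount is not None and bitcount not in ALLOWED_BITCOUNTS:
        raise ValueError("bitcount must be 1, 8 or 24")
    if firstpage is not None and lastpage is not None and lastpage < firstpage:
        raise ValueError("lastpage must not be smaller than firstpage")

    args = [exe]
    if ocr:
        args.append("-ocr")
    if lang is not None:
        args += ["-lang", lang]
    if res is not None:
        args += ["-res", str(res)]
    if bitcount is not None:
        args += ["-bitcount", str(bitcount)]
    if firstpage is not None:
        args += ["-firstpage", str(firstpage)]
    if lastpage is not None:
        args += ["-lastpage", str(lastpage)]
    if ownerpwd is not None:
        args += ["-ownerpwd", ownerpwd]
    if userpwd is not None:
        args += ["-userpwd", userpwd]
    if layout:
        args.append("-layout")
    if noc:
        args.append("-noc")
    if text is not None:
        args += ["-text", text]
    if license_key is not None:
        args += ["-$", license_key]
    # Positional arguments come last, as in the Usage line.
    args += [src, dst]
    return args
```

On a Windows machine with the tool installed, the list can be passed to `subprocess.run(args, check=True)`. Passing a list rather than a single string lets Python handle quoting of paths that contain spaces.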

Useful Examples:

pdf2txtocr.exe C:\in.pdf C:\out.html
pdf2txtocr.exe -firstpage 1 -lastpage 1 C:\in.pdf C:\out.html
pdf2txtocr.exe -ocr -res 300 C:\in.pdf C:\out.txt
pdf2txtocr.exe -ownerpwd 123 -userpwd 456 C:\in.pdf C:\out.html
pdf2txtocr.exe -layout C:\in.pdf C:\out.txt
pdf2txtocr.exe -noc C:\in.pdf C:\out.txt
pdf2txtocr.exe C:\in.tif C:\out.txt
pdf2txtocr.exe C:\in.jpg C:\out.txt
pdf2txtocr.exe C:\in.bmp C:\out.txt
pdf2txtocr.exe C:\in.png C:\out.txt
pdf2txtocr.exe -ocr -lang eng C:\in.pdf C:\out.html
pdf2txtocr.exe -ocr -bitcount 1 C:\in.pdf C:\out.txt
pdf2txtocr.exe -ocr -bitcount 8 C:\in.pdf C:\out.txt
pdf2txtocr.exe -ocr -bitcount 24 C:\in.pdf C:\out.txt
pdf2txtocr.exe -ocr -lang deu C:\in.pdf C:\out.html
pdf2txtocr.exe -lang deu C:\in.tif C:\out.txt
pdf2txtocr.exe -text "PageText %PageNumber% of %PageCount%" C:\in.pdf C:\out.txt
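To check the output of the last example, it helps to know what each page's added text should look like. The sketch below assumes the tool performs a plain text substitution of `%PageNumber%` and `%PageCount%` and numbers pages from 1. Neither detail is confirmed here (for example, how numbering behaves together with `-firstpage` is not documented). `expand_page_text` and `footers_for_document` are hypothetical helpers, not part of the tool.

```python
from typing import List


def expand_page_text(template: str, page_number: int, page_count: int) -> str:
    """Replace the two documented -text variables with concrete values (assumed plain substitution)."""
    return (
        template.replace("%PageNumber%", str(page_number))
        .replace("%PageCount%", str(page_count))
    )


def footers_for_document(template: str, page_count: int) -> List[str]:
    """Expected added text for every page, assuming pages are numbered from 1."""
    return [expand_page_text(template, n, page_count) for n in range(1, page_count + 1)]


if __name__ == "__main__":
    for line in footers_for_document("PageText %PageNumber% of %PageCount%", 3):
        print(line)
```

Running it prints `PageText 1 of 3`, `PageText 2 of 3` and `PageText 3 of 3`, one per line.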

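The following command OCRs all PDF files in the D:\temp\ folder to text files. (These loops are written for the interactive command prompt; in a .bat file, double the percent signs, e.g. %%F and %%~dpnF.)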
for %F in (D:\temp\*.pdf) do pdf2txtocr.exe -ocr -lang deu "%F" "%~dpnF.txt"

The following command OCRs all PDF files in the D:\temp\ folder and its subfolders to text files:
for /r D:\temp %F in (*.pdf) do pdf2txtocr.exe -ocr "%F" "%~dpnF.txt"

The following command OCRs all PDF files in the D:\temp\ folder and writes the text files to the C:\test folder:
for %F in (D:\temp\*.pdf) do pdf2txtocr.exe -ocr "%F" "C:\test\%~nF.txt"
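The same batch jobs can be planned from Python, for example when the file list needs filtering first. This sketch (the function `plan_ocr_jobs` is hypothetical) reproduces only the output-path rules of the three loops above: `%~dpnF.txt` keeps the PDF's folder and base name and swaps the extension, and `C:\test\%~nF.txt` keeps only the base name inside another folder. It does not run pdf2txtocr.exe itself.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def plan_ocr_jobs(
    src_dir: Path, out_dir: Optional[Path] = None, recursive: bool = False
) -> List[Tuple[Path, Path]]:
    """Pair each PDF under src_dir with the .txt path the batch loops above would use."""
    # iterdir() mirrors "for %F in (D:\temp\*.pdf)"; rglob("*") mirrors "for /r".
    candidates = src_dir.rglob("*") if recursive else src_dir.iterdir()
    jobs: List[Tuple[Path, Path]] = []
    for pdf in sorted(candidates):
        # cmd.exe matches *.pdf case-insensitively, so compare the suffix in lower case.
        if not pdf.is_file() or pdf.suffix.lower() != ".pdf":
            continue
        if out_dir is None:
            txt = pdf.with_suffix(".txt")  # like "%~dpnF.txt": same folder, same base name
        else:
            txt = out_dir / (pdf.stem + ".txt")  # like "C:\test\%~nF.txt": base name only
        jobs.append((pdf, txt))
    return jobs
```

Each pair can then be turned into a command such as `["pdf2txtocr.exe", "-ocr", str(pdf), str(txt)]`, as in the loops above. Note that with `out_dir` and `recursive=True`, PDFs with the same name in different subfolders map to the same output file.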

Take a Look at Other Tools:

DocConverter COM Component (+HTML2PDF.exe): Convert HTML, DOC, RTF, XLS, PPT, TXT, etc. files to PDF files; requires the PDFcamp Printer product.
Image to PDF Converter: Convert 40+ image formats to PDF files.
HTML Converter: Convert HTML files to TIF, TIFF, JPG, JPEG, GIF, PNG, BMP, PCX, TGA, JP2 (JPEG2000), PNM, etc. formats.
PDF to HTML Converter: Convert PDF files to HTML documents.
PDF to Text Converter: Convert PDF files to plain text files.
PDF to Vector Converter: Convert PDF files to PS, EPS, WMF, EMF, XPS, PCL, HPGL, SWF, SVG, etc. vector files.
PDF to Image Converter: Convert PDF files to TIF, TIFF, JPG, GIF, PNG, BMP, EMF, PCX, TGA formats.

More Products at VeryPDF

Email Us: support@verypdf.com
