Acrobat to HTML OCR Converter Command Line

About Acrobat to HTML OCR Converter

VeryPDF's Acrobat to HTML OCR Converter is a command line application that uses OCR to convert scanned PDF files and image files (TIFF, BMP, PNG, JPG, PCX, TGA, etc.) to HTML files on Windows platforms. It not only produces basic HTML from the original PDF and image files, but also lets you set simple HTML properties through related parameters. Acrobat to HTML OCR Converter does not require Adobe Acrobat or the free Acrobat Reader software.

About OCR technology

Often abbreviated OCR, optical character recognition refers to the branch of computer science that involves reading text from paper and translating the images into a form that the computer can manipulate (for example, into ASCII codes). An OCR system enables you to take a book or a magazine article, feed it directly into an electronic computer file, and then edit the file using a word processor.

What is OCR?

Optical Character Recognition (OCR) is a visual recognition process that turns printed or written text into an electronic character-based file. When a document is scanned and saved as a PDF, character recognition software can interpret each character image on the page and assign it an electronic character code, so that the text can be brought into an editable format such as a text or Word document.

What is HTML?

HTML is a computer language devised to allow website creation. These websites can then be viewed by anyone else connected to the Internet. It is relatively easy to learn, with the basics accessible to most people in one sitting, and quite powerful in what it allows you to create. It is constantly revised and extended to meet the demands of the growing Internet audience under the direction of the W3C, the organisation charged with designing and maintaining the language.

About Acrobat to HTML OCR Converter Command Line

Acrobat to HTML OCR Converter Command Line is a command line application that uses Optical Character Recognition technology to convert scanned PDF documents and images (TIFF, BMP, PNG, JPG, PCX, TGA, etc.) to HTML files. The default package supports English only; additional OCR language packs are available as a separate download.

Download and purchase Acrobat to HTML OCR Converter Command Line:

Version	Quantity	Price (USD)
Acrobat to HTML OCR Converter Command Line	1 Server License	$195 each
Acrobat to HTML OCR Converter Command Line	1 Developer License	$1495 each
OCR Language Packs		Free

Note: To use languages other than the default English, download the additional OCR language packs for Acrobat to HTML OCR Converter Command Line.

Acrobat to HTML OCR Converter Command Line has the following features:

Supports all Windows platforms, 32-bit and 64-bit: Win95/98/ME/NT/2000/XP/2003/Vista/7;
Supports extracting text from scanned PDF files to HTML, singly or in batches;
Supports converting encrypted PDF files to HTML;
Optionally maintains the original physical layout of PDF files;
Converts PDF files to text files in reading order layout;
Inserts or removes page break characters (0x0C) between pages in the text file;
Converts scanned PDF files to editable text files;
Converts scanned image files (TIFF, BMP, PNG, JPG, PCX, TGA, etc.) to editable text files;
Inserts additional text at the end of each text page, e.g., Page Information is %PageNumber% of %PageCount%;
Supports over 10 languages: besides English, Acrobat to HTML OCR Converter also supports German, French, Spanish, Italian and many other languages;
Converts text-based PDF documents to text format; fast and accurate, with a free trial;
Supports command line operation for manual use or inclusion in scripts.
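
The page break feature in the list above means that a converted text file can be cut back into pages afterwards. The following is a minimal Python sketch of that idea, assuming the output contains one 0x0C (form feed) character between pages. The helper name split_pages is hypothetical, and whether the tool also writes a page break after the last page is not stated here, so the sketch tolerates both cases.

```python
FORM_FEED = "\x0c"  # the 0x0C page break character mentioned in the feature list


def split_pages(text: str) -> list[str]:
    """Split converted text into pages at each 0x0C character."""
    pages = text.split(FORM_FEED)
    # A page break after the last page leaves an empty final chunk; drop it.
    if pages and pages[-1] == "":
        pages.pop()
    return pages


# Made-up sample text standing in for the contents of an output file.
sample = "first page\n" + FORM_FEED + "second page\n" + FORM_FEED
for number, page in enumerate(split_pages(sample), start=1):
    print(number, repr(page))
```

If the file was produced with page breaks disabled, there is nothing to split on and the whole text comes back as a single page.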

Supported Options on Acrobat to HTML OCR Converter Command Line:

Related Products:
PDF to Word OCR Converter: Convert PDF files to Word documents with OCR technology.
PDF to Excel OCR Converter: Convert PDF files to Excel files with OCR technology.
Image to PDF OCR Converter: Convert different kinds of images to PDF files with OCR technology.
-------------------------------------------------------
Usage: pdf2txtocr.exe [options] <PDF-file> <Text-file>
-firstpage <int>   : first PDF page to convert
-lastpage <int>    : last PDF page to convert
-res <int>         : set resolution, the unit is DPI (default is 300 dpi)
-ownerpwd <string> : set owner password for encrypted PDF file
-userpwd <string>  : set user password for encrypted PDF file
-layout            : maintain original physical layout
-noc               : don't insert page breaks 0x0C between pages in text file
-bitcount <int>    : set color depth when rendering PDF pages to image data; valid values are 1, 8 and 24 (default is 8)
-ocr               : enable OCR function for scanned PDF file
-lang <string>     : choose the language for OCR engine
-text <string>     : add additional text at end of each text page, this parameter supports the following variables:
    %PageNumber%   : current page number
    %PageCount%    : total page count of PDF file
-$ <string>        : input your License Key
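
When the tool is called from a script, it can help to assemble the argument list in one place. The following is a rough Python sketch that only uses option names from the usage text above. The function build_command and its parameter names are hypothetical, not part of the product, and the order in which options are emitted is an arbitrary choice of this sketch.

```python
from typing import Optional


def build_command(
    pdf_path: str,
    text_path: str,
    *,
    first_page: Optional[int] = None,
    last_page: Optional[int] = None,
    ocr: bool = False,
    res: Optional[int] = None,
    lang: Optional[str] = None,
    layout: bool = False,
    no_page_breaks: bool = False,
    exe: str = "pdf2txtocr.exe",
) -> list[str]:
    """Return an argument list in the form: exe [options] <PDF-file> <Text-file>."""
    args = [exe]
    if first_page is not None:
        args += ["-firstpage", str(first_page)]
    if last_page is not None:
        args += ["-lastpage", str(last_page)]
    if ocr:
        args.append("-ocr")
    if res is not None:
        args += ["-res", str(res)]
    if lang is not None:
        args += ["-lang", lang]
    if layout:
        args.append("-layout")
    if no_page_breaks:
        args.append("-noc")
    # Input and output files always come last, as in the usage line.
    args += [pdf_path, text_path]
    return args


cmd = build_command(r"C:\in.pdf", r"C:\out.txt", ocr=True, res=300, lang="deu")
print(cmd)
```

On a machine where the tool is installed, the resulting list could be passed to subprocess.run(cmd, check=True). Passing a list rather than one joined string avoids quoting problems with paths that contain spaces.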

Examples:
pdf2txtocr.exe C:\in.pdf C:\out.txt
pdf2txtocr.exe -firstpage 1 -lastpage 1 C:\in.pdf C:\out.txt
pdf2txtocr.exe -ocr -res 300 C:\in.pdf C:\out.txt
pdf2txtocr.exe -ownerpwd 123 -userpwd 456 C:\in.pdf C:\out.txt
pdf2txtocr.exe -layout C:\in.pdf C:\out.txt
pdf2txtocr.exe -noc C:\in.pdf C:\out.txt
pdf2txtocr.exe C:\in.tif C:\out.txt
pdf2txtocr.exe C:\in.jpg C:\out.txt
pdf2txtocr.exe C:\in.bmp C:\out.txt
pdf2txtocr.exe C:\in.png C:\out.txt
pdf2txtocr.exe -ocr -lang eng C:\in.pdf C:\out.txt
pdf2txtocr.exe -ocr -bitcount 1 C:\in.pdf C:\out.txt
pdf2txtocr.exe -ocr -bitcount 8 C:\in.pdf C:\out.txt
pdf2txtocr.exe -ocr -bitcount 24 C:\in.pdf C:\out.txt
pdf2txtocr.exe -ocr -lang deu C:\in.pdf C:\out.txt
pdf2txtocr.exe -lang deu C:\in.tif C:\out.txt
pdf2txtocr.exe -text "PageText %PageNumber% of %PageCount%" C:\in.pdf C:\out.txt
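
To make the -text variables in the last example concrete, the short Python sketch below shows what the template would look like once %PageNumber% and %PageCount% are filled in for each page. It only illustrates the documented meaning of the two variables. The function expand_page_text is hypothetical and is not the converter's own implementation.

```python
def expand_page_text(template: str, page_number: int, page_count: int) -> str:
    """Fill in the two documented variables for a single page."""
    return (
        template
        .replace("%PageNumber%", str(page_number))
        .replace("%PageCount%", str(page_count))
    )


template = "PageText %PageNumber% of %PageCount%"
page_count = 3  # made-up page count for the illustration
for page_number in range(1, page_count + 1):
    print(expand_page_text(template, page_number, page_count))
```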

The following command line will OCR all PDF files in the D:\temp\ folder to text files (when the command is placed in a batch file, write %%F instead of %F):
for %F in (D:\temp\*.pdf) do pdf2txtocr.exe -ocr -lang deu "%F" "%~dpnF.txt"

The following command line will OCR all PDF files in the D:\temp\ folder and its subdirectories to text files:
for /r D:\temp %F in (*.pdf) do pdf2txtocr.exe -ocr "%F" "%~dpnF.txt"

The following command line will OCR all PDF files in the D:\temp\ folder and write the text files to the C:\test folder:
for %F in (D:\temp\*.pdf) do pdf2txtocr.exe -ocr "%F" "C:\test\%~nF.txt"
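
The same batch pattern can be sketched in Python when the conversion is driven from a script rather than from the command prompt. The sketch below only computes the command lines and does not run them. The helpers output_path and batch_commands are hypothetical names, and PureWindowsPath is used so that the Windows-style paths are handled the same way on any system.

```python
from pathlib import PureWindowsPath
from typing import Optional


def output_path(pdf: str, out_dir: Optional[str] = None) -> str:
    """Mirror "%~dpnF.txt" (same folder) or "C:\\test\\%~nF.txt" (other folder)."""
    src = PureWindowsPath(pdf)
    if out_dir is None:
        # Same drive, folder and base name, with the extension changed to .txt.
        return str(src.with_suffix(".txt"))
    return str(PureWindowsPath(out_dir) / (src.stem + ".txt"))


def batch_commands(pdf_files, out_dir: Optional[str] = None, lang: Optional[str] = None):
    """Build one pdf2txtocr.exe argument list per input PDF."""
    commands = []
    for pdf in pdf_files:
        cmd = ["pdf2txtocr.exe", "-ocr"]
        if lang is not None:
            cmd += ["-lang", lang]
        cmd += [str(pdf), output_path(str(pdf), out_dir)]
        commands.append(cmd)
    return commands


# Made-up file names standing in for the contents of D:\temp\.
files = [r"D:\temp\a.pdf", r"D:\temp\b.pdf"]
for cmd in batch_commands(files, out_dir=r"C:\test"):
    print(cmd)
```

Each list could then be handed to subprocess.run on a machine where the tool is installed. The input file list itself would come from a directory listing, for example pathlib.Path.glob or Path.rglob for the recursive case.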

Other tools you may also find useful:

PDF to HTML Converter: Convert PDF files to HTML documents.
PDF to Text Converter: Convert PDF files to plain text files.
PDF to Vector Converter: Convert PDF files to PS, EPS, WMF, EMF, XPS, PCL, HPGL, SWF, SVG, etc. vector files.
PDF to Image Converter: Convert PDF files to TIF, TIFF, JPG, GIF, PNG, BMP, EMF, PCX, TGA formats.
DocConverter COM Component (+HTML2PDF.exe): Convert HTML, DOC, RTF, XLS, PPT, TXT, etc. files to PDF files; it depends on the PDFcamp Printer product.
Image to PDF Converter: Convert 40+ image formats to PDF files.
HTML Converter: Convert HTML files to TIF, TIFF, JPG, JPEG, GIF, PNG, BMP, PCX, TGA, JP2 (JPEG2000), PNM, etc. formats.