Convert Normal and Scanned PDF Files to Editable, Searchable HTML with OCR Technology
What is OCR technology?
Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic conversion of scanned images of handwritten, typewritten or printed text into machine-encoded text. OCR is widely used for data entry from paper sources such as documents, sales receipts, mail, or other printed records. It is crucial to digitizing printed texts so that they can be searched electronically, stored more compactly, displayed online, and used in machine processes such as machine translation, text-to-speech and text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision.
VeryPDF's PDF to HTML OCR Converter Command Line converts PDF files to HTML, singly or in batches, using OCR technology. There are two types of PDF documents. Normal PDF files produced by PDF software are searchable and editable. PDF files created by scanning books, pages or other physical documents, however, are just images to the machine: they cannot be edited or searched by text, and in the past people had to retype entire documents.
Luckily, VeryPDF's PDF to HTML OCR Converter Command Line converts scanned PDF files to editable, searchable HTML in seconds.
Download and Purchase PDF to HTML OCR Converter Command Line
Version | Quantity | Price (USD) | Download
PDF to HTML OCR Converter Command Line | 1 Server License | 195/each |
PDF to HTML OCR Converter Command Line | 1 Developer License | 1495/each |
OCR Language Packs | | Free | Free
Note: PDF to HTML OCR Converter Command Line includes OCR support for English only. Additional OCR language packs can be downloaded here.
Features and Capabilities of PDF to HTML OCR Converter Command Line:
PDF to HTML OCR Converter Command Line Options:
-------------------------------------------------------
Usage: pdf2txtocr.exe [options] <PDF> <Text>
-firstpage <int>   : first PDF page to convert
-lastpage <int>    : last PDF page to convert
-res <int>         : set resolution in DPI (default is 300 dpi)
-ownerpwd <string> : set owner password for encrypted PDF file
-userpwd <string>  : set user password for encrypted PDF file
-layout            : maintain original physical layout
-noc               : don't insert page breaks (form feed, 0x0C) between pages in text file
-bitcount <int>    : set color depth used when rendering PDF pages to image data; can be 1, 8 or 24 (default is 8-bit)
-ocr               : enable OCR for scanned PDF files
-lang <string>     : choose the language for the OCR engine
-text <string>     : add text at the end of each page of output; supports the following variables:
    %PageNumber%   : current page number
    %PageCount%    : total page count of PDF file
-$ <string>        : input your License Key
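The -res and -bitcount options together determine how much image data each page produces before OCR runs. As a rough worked example, the Python sketch below estimates the uncompressed size of one rendered page. It assumes a US Letter page (612 x 792 PDF points, i.e. 8.5 x 11 inches) and tightly packed rows with no padding; the helper names page_pixels and raw_bitmap_bytes are hypothetical, and the converter's actual internal buffers may be organised differently.

```python
import math

POINTS_PER_INCH = 72  # PDF user-space units are 1/72 inch


def page_pixels(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page rendered at the given resolution (-res)."""
    return (round(width_pt / POINTS_PER_INCH * dpi),
            round(height_pt / POINTS_PER_INCH * dpi))


def raw_bitmap_bytes(width_pt: float, height_pt: float, dpi: int, bitcount: int) -> int:
    """Uncompressed size in bytes, assuming tightly packed rows (no row padding)."""
    if bitcount not in (1, 8, 24):
        raise ValueError("bitcount must be 1, 8 or 24")
    w, h = page_pixels(width_pt, height_pt, dpi)
    bytes_per_row = math.ceil(w * bitcount / 8)  # 1-bit rows round up to whole bytes
    return bytes_per_row * h


if __name__ == "__main__":
    # US Letter (612 x 792 pt) at the default 300 dpi, for each -bitcount value
    for bits in (1, 8, 24):
        size = raw_bitmap_bytes(612, 792, 300, bits)
        print(f"Letter @ 300 dpi, {bits:>2}-bit: {size / 1_000_000:.1f} MB")
```

At 300 dpi a Letter page is 2550 x 3300 pixels, which works out to roughly 1.1 MB at 1-bit, 8.4 MB at 8-bit and 25.2 MB at 24-bit. Doubling -res roughly quadruples each figure, because both dimensions double.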
Useful Examples:
pdf2txtocr.exe C:\in.pdf C:\out.html
pdf2txtocr.exe -firstpage 1 -lastpage 1 C:\in.pdf C:\out.html
pdf2txtocr.exe -ocr -res 300 C:\in.pdf C:\out.txt
pdf2txtocr.exe -ownerpwd 123 -userpwd 456 C:\in.pdf C:\out.html
pdf2txtocr.exe -layout C:\in.pdf C:\out.txt
pdf2txtocr.exe -noc C:\in.pdf C:\out.txt
pdf2txtocr.exe C:\in.tif C:\out.txt
pdf2txtocr.exe C:\in.jpg C:\out.txt
pdf2txtocr.exe C:\in.bmp C:\out.txt
pdf2txtocr.exe C:\in.png C:\out.txt
pdf2txtocr.exe -ocr -lang eng C:\in.pdf C:\out.html
pdf2txtocr.exe -ocr -bitcount 1 C:\in.pdf C:\out.txt
pdf2txtocr.exe -ocr -bitcount 8 C:\in.pdf C:\out.txt
pdf2txtocr.exe -ocr -bitcount 24 C:\in.pdf C:\out.txt
pdf2txtocr.exe -ocr -lang deu C:\in.pdf C:\out.html
pdf2txtocr.exe -lang deu C:\in.tif C:\out.txt
pdf2txtocr.exe -text "PageText %PageNumber% of %PageCount%" C:\in.pdf C:\out.txt
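When the converter is driven from a script rather than typed by hand, it can help to assemble the arguments programmatically. The following Python sketch does that using only the option names listed above. The function names build_args and preview_page_text are hypothetical, the order in which options are emitted simply mirrors the examples above (it assumes, as is usual for such tools, that option order does not matter), and preview_page_text only assumes that -text performs plain substitution of %PageNumber% and %PageCount%.

```python
from typing import Optional

VALID_BITCOUNTS = (1, 8, 24)  # values listed for -bitcount


def build_args(src: str, dst: str, *, ocr: bool = False,
               lang: Optional[str] = None, res: Optional[int] = None,
               bitcount: Optional[int] = None,
               first_page: Optional[int] = None, last_page: Optional[int] = None,
               owner_password: Optional[str] = None,
               user_password: Optional[str] = None,
               layout: bool = False, no_page_breaks: bool = False,
               text: Optional[str] = None, license_key: Optional[str] = None,
               exe: str = "pdf2txtocr.exe") -> list[str]:
    """Assemble a pdf2txtocr.exe argument list from the documented options."""
    if bitcount is not None and bitcount not in VALID_BITCOUNTS:
        raise ValueError("-bitcount accepts 1, 8 or 24")
    if (first_page is not None and last_page is not None
            and first_page > last_page):
        raise ValueError("-firstpage must not be greater than -lastpage")

    args = [exe]
    if ocr:
        args.append("-ocr")
    # Options that take a value: emitted as (flag, value), skipped when None.
    for flag, value in (("-firstpage", first_page), ("-lastpage", last_page),
                        ("-res", res), ("-bitcount", bitcount),
                        ("-ownerpwd", owner_password), ("-userpwd", user_password),
                        ("-lang", lang), ("-text", text), ("-$", license_key)):
        if value is not None:
            args += [flag, str(value)]
    # Remaining plain switches.
    if layout:
        args.append("-layout")
    if no_page_breaks:
        args.append("-noc")
    args += [src, dst]
    return args


def preview_page_text(template: str, page_number: int, page_count: int) -> str:
    """Assumed -text behaviour: replace the two documented variables literally."""
    return (template.replace("%PageNumber%", str(page_number))
            .replace("%PageCount%", str(page_count)))


if __name__ == "__main__":
    print(build_args(r"C:\in.pdf", r"C:\out.html", ocr=True, lang="deu"))
    print(preview_page_text("PageText %PageNumber% of %PageCount%", 3, 10))
```

For example, build_args(r"C:\in.pdf", r"C:\out.html", ocr=True, lang="deu") yields the same arguments as the -ocr -lang deu example above. To run the converter, the list can be passed to subprocess.run(args, check=True); because no shell is involved, the %PageNumber% and %PageCount% placeholders reach the converter without being expanded by cmd.exe.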
The following command line OCRs all PDF files in the D:\temp\ folder with the German language pack (-lang deu) and writes a text file next to each one:
for %F in (D:\temp\*.pdf) do pdf2txtocr.exe -ocr -lang deu "%F" "%~dpnF.txt"
The following command line OCRs all PDF files in the D:\temp\ folder and its subfolders to text files:
for /r D:\temp %F in (*.pdf) do pdf2txtocr.exe -ocr "%F" "%~dpnF.txt"
The following command line OCRs all PDF files in the D:\temp\ folder and writes the text files to the C:\test folder:
for %F in (D:\temp\*.pdf) do pdf2txtocr.exe -ocr "%F" "C:\test\%~nF.txt"
These commands are meant to be typed at the command prompt; inside a batch (.bat) file, double the percent signs (%%F, %%~dpnF, %%~nF).
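The three loops differ only in which files they visit and where each output file goes: %~dpnF keeps the input file's drive, folder and base name, while %~nF keeps only the base name. The Python sketch below reproduces that mapping so it can be checked before a long batch run. The function name plan_jobs is hypothetical, and it only computes input and output paths; it does not call the converter.

```python
from pathlib import Path
from typing import Optional


def plan_jobs(src_dir: str, recursive: bool = False,
              out_dir: Optional[str] = None) -> list[tuple[Path, Path]]:
    """Pair each PDF with the .txt path that the batch loops above would write."""
    root = Path(src_dir)
    # for %F in (D:\temp\*.pdf)      -> top-level folder only
    # for /r D:\temp %F in (*.pdf)   -> folder plus all subfolders
    pdfs = root.rglob("*.pdf") if recursive else root.glob("*.pdf")
    jobs = []
    for pdf in sorted(pdfs):
        if out_dir is None:
            # "%~dpnF.txt": same drive, folder and base name, new extension
            target = pdf.with_suffix(".txt")
        else:
            # "C:\test\%~nF.txt": base name only, placed in out_dir
            target = Path(out_dir) / (pdf.stem + ".txt")
        jobs.append((pdf, target))
    return jobs
```

For instance, plan_jobs(r"D:\temp", recursive=True) lists the same input/output pairs as the /r loop. One difference to keep in mind: cmd matches *.pdf case-insensitively, whereas pathlib's glob may be case-sensitive on non-Windows systems.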
Take a look at these other tools:
DocConverter COM Component (+HTML2PDF.exe): Convert HTML, DOC, RTF, XLS, PPT, TXT and other files to PDF; requires the PDFcamp Printer product.
Image to PDF Converter: Convert 40+ image formats to PDF files.
HTML Converter: Convert HTML files to TIF, TIFF, JPG, JPEG, GIF, PNG, BMP, PCX, TGA, JP2 (JPEG2000), PNM and other formats.
PDF to HTML Converter: Convert PDF files to HTML documents.
PDF to Text Converter: Convert PDF files to plain text files.
PDF to Vector Converter: Convert PDF files to PS, EPS, WMF, EMF, XPS, PCL, HPGL, SWF, SVG and other vector formats.
PDF to Image Converter: Convert PDF files to TIF, TIFF, JPG, GIF, PNG, BMP, EMF, PCX and TGA formats.
Email Us: support@verypdf.com
Copyright © 2002- VeryPDF.com, Inc. All rights reserved.