VeryPDF PDF Extract Tool Command Line is a program for extracting fonts, images, drawings, text contents, text positions, metadata, document properties, and other information from PDF files. This is a brief user guide for it.
1. Download and install | 2. Command line examples
---|---
3. Description for extracted files | 4. Sample XML Code for PageContents.xml file
5. Sample Code for TextFileWithPosition.txt file | 6. Command options
VeryPDF PDF Extract Tool Command Line is a portable application and does not need to be installed. Download the package, unpack it to disk, open a command prompt window in Windows, and you can run it.
The usage rule of the program is
pdfextract.exe [options] <input PDF>
In this rule, "pdfextract.exe" is the executable file, "[options]" is where you specify options, and "<input PDF>" is the input PDF file. The following command line extracts all information from the PDF file "test.pdf" and prints it to the console.
pdfextract.exe test.pdf
You can use the "-outfolder" parameter to set the folder in which extracted files are saved, e.g.,
pdfextract.exe -outfolder _annotstamp annotstamp.pdf
pdfextract.exe -outfolder _test-long-page test-long-page.pdf
pdfextract.exe -outfolder _test-form test-form.pdf
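The three commands above follow one pattern: the output folder is the PDF's base name prefixed with an underscore. As a minimal sketch, assuming `pdfextract.exe` is on the PATH, the same batch could be driven from Python. The helper names `build_command` and `extract_all` are hypothetical, and since the tool's exit codes are not documented here, the return code is only printed, not interpreted.

```python
import subprocess
from pathlib import Path


def build_command(pdf_path, exe="pdfextract.exe"):
    """Build the argument list for one PDF, mirroring the examples above.

    The output folder name ("_" + file name without extension) is a
    convention copied from the examples, not a requirement of the tool.
    """
    pdf = Path(pdf_path)
    out_folder = "_" + pdf.stem
    return [exe, "-outfolder", out_folder, str(pdf)]


def extract_all(pdf_paths, exe="pdfextract.exe"):
    """Run the extractor once per PDF and print each return code."""
    for pdf_path in pdf_paths:
        cmd = build_command(pdf_path, exe)
        result = subprocess.run(cmd)  # output goes to the console
        print(" ".join(cmd), "-> return code", result.returncode)


# Example call (requires pdfextract.exe to be available):
# extract_all(["annotstamp.pdf", "test-long-page.pdf", "test-form.pdf"])
```

Passing the arguments as a list avoids quoting problems with file names that contain spaces.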
pdfforms.fdf: Extracted form fields. This file contains all form names and form values; the contents are in UTF-8 format.
*.cff;*.ttf;*.afm files: Extracted fonts. You can reuse them in MS Word, Photoshop, and other drawing or editing applications.
*.ppm;*.pbm;*.jpg;*.tif;*.bmp;*.png files: Extracted images.
PageContents.xml: This file contains the drawing information, such as transformation matrix, font size, graphics state, color space, single character position, path, and filling.
Metadata.xml: Extracted XMP metadata file.
cnt*.txt files: Contents of the original PDF pages.
TextFile.txt: Plain text file.
TextFileWithPosition.txt: Text contents with positions.
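To make the file list above easier to work with, here is a rough sketch that sorts the names found in an output folder into those categories. The category labels and the `classify` helper are illustrative assumptions; only the file name patterns come from the list above.

```python
from fnmatch import fnmatchcase

# Patterns taken from the description list; labels are illustrative.
CATEGORIES = [
    ("form fields", ["pdfforms.fdf"]),
    ("fonts", ["*.cff", "*.ttf", "*.afm"]),
    ("images", ["*.ppm", "*.pbm", "*.jpg", "*.tif", "*.bmp", "*.png"]),
    ("drawing information", ["pagecontents.xml"]),
    ("metadata", ["metadata.xml"]),
    ("page contents", ["cnt*.txt"]),
    ("plain text", ["textfile.txt"]),
    ("text with positions", ["textfilewithposition.txt"]),
]


def classify(filename):
    """Return the category label for an extracted file name."""
    name = filename.lower()  # compare case-insensitively
    for category, patterns in CATEGORIES:
        if any(fnmatchcase(name, pattern) for pattern in patterns):
            return category
    return "other"


# Example: group the files of an output folder by category.
# import os
# for entry in os.listdir("_test-form"):
#     print(classify(entry), entry)
```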
Text Node:
<text font="Times-Bold" matrix="12.96 0 0 12.96">
  <g c="U" x="285.000" y="603.6" />
  <g c="S" x="294.357" y="603.6" />
  <g c=" " x="301.563" y="603.6" />
  <g c="M" x="304.803" y="603.6" />
  <g c="a" x="317.037" y="603.6" />
  <g c="r" x="323.517" y="603.6" />
  <g c="k" x="329.271" y="603.6" />
  <g c="e" x="336.477" y="603.6" />
  <g c="t" x="342.231" y="603.6" />
  <g c=" " x="346.547" y="603.6" />
  <g c="S" x="349.787" y="603.6" />
  <g c="h" x="356.993" y="603.6" />
  <g c="a" x="364.199" y="603.6" />
  <g c="r" x="370.679" y="603.6" />
  <g c="e" x="376.433" y="603.6" />
</text>
Fill Path Node:
<path fill="evenodd">
  <moveto x="457" y="393.96" />
  <lineto x="492" y="393.96" />
  <lineto x="492" y="366.84" />
  <lineto x="457" y="366.84" />
  <closepath />
</path>
Stroke Path Node:
<path fill="stroke" cap="1" join="0" width="0.84" miter="10">
  <moveto x="287.28" y="443.4" />
  <lineto x="293.04" y="443.4" />
  <lineto x="293.04" y="437.6" />
  <lineto x="287.28" y="437.6" />
  <closepath />
</path>
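The sample nodes above can be read with any XML parser. The following sketch uses Python's standard `xml.etree.ElementTree` to rebuild the text of a text node from its per-character `<g>` elements and to collect the vertices of a path node. It parses the nodes as standalone fragments, because the overall layout of PageContents.xml is not shown here; the helper names `text_run` and `path_points` are hypothetical.

```python
import xml.etree.ElementTree as ET

TEXT_NODE = """<text font="Times-Bold" matrix="12.96 0 0 12.96">
  <g c="U" x="285.000" y="603.6" /> <g c="S" x="294.357" y="603.6" />
  <g c=" " x="301.563" y="603.6" /> <g c="M" x="304.803" y="603.6" />
  <g c="a" x="317.037" y="603.6" /> <g c="r" x="323.517" y="603.6" />
  <g c="k" x="329.271" y="603.6" /> <g c="e" x="336.477" y="603.6" />
  <g c="t" x="342.231" y="603.6" /> <g c=" " x="346.547" y="603.6" />
  <g c="S" x="349.787" y="603.6" /> <g c="h" x="356.993" y="603.6" />
  <g c="a" x="364.199" y="603.6" /> <g c="r" x="370.679" y="603.6" />
  <g c="e" x="376.433" y="603.6" />
</text>"""

PATH_NODE = """<path fill="evenodd">
  <moveto x="457" y="393.96" /> <lineto x="492" y="393.96" />
  <lineto x="492" y="366.84" /> <lineto x="457" y="366.84" />
  <closepath />
</path>"""


def text_run(text_node):
    """Join the per-character <g> elements of a <text> node."""
    glyphs = text_node.findall("g")
    return {
        "font": text_node.get("font"),
        "text": "".join(g.get("c", "") for g in glyphs),
        "positions": [(float(g.get("x")), float(g.get("y"))) for g in glyphs],
    }


def path_points(path_node):
    """Collect the (x, y) vertices of a <path> node, in drawing order."""
    return [
        (float(el.get("x")), float(el.get("y")))
        for el in path_node
        if el.tag in ("moveto", "lineto")
    ]


run = text_run(ET.fromstring(TEXT_NODE))
print(run["font"], repr(run["text"]))  # prints: Times-Bold 'US Market Share'
print(path_points(ET.fromstring(PATH_NODE)))
```

Keeping the per-character positions alongside the joined string makes it possible to map any substring back to page coordinates later.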
//Text Positions for each Word
word: x=157.06..188.76 y=18.60..32.55 base=30.17 fontSize=11.52 rot=0 link=00000000 'Home'
word: x=197.88..257.12 y=18.60..32.55 base=30.17 fontSize=11.52 rot=0 link=00000000 'PDF-Tools'
word: x=266.21..287.18 y=18.60..32.55 base=30.17 fontSize=11.52 rot=0 link=00000000 'Doc'
word: x=288.38..323.97 y=18.60..32.55 base=30.17 fontSize=11.52 rot=0 link=00000000 'ument'
word: x=333.65..379.00 y=18.60..32.55 base=30.17 fontSize=11.52 rot=0 link=00000000 'Support'
word: x=65.66..182.76 y=46.95..72.70 base=68.32 fontSize=21.26 rot=0 link=00000000 'Advanced'
word: x=190.02..237.43 y=46.95..72.70 base=68.32 fontSize=21.26 rot=0 link=00000000 'PDF'
word: x=245.12..307.43 y=46.95..72.70 base=68.32 fontSize=21.26 rot=0 link=00000000 'Tools'
word: x=314.31..432.03 y=46.95..72.70 base=68.32 fontSize=21.26 rot=0 link=00000000 'Command'
word: x=439.29..488.87 y=46.95..72.70 base=68.32 fontSize=21.26 rot=0 link=00000000 'Line'
word: x=496.13..550.23 y=46.95..72.70 base=68.32 fontSize=21.26 rot=0 link=00000000 'User'
word: x=557.23..643.64 y=46.95..72.70 base=68.32 fontSize=21.26 rot=0 link=00000000 'Manual'
word: x=8.87..62.31 y=86.81..100.7 base=98.39 fontSize=11.52 rot=0 link=00000000 'Version:'
word: x=65.64..94.2 y=86.81..100.7 base=98.39 fontSize=11.52 rot=0 link=00000000 'v2.0'
word: x=8.87..79.14 y=117.82..137 base=133.8 fontSize=15.95 rot=0 link=00000000 'Content'
word: x=79.86..133.67 y=155.91..169 base=167.4 fontSize=11.52 rot=0 link=00000000 'Overview'
word: x=79.86..131.13 y=172.74..186 base=184.3 fontSize=11.52 rot=0 link=00000000 'Features'
//Text Positions for each Line
line: x=157.06..379.00 y=18.60..32.55 base=30.17 'Home PDF-Tools Doc ument Support'
line: x= 65.66..643.64 y=46.95..72.70 base=68.32 'Advanced PDF Tools Command Line User Manual'
line: x= 8.87..94.23 y=86.81..100.76 base=98.39 'Version: v2.0'
line: x= 8.87..79.14 y=117.82..137.13 base=133.85 'Content'
line: x= 79.86..133.67 y=155.91..169.86 base=167.49 'Overview'
line: x= 79.86..131.13 y=172.74..186.69 base=184.31 'Features'
line: x= 79.86..203.05 y=189.57..203.52 base=201.15 'Command Line Usage'
line: x=115.36..263.33 y=223.23..237.18 base=234.81 'Input and output PDF file'
line: x=115.36..264.56 y=240.07..254.01 base=251.64 'Show PDF file information'
line: x=115.36..253.03 y=256.90..270.84 base=268.47 'Set PDF file information'
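The "word:" records in the sample above have a regular shape, so they can be picked apart with a regular expression. The sketch below is inferred only from the sample lines shown, not from a published format specification, and the field names in the returned dictionary are my own.

```python
import re

NUM = r"(-?\d+(?:\.\d+)?)"  # integer or decimal number

# Field layout inferred from the sample records; an assumption, not a spec.
WORD_RE = re.compile(
    rf"word: x=\s*{NUM}\.\.{NUM} y=\s*{NUM}\.\.{NUM} base={NUM} "
    rf"fontSize={NUM} rot=(\d+) link=(\w+) '(.*)'"
)


def parse_word(line):
    """Parse one 'word:' record; return None for any other line."""
    match = WORD_RE.match(line.strip())
    if match is None:
        return None
    x0, x1, y0, y1, base, size = (float(v) for v in match.groups()[:6])
    return {
        "x": (x0, x1),          # horizontal extent of the word
        "y": (y0, y1),          # vertical extent of the word
        "base": base,           # baseline position
        "font_size": size,
        "rotation": int(match.group(7)),
        "link": match.group(8),
        "text": match.group(9),
    }


sample = ("word: x=157.06..188.76 y=18.60..32.55 base=30.17 "
          "fontSize=11.52 rot=0 link=00000000 'Home'")
print(parse_word(sample))
```

Lines that do not match, such as the "//Text Positions for each Word" comment or the "line:" records, yield `None` and can simply be skipped.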
VeryPDF PDF Extract Tool Command Line is easy to use, and it has these options:
Usage: pdfextract.exe [options] <PDF-file>
-f <int> : first page to print
-l <int> : last page to print
-opw <string> : owner password (for encrypted files)
-upw <string> : user password (for encrypted files)
-outfolder <string> : Set a folder to store extracted files
-h : print usage information
-help : print usage information
--help : print usage information
-? : print usage information
-$ <string> : input your license key
Example:
pdfextract.exe D:\in.pdf
pdfextract.exe -outfolder D:\out\ D:\in.pdf
pdfextract.exe -opw 123 -upw 456 -outfolder D:\out\ D:\in.pdf
pdfextract.exe -outfolder D:\out\ D:\in.pdf > out.log
pdfextract.exe D:\in.pdf > out.log
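For scripted use, the options listed above can be assembled programmatically and the console output captured to a log file, which is what the "> out.log" redirection does at the command prompt. This is a minimal sketch under two assumptions: that `pdfextract.exe` is on the PATH, and that options go before the input PDF as in the usage line. The helpers `build_args` and `run_with_log` are hypothetical names.

```python
import subprocess


def build_args(pdf, out_folder=None, first_page=None, last_page=None,
               owner_pw=None, user_pw=None, exe="pdfextract.exe"):
    """Assemble an argument list using only the documented options."""
    args = [exe]
    if first_page is not None:
        args += ["-f", str(first_page)]
    if last_page is not None:
        args += ["-l", str(last_page)]
    if owner_pw is not None:
        args += ["-opw", owner_pw]
    if user_pw is not None:
        args += ["-upw", user_pw]
    if out_folder is not None:
        args += ["-outfolder", out_folder]
    args.append(pdf)  # the input PDF comes last, as in the usage line
    return args


def run_with_log(args, log_path):
    """Run the tool and write its console output to log_path."""
    with open(log_path, "wb") as log:
        return subprocess.run(args, stdout=log).returncode


# Example call (requires pdfextract.exe to be available):
# run_with_log(build_args(r"D:\in.pdf", out_folder="D:\\out\\"), "out.log")
```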