author | Brenton Earl <brent@exitstatusone.com> | 2015-11-14 23:19:43 +0700 |
---|---|---|
committer | Willy Sudiarto Raharjo <willysr@slackbuilds.org> | 2015-11-14 23:19:43 +0700 |
commit | 8ee80adc21733871a05f1eb38d949ddf19a431d2 (patch) | |
tree | 33e2ec48c1785b939d2e73d366eb74952b77bac9 /python/python-pdfminer/README | |
parent | 6a250f182ecd65acfcac96409f71010ba7cb04a6 (diff) | |
download | slackbuilds-8ee80adc21733871a05f1eb38d949ddf19a431d2.tar.gz |
python/python-pdfminer: Added (PDF parser and analyzer).
Signed-off-by: Willy Sudiarto Raharjo <willysr@slackbuilds.org>
Diffstat (limited to 'python/python-pdfminer/README')
-rw-r--r-- | python/python-pdfminer/README | 23 |
1 files changed, 23 insertions, 0 deletions
diff --git a/python/python-pdfminer/README b/python/python-pdfminer/README
new file mode 100644
index 0000000000..64ca2affa2
--- /dev/null
+++ b/python/python-pdfminer/README
@@ -0,0 +1,23 @@
+PDFMiner is a tool for extracting information from PDF documents. Unlike
+other PDF-related tools, it focuses entirely on getting and analyzing
+text data. PDFMiner allows one to obtain the exact location of text in a
+page, as well as other information such as fonts or lines. It includes a
+PDF converter that can transform PDF files into other text formats (such
+as HTML). It has an extensible PDF parser that can be used for other
+purposes than text analysis.
+
+PDFMiner comes with two handy tools: pdf2txt.py and dumppdf.py.
+
+pdf2txt.py
+
+pdf2txt.py extracts text contents from a PDF file. It cannot recognize
+text drawn as images. It also extracts locations, font names/sizes,
+writing direction. It requires a password for password protected PDF
+documents. You cannot extract any text from a PDF document which does
+not have extraction permission.
+
+dumppdf.py
+
+dumppdf.py dumps the internal contents of a PDF file in pseudo-XML
+format. This program is primarily for debugging purposes, but it's also
+possible to extract some meaningful contents (e.g. images).
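The README names the two bundled tools but does not show an invocation. What follows is a minimal sketch of driving them from Python with `subprocess`. It assumes Python 3.7+, that `pdf2txt.py` and `dumppdf.py` are on `PATH`, and that the flags below (`-t` for the pdf2txt.py output type, `-P` for a document password, `-a` to make dumppdf.py dump all objects) match the installed version. Confirm them with `pdf2txt.py -h` and `dumppdf.py -h`. The helper names `pdf2txt_argv`, `dumppdf_argv` and `run_tool` are hypothetical and are not part of PDFMiner.

```python
import shutil
import subprocess
from typing import List, Optional

# Output types assumed to be accepted by pdf2txt.py's -t option;
# check `pdf2txt.py -h` for the installed version.
PDF2TXT_TYPES = ("text", "html", "xml", "tag")


def pdf2txt_argv(pdf_path: str, output_type: str = "text",
                 password: Optional[str] = None) -> List[str]:
    """Build the argument list for a hypothetical pdf2txt.py call."""
    if output_type not in PDF2TXT_TYPES:
        raise ValueError(f"unsupported output type: {output_type!r}")
    argv = ["pdf2txt.py", "-t", output_type]
    if password is not None:
        # Password-protected PDFs need the password to be decrypted.
        argv += ["-P", password]
    argv.append(pdf_path)
    return argv


def dumppdf_argv(pdf_path: str, password: Optional[str] = None) -> List[str]:
    """Build the argument list for dumping all objects with dumppdf.py."""
    argv = ["dumppdf.py", "-a"]  # -a: dump every object (assumed flag)
    if password is not None:
        argv += ["-P", password]
    argv.append(pdf_path)
    return argv


def run_tool(argv: List[str]) -> str:
    """Run one of the tools and return what it wrote to stdout."""
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError(f"{argv[0]} not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status,
    # e.g. a wrong password or a PDF without extraction permission.
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, `run_tool(pdf2txt_argv("report.pdf", "html"))` would return the HTML conversion of a hypothetical `report.pdf`, and `run_tool(dumppdf_argv("report.pdf"))` its pseudo-XML object dump. Keeping the argument building separate from execution makes the command lines easy to inspect before anything is run.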