python/python-pdfminer: Added (PDF parser and analyzer).

Signed-off-by: Willy Sudiarto Raharjo <willysr@slackbuilds.org>
author: Brenton Earl <brent@exitstatusone.com> 2015-11-14 23:19:43 +0700
committer: Willy Sudiarto Raharjo <willysr@slackbuilds.org> 2015-11-14 23:19:43 +0700
commit: 8ee80adc21733871a05f1eb38d949ddf19a431d2 (patch)
tree: 33e2ec48c1785b939d2e73d366eb74952b77bac9 /python/python-pdfminer/README
parent: 6a250f182ecd65acfcac96409f71010ba7cb04a6 (diff)
download: slackbuilds-8ee80adc21733871a05f1eb38d949ddf19a431d2.tar.gz
1 file changed, 23 insertions, 0 deletions
diff --git a/python/python-pdfminer/README b/python/python-pdfminer/README
new file mode 100644
index 0000000000..64ca2affa2
--- /dev/null
+++ b/python/python-pdfminer/README
@@ -0,0 +1,23 @@
+PDFMiner is a tool for extracting information from PDF documents. Unlike
+other PDF-related tools, it focuses entirely on getting and analyzing
+text data. PDFMiner allows one to obtain the exact location of text in a
+page, as well as other information such as fonts or lines. It includes a
+PDF converter that can transform PDF files into other text formats (such
+as HTML). It has an extensible PDF parser that can be used for purposes
+other than text analysis.
+
+PDFMiner comes with two handy tools: pdf2txt.py and dumppdf.py.
+
+pdf2txt.py
+
+pdf2txt.py extracts text contents from a PDF file.  It cannot recognize
+text drawn as images.  It also extracts locations, font names/sizes, and
+writing direction.  It requires a password for password-protected PDF
+documents.  You cannot extract any text from a PDF document which does
+not have extraction permission.
+
+dumppdf.py
+
+dumppdf.py dumps the internal contents of a PDF file in pseudo-XML
+format. This program is primarily for debugging purposes, but it's also
+possible to extract some meaningful contents (e.g. images).
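
As an illustration of the pdf2txt.py usage described in the README above, here is a minimal sketch of a helper that assembles a pdf2txt.py command line. The function name build_pdf2txt_command and the file names are hypothetical and not part of the package. It assumes that pdf2txt.py accepts -o (output file), -t (output type) and -P (password), as in upstream PDFMiner releases of this period, and that the script is on PATH after installation.

```python
# Hypothetical helper: builds an argument list for pdf2txt.py.
# Assumes the -o, -t and -P options of upstream PDFMiner's pdf2txt.py.

# Output types assumed to be accepted by the -t option.
OUTPUT_TYPES = ("text", "html", "xml", "tag")


def build_pdf2txt_command(pdf_path, output_path=None, output_type=None,
                          password=None):
    """Return the pdf2txt.py invocation as a list of arguments."""
    cmd = ["pdf2txt.py"]
    if password is not None:
        # Needed only for password-protected documents.
        cmd += ["-P", password]
    if output_path is not None:
        cmd += ["-o", output_path]
    if output_type is not None:
        if output_type not in OUTPUT_TYPES:
            raise ValueError("unsupported output type: %r" % (output_type,))
        cmd += ["-t", output_type]
    cmd.append(pdf_path)
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # even where the package is not installed.
    print(" ".join(build_pdf2txt_command("input.pdf", "output.html", "html")))
```

The resulting list can be passed to subprocess.call() once the package is installed. Building the list separately from running it keeps the sketch usable on systems without PDFMiner.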