automatic import of python-pdfminer

author: CoprDistGit <infra@openeuler.org> 2023-03-09 14:56:19 +0000
committer: CoprDistGit <infra@openeuler.org> 2023-03-09 14:56:19 +0000
commit: 72e2236f1b973972942c30ffd7b2848322ae97e4 (patch)
tree: 6b0d9b020f1e772855c4a502616e68898123f9af
parent: 399066ff07f558388dc59169329211eef5c830ec (diff)
3 files changed, 395 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..4267a5c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/pdfminer-20191125.tar.gz
diff --git a/python-pdfminer.spec b/python-pdfminer.spec
new file mode 100644
index 0000000..69b1887
--- /dev/null
+++ b/python-pdfminer.spec
@@ -0,0 +1,393 @@
+%global _empty_manifest_terminate_build 0
+Name:		python-pdfminer
+Version:	20191125
+Release:	1
+Summary:	PDF parser and analyzer
+License:	MIT
+URL:		http://github.com/euske/pdfminer
+Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/71/a3/155c5cde5f9c0b1069043b2946a93f54a41fd72cc19c6c100f6f2f5bdc15/pdfminer-20191125.tar.gz
+BuildArch:	noarch
+
+
+%description
+# PDFMiner
+
+PDFMiner is a text extraction tool for PDF documents.
+
+[![Build Status](https://travis-ci.org/euske/pdfminer.svg?branch=master)](https://travis-ci.org/euske/pdfminer)
+[![PyPI](https://img.shields.io/pypi/v/pdfminer)](https://pypi.org/project/pdfminer/)
+
+**Warning**: Starting from version 20191010, PDFMiner supports **Python 3 only**.
+For Python 2 support, check out
+<a href="https://github.com/pdfminer/pdfminer.six">pdfminer.six</a>.
+
+## Features:
+
+  * Pure Python (3.6 or above).
+  * Supports PDF-1.7. (well, almost)
+  * Obtains the exact location of text as well as other layout information (fonts, etc.).
+  * Performs automatic layout analysis.
+  * Can convert PDF into other formats (HTML/XML).
+  * Can extract an outline (TOC).
+  * Can extract tagged contents.
+  * Supports basic encryption (RC4 and AES).
+  * Supports various font types (Type1, TrueType, Type3, and CID).
+  * Supports CJK languages and vertical writing scripts.
+  * Has an extensible PDF parser that can be used for other purposes.
+
+
+## How to Use:
+
+  1. `> pip install pdfminer`
+  1. `> pdf2txt.py samples/simple1.pdf`
+
+
+## Command Line Syntax:
+
+### pdf2txt.py
+
+pdf2txt.py extracts all the texts that are rendered programmatically.
+It also extracts the corresponding locations, font names, font sizes,
+writing direction (horizontal or vertical) for each text segment.  It
+does not recognize text in images. A password needs to be provided for
+restricted PDF documents.
+
+    > pdf2txt.py [-P password] [-o output] [-t text|html|xml|tag]
+                 [-O output_dir] [-c encoding] [-s scale] [-R rotation]
+                 [-Y normal|loose|exact] [-p pagenos] [-m maxpages]
+                 [-S] [-C] [-n] [-A] [-V]
+                 [-M char_margin] [-L line_margin] [-W word_margin]
+                 [-F boxes_flow] [-d]
+                 input.pdf ...
+
+  * `-P password` : PDF password.
+  * `-o output` : Output file name.
+  * `-t text|html|xml|tag` : Output type. (default: automatically inferred from the output file name.)
+  * `-O output_dir` : Output directory for extracted images.
+  * `-c encoding` : Output encoding. (default: utf-8)
+  * `-s scale` : Output scale.
+  * `-R rotation` : Rotates the page in degrees.
+  * `-Y normal|loose|exact` : Specifies the layout mode. (only for HTML output.)
+  * `-p pagenos` : Processes certain pages only.
+  * `-m maxpages` : Limits the maximum number of pages to process.
+  * `-S` : Strips control characters.
+  * `-C` : Disables resource caching.
+  * `-n` : Disables layout analysis.
+  * `-A` : Applies layout analysis for all texts including figures.
+  * `-V` : Automatically detects vertical writing.
+  * `-M char_margin` : Specifies the char margin.
+  * `-W word_margin` : Specifies the word margin.
+  * `-L line_margin` : Specifies the line margin.
+  * `-F boxes_flow` : Specifies the box flow ratio.
+  * `-d` : Turns on Debug output.
+
+### dumppdf.py
+
+dumppdf.py is used for debugging PDFs.
+It dumps all the internal contents in pseudo-XML format.
+
+    > dumppdf.py [-P password] [-a] [-p pageid] [-i objid]
+                 [-o output] [-r|-b|-t] [-T] [-O directory] [-d]
+                 input.pdf ...
+
+  * `-P password` : PDF password.
+  * `-a` : Extracts all objects.
+  * `-p pageid` : Extracts a Page object.
+  * `-i objid` : Extracts a certain object.
+  * `-o output` : Output file name.
+  * `-r` : Raw mode. Dumps the raw compressed/encoded streams.
+  * `-b` : Binary mode. Dumps the uncompressed/decoded streams.
+  * `-t` : Text mode. Dumps the streams in text format.
+  * `-T` : Tagged mode. Dumps the tagged contents.
+  * `-O output_dir` : Output directory for extracted streams.
+
+## TODO
+
+  * Replace STRICT variable with something better.
+  * Improve the debugging functions.
+  * Use logging module instead of sys.stderr.
+  * Proper test cases.
+  * PEP-8 and PEP-257 conformance.
+  * Better documentation.
+  * Crypto stream filter support.
+
+
+## Related Projects
+
+  * <a href="http://pybrary.net/pyPdf/">pyPdf</a>
+  * <a href="http://www.foolabs.com/xpdf/">xpdf</a>
+  * <a href="http://pdfbox.apache.org/">pdfbox</a>
+  * <a href="http://mupdf.com/">mupdf</a>
+
+%package -n python3-pdfminer
+Summary:	PDF parser and analyzer
+Provides:	python-pdfminer
+BuildRequires:	python3-devel
+BuildRequires:	python3-setuptools
+BuildRequires:	python3-pip
+%description -n python3-pdfminer
+# PDFMiner
+
+PDFMiner is a text extraction tool for PDF documents.
+
+[![Build Status](https://travis-ci.org/euske/pdfminer.svg?branch=master)](https://travis-ci.org/euske/pdfminer)
+[![PyPI](https://img.shields.io/pypi/v/pdfminer)](https://pypi.org/project/pdfminer/)
+
+**Warning**: Starting from version 20191010, PDFMiner supports **Python 3 only**.
+For Python 2 support, check out
+<a href="https://github.com/pdfminer/pdfminer.six">pdfminer.six</a>.
+
+## Features:
+
+  * Pure Python (3.6 or above).
+  * Supports PDF-1.7. (well, almost)
+  * Obtains the exact location of text as well as other layout information (fonts, etc.).
+  * Performs automatic layout analysis.
+  * Can convert PDF into other formats (HTML/XML).
+  * Can extract an outline (TOC).
+  * Can extract tagged contents.
+  * Supports basic encryption (RC4 and AES).
+  * Supports various font types (Type1, TrueType, Type3, and CID).
+  * Supports CJK languages and vertical writing scripts.
+  * Has an extensible PDF parser that can be used for other purposes.
+
+
+## How to Use:
+
+  1. `> pip install pdfminer`
+  1. `> pdf2txt.py samples/simple1.pdf`
+
+
+## Command Line Syntax:
+
+### pdf2txt.py
+
+pdf2txt.py extracts all the texts that are rendered programmatically.
+It also extracts the corresponding locations, font names, font sizes,
+writing direction (horizontal or vertical) for each text segment.  It
+does not recognize text in images. A password needs to be provided for
+restricted PDF documents.
+
+    > pdf2txt.py [-P password] [-o output] [-t text|html|xml|tag]
+                 [-O output_dir] [-c encoding] [-s scale] [-R rotation]
+                 [-Y normal|loose|exact] [-p pagenos] [-m maxpages]
+                 [-S] [-C] [-n] [-A] [-V]
+                 [-M char_margin] [-L line_margin] [-W word_margin]
+                 [-F boxes_flow] [-d]
+                 input.pdf ...
+
+  * `-P password` : PDF password.
+  * `-o output` : Output file name.
+  * `-t text|html|xml|tag` : Output type. (default: automatically inferred from the output file name.)
+  * `-O output_dir` : Output directory for extracted images.
+  * `-c encoding` : Output encoding. (default: utf-8)
+  * `-s scale` : Output scale.
+  * `-R rotation` : Rotates the page in degrees.
+  * `-Y normal|loose|exact` : Specifies the layout mode. (only for HTML output.)
+  * `-p pagenos` : Processes certain pages only.
+  * `-m maxpages` : Limits the maximum number of pages to process.
+  * `-S` : Strips control characters.
+  * `-C` : Disables resource caching.
+  * `-n` : Disables layout analysis.
+  * `-A` : Applies layout analysis for all texts including figures.
+  * `-V` : Automatically detects vertical writing.
+  * `-M char_margin` : Specifies the char margin.
+  * `-W word_margin` : Specifies the word margin.
+  * `-L line_margin` : Specifies the line margin.
+  * `-F boxes_flow` : Specifies the box flow ratio.
+  * `-d` : Turns on Debug output.
+
+### dumppdf.py
+
+dumppdf.py is used for debugging PDFs.
+It dumps all the internal contents in pseudo-XML format.
+
+    > dumppdf.py [-P password] [-a] [-p pageid] [-i objid]
+                 [-o output] [-r|-b|-t] [-T] [-O directory] [-d]
+                 input.pdf ...
+
+  * `-P password` : PDF password.
+  * `-a` : Extracts all objects.
+  * `-p pageid` : Extracts a Page object.
+  * `-i objid` : Extracts a certain object.
+  * `-o output` : Output file name.
+  * `-r` : Raw mode. Dumps the raw compressed/encoded streams.
+  * `-b` : Binary mode. Dumps the uncompressed/decoded streams.
+  * `-t` : Text mode. Dumps the streams in text format.
+  * `-T` : Tagged mode. Dumps the tagged contents.
+  * `-O output_dir` : Output directory for extracted streams.
+
+## TODO
+
+  * Replace STRICT variable with something better.
+  * Improve the debugging functions.
+  * Use logging module instead of sys.stderr.
+  * Proper test cases.
+  * PEP-8 and PEP-257 conformance.
+  * Better documentation.
+  * Crypto stream filter support.
+
+
+## Related Projects
+
+  * <a href="http://pybrary.net/pyPdf/">pyPdf</a>
+  * <a href="http://www.foolabs.com/xpdf/">xpdf</a>
+  * <a href="http://pdfbox.apache.org/">pdfbox</a>
+  * <a href="http://mupdf.com/">mupdf</a>
+
+%package help
+Summary:	Development documents and examples for pdfminer
+Provides:	python3-pdfminer-doc
+%description help
+# PDFMiner
+
+PDFMiner is a text extraction tool for PDF documents.
+
+[![Build Status](https://travis-ci.org/euske/pdfminer.svg?branch=master)](https://travis-ci.org/euske/pdfminer)
+[![PyPI](https://img.shields.io/pypi/v/pdfminer)](https://pypi.org/project/pdfminer/)
+
+**Warning**: Starting from version 20191010, PDFMiner supports **Python 3 only**.
+For Python 2 support, check out
+<a href="https://github.com/pdfminer/pdfminer.six">pdfminer.six</a>.
+
+## Features:
+
+  * Pure Python (3.6 or above).
+  * Supports PDF-1.7. (well, almost)
+  * Obtains the exact location of text as well as other layout information (fonts, etc.).
+  * Performs automatic layout analysis.
+  * Can convert PDF into other formats (HTML/XML).
+  * Can extract an outline (TOC).
+  * Can extract tagged contents.
+  * Supports basic encryption (RC4 and AES).
+  * Supports various font types (Type1, TrueType, Type3, and CID).
+  * Supports CJK languages and vertical writing scripts.
+  * Has an extensible PDF parser that can be used for other purposes.
+
+
+## How to Use:
+
+  1. `> pip install pdfminer`
+  1. `> pdf2txt.py samples/simple1.pdf`
+
+
+## Command Line Syntax:
+
+### pdf2txt.py
+
+pdf2txt.py extracts all the texts that are rendered programmatically.
+It also extracts the corresponding locations, font names, font sizes,
+writing direction (horizontal or vertical) for each text segment.  It
+does not recognize text in images. A password needs to be provided for
+restricted PDF documents.
+
+    > pdf2txt.py [-P password] [-o output] [-t text|html|xml|tag]
+                 [-O output_dir] [-c encoding] [-s scale] [-R rotation]
+                 [-Y normal|loose|exact] [-p pagenos] [-m maxpages]
+                 [-S] [-C] [-n] [-A] [-V]
+                 [-M char_margin] [-L line_margin] [-W word_margin]
+                 [-F boxes_flow] [-d]
+                 input.pdf ...
+
+  * `-P password` : PDF password.
+  * `-o output` : Output file name.
+  * `-t text|html|xml|tag` : Output type. (default: automatically inferred from the output file name.)
+  * `-O output_dir` : Output directory for extracted images.
+  * `-c encoding` : Output encoding. (default: utf-8)
+  * `-s scale` : Output scale.
+  * `-R rotation` : Rotates the page in degrees.
+  * `-Y normal|loose|exact` : Specifies the layout mode. (only for HTML output.)
+  * `-p pagenos` : Processes certain pages only.
+  * `-m maxpages` : Limits the maximum number of pages to process.
+  * `-S` : Strips control characters.
+  * `-C` : Disables resource caching.
+  * `-n` : Disables layout analysis.
+  * `-A` : Applies layout analysis for all texts including figures.
+  * `-V` : Automatically detects vertical writing.
+  * `-M char_margin` : Specifies the char margin.
+  * `-W word_margin` : Specifies the word margin.
+  * `-L line_margin` : Specifies the line margin.
+  * `-F boxes_flow` : Specifies the box flow ratio.
+  * `-d` : Turns on Debug output.
+
+### dumppdf.py
+
+dumppdf.py is used for debugging PDFs.
+It dumps all the internal contents in pseudo-XML format.
+
+    > dumppdf.py [-P password] [-a] [-p pageid] [-i objid]
+                 [-o output] [-r|-b|-t] [-T] [-O directory] [-d]
+                 input.pdf ...
+
+  * `-P password` : PDF password.
+  * `-a` : Extracts all objects.
+  * `-p pageid` : Extracts a Page object.
+  * `-i objid` : Extracts a certain object.
+  * `-o output` : Output file name.
+  * `-r` : Raw mode. Dumps the raw compressed/encoded streams.
+  * `-b` : Binary mode. Dumps the uncompressed/decoded streams.
+  * `-t` : Text mode. Dumps the streams in text format.
+  * `-T` : Tagged mode. Dumps the tagged contents.
+  * `-O output_dir` : Output directory for extracted streams.
+
+## TODO
+
+  * Replace STRICT variable with something better.
+  * Improve the debugging functions.
+  * Use logging module instead of sys.stderr.
+  * Proper test cases.
+  * PEP-8 and PEP-257 conformance.
+  * Better documentation.
+  * Crypto stream filter support.
+
+
+## Related Projects
+
+  * <a href="http://pybrary.net/pyPdf/">pyPdf</a>
+  * <a href="http://www.foolabs.com/xpdf/">xpdf</a>
+  * <a href="http://pdfbox.apache.org/">pdfbox</a>
+  * <a href="http://mupdf.com/">mupdf</a>
+
+%prep
+%autosetup -n pdfminer-20191125
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-pdfminer -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Thu Mar 09 2023 Python_Bot <Python_Bot@openeuler.org> - 20191125-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..c5dffcd
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+822eb51838a944154027b8ca42d439e3  pdfminer-20191125.tar.gz