author CoprDistGit <infra@openeuler.org> 2023-03-09 14:56:19 +0000
committer CoprDistGit <infra@openeuler.org> 2023-03-09 14:56:19 +0000
commit 72e2236f1b973972942c30ffd7b2848322ae97e4
tree 6b0d9b020f1e772855c4a502616e68898123f9af
parent 399066ff07f558388dc59169329211eef5c830ec
automatic import of python-pdfminer
-rw-r--r-- .gitignore 1
-rw-r--r-- python-pdfminer.spec 393
-rw-r--r-- sources 1
3 files changed, 395 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..4267a5c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/pdfminer-20191125.tar.gz
diff --git a/python-pdfminer.spec b/python-pdfminer.spec
new file mode 100644
index 0000000..69b1887
--- /dev/null
+++ b/python-pdfminer.spec
@@ -0,0 +1,393 @@
+%global _empty_manifest_terminate_build 0
+Name: python-pdfminer
+Version: 20191125
+Release: 1
+Summary: PDF parser and analyzer
+License: MIT
+URL: http://github.com/euske/pdfminer
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/71/a3/155c5cde5f9c0b1069043b2946a93f54a41fd72cc19c6c100f6f2f5bdc15/pdfminer-20191125.tar.gz
+BuildArch: noarch
+
+
+%description
+# PDFMiner
+
+PDFMiner is a text extraction tool for PDF documents.
+
+[![Build Status](https://travis-ci.org/euske/pdfminer.svg?branch=master)](https://travis-ci.org/euske/pdfminer)
+[![PyPI](https://img.shields.io/pypi/v/pdfminer)](https://pypi.org/project/pdfminer/)
+
+**Warning**: Starting from version 20191010, PDFMiner supports **Python 3 only**.
+For Python 2 support, check out
+<a href="https://github.com/pdfminer/pdfminer.six">pdfminer.six</a>.
+
+## Features:
+
+ * Pure Python (3.6 or above).
+ * Supports PDF-1.7. (well, almost)
+ * Obtains the exact location of text as well as other layout information (fonts, etc.).
+ * Performs automatic layout analysis.
+ * Can convert PDF into other formats (HTML/XML).
+ * Can extract an outline (TOC).
+ * Can extract tagged contents.
+ * Supports basic encryption (RC4 and AES).
+ * Supports various font types (Type1, TrueType, Type3, and CID).
+ * Supports CJK languages and vertical writing scripts.
+ * Has an extensible PDF parser that can be used for other purposes.
+
+
+## How to Use:
+
+ 1. `> pip install pdfminer`
+ 1. `> pdf2txt.py samples/simple1.pdf`
+
+
+## Command Line Syntax:
+
+### pdf2txt.py
+
+pdf2txt.py extracts all text that is rendered programmatically.
+It also extracts the corresponding location, font name, font size, and
+writing direction (horizontal or vertical) of each text segment. It
+does not recognize text in images. A password must be provided for
+restricted PDF documents.
+
+ > pdf2txt.py [-P password] [-o output] [-t text|html|xml|tag]
+ [-O output_dir] [-c encoding] [-s scale] [-R rotation]
+ [-Y normal|loose|exact] [-p pagenos] [-m maxpages]
+ [-S] [-C] [-n] [-A] [-V]
+ [-M char_margin] [-L line_margin] [-W word_margin]
+ [-F boxes_flow] [-d]
+ input.pdf ...
+
+ * `-P password` : PDF password.
+ * `-o output` : Output file name.
+ * `-t text|html|xml|tag` : Output type. (default: automatically inferred from the output file name.)
+ * `-O output_dir` : Output directory for extracted images.
+ * `-c encoding` : Output encoding. (default: utf-8)
+ * `-s scale` : Output scale.
+ * `-R rotation` : Rotates the page by the given angle in degrees.
+ * `-Y normal|loose|exact` : Specifies the layout mode. (only for HTML output.)
+ * `-p pagenos` : Processes certain pages only.
+ * `-m maxpages` : Limits the maximum number of pages to process.
+ * `-S` : Strips control characters.
+ * `-C` : Disables resource caching.
+ * `-n` : Disables layout analysis.
+ * `-A` : Applies layout analysis for all texts including figures.
+ * `-V` : Automatically detects vertical writing.
+ * `-M char_margin` : Specifies the char margin.
+ * `-W word_margin` : Specifies the word margin.
+ * `-L line_margin` : Specifies the line margin.
+ * `-F boxes_flow` : Specifies the box flow ratio.
+ * `-d` : Turns on debug output.
+
+### dumppdf.py
+
+dumppdf.py is used for debugging PDFs.
+It dumps all the internal contents in pseudo-XML format.
+
+ > dumppdf.py [-P password] [-a] [-p pageid] [-i objid]
+ [-o output] [-r|-b|-t] [-T] [-O directory] [-d]
+ input.pdf ...
+
+ * `-P password` : PDF password.
+ * `-a` : Extracts all objects.
+ * `-p pageid` : Extracts a Page object.
+ * `-i objid` : Extracts a certain object.
+ * `-o output` : Output file name.
+ * `-r` : Raw mode. Dumps the raw compressed/encoded streams.
+ * `-b` : Binary mode. Dumps the uncompressed/decoded streams.
+ * `-t` : Text mode. Dumps the streams in text format.
+ * `-T` : Tagged mode. Dumps the tagged contents.
+ * `-O output_dir` : Output directory for extracted streams.
+
+## TODO
+
+ * Replace STRICT variable with something better.
+ * Improve the debugging functions.
+ * Use logging module instead of sys.stderr.
+ * Proper test cases.
+ * PEP-8 and PEP-257 conformance.
+ * Better documentation.
+ * Crypto stream filter support.
+
+
+## Related Projects
+
+ * <a href="http://pybrary.net/pyPdf/">pyPdf</a>
+ * <a href="http://www.foolabs.com/xpdf/">xpdf</a>
+ * <a href="http://pdfbox.apache.org/">pdfbox</a>
+ * <a href="http://mupdf.com/">mupdf</a>
+
+%package -n python3-pdfminer
+Summary: PDF parser and analyzer
+Provides: python-pdfminer
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-pdfminer
+# PDFMiner
+
+PDFMiner is a text extraction tool for PDF documents.
+
+[![Build Status](https://travis-ci.org/euske/pdfminer.svg?branch=master)](https://travis-ci.org/euske/pdfminer)
+[![PyPI](https://img.shields.io/pypi/v/pdfminer)](https://pypi.org/project/pdfminer/)
+
+**Warning**: Starting from version 20191010, PDFMiner supports **Python 3 only**.
+For Python 2 support, check out
+<a href="https://github.com/pdfminer/pdfminer.six">pdfminer.six</a>.
+
+## Features:
+
+ * Pure Python (3.6 or above).
+ * Supports PDF-1.7. (well, almost)
+ * Obtains the exact location of text as well as other layout information (fonts, etc.).
+ * Performs automatic layout analysis.
+ * Can convert PDF into other formats (HTML/XML).
+ * Can extract an outline (TOC).
+ * Can extract tagged contents.
+ * Supports basic encryption (RC4 and AES).
+ * Supports various font types (Type1, TrueType, Type3, and CID).
+ * Supports CJK languages and vertical writing scripts.
+ * Has an extensible PDF parser that can be used for other purposes.
+
+
+## How to Use:
+
+ 1. `> pip install pdfminer`
+ 1. `> pdf2txt.py samples/simple1.pdf`
+
+
+## Command Line Syntax:
+
+### pdf2txt.py
+
+pdf2txt.py extracts all text that is rendered programmatically.
+It also extracts the corresponding location, font name, font size, and
+writing direction (horizontal or vertical) of each text segment. It
+does not recognize text in images. A password must be provided for
+restricted PDF documents.
+
+ > pdf2txt.py [-P password] [-o output] [-t text|html|xml|tag]
+ [-O output_dir] [-c encoding] [-s scale] [-R rotation]
+ [-Y normal|loose|exact] [-p pagenos] [-m maxpages]
+ [-S] [-C] [-n] [-A] [-V]
+ [-M char_margin] [-L line_margin] [-W word_margin]
+ [-F boxes_flow] [-d]
+ input.pdf ...
+
+ * `-P password` : PDF password.
+ * `-o output` : Output file name.
+ * `-t text|html|xml|tag` : Output type. (default: automatically inferred from the output file name.)
+ * `-O output_dir` : Output directory for extracted images.
+ * `-c encoding` : Output encoding. (default: utf-8)
+ * `-s scale` : Output scale.
+ * `-R rotation` : Rotates the page by the given angle in degrees.
+ * `-Y normal|loose|exact` : Specifies the layout mode. (only for HTML output.)
+ * `-p pagenos` : Processes certain pages only.
+ * `-m maxpages` : Limits the maximum number of pages to process.
+ * `-S` : Strips control characters.
+ * `-C` : Disables resource caching.
+ * `-n` : Disables layout analysis.
+ * `-A` : Applies layout analysis for all texts including figures.
+ * `-V` : Automatically detects vertical writing.
+ * `-M char_margin` : Specifies the char margin.
+ * `-W word_margin` : Specifies the word margin.
+ * `-L line_margin` : Specifies the line margin.
+ * `-F boxes_flow` : Specifies the box flow ratio.
+ * `-d` : Turns on debug output.
+
+### dumppdf.py
+
+dumppdf.py is used for debugging PDFs.
+It dumps all the internal contents in pseudo-XML format.
+
+ > dumppdf.py [-P password] [-a] [-p pageid] [-i objid]
+ [-o output] [-r|-b|-t] [-T] [-O directory] [-d]
+ input.pdf ...
+
+ * `-P password` : PDF password.
+ * `-a` : Extracts all objects.
+ * `-p pageid` : Extracts a Page object.
+ * `-i objid` : Extracts a certain object.
+ * `-o output` : Output file name.
+ * `-r` : Raw mode. Dumps the raw compressed/encoded streams.
+ * `-b` : Binary mode. Dumps the uncompressed/decoded streams.
+ * `-t` : Text mode. Dumps the streams in text format.
+ * `-T` : Tagged mode. Dumps the tagged contents.
+ * `-O output_dir` : Output directory for extracted streams.
+
+## TODO
+
+ * Replace STRICT variable with something better.
+ * Improve the debugging functions.
+ * Use logging module instead of sys.stderr.
+ * Proper test cases.
+ * PEP-8 and PEP-257 conformance.
+ * Better documentation.
+ * Crypto stream filter support.
+
+
+## Related Projects
+
+ * <a href="http://pybrary.net/pyPdf/">pyPdf</a>
+ * <a href="http://www.foolabs.com/xpdf/">xpdf</a>
+ * <a href="http://pdfbox.apache.org/">pdfbox</a>
+ * <a href="http://mupdf.com/">mupdf</a>
+
+%package help
+Summary: Development documents and examples for pdfminer
+Provides: python3-pdfminer-doc
+%description help
+# PDFMiner
+
+PDFMiner is a text extraction tool for PDF documents.
+
+[![Build Status](https://travis-ci.org/euske/pdfminer.svg?branch=master)](https://travis-ci.org/euske/pdfminer)
+[![PyPI](https://img.shields.io/pypi/v/pdfminer)](https://pypi.org/project/pdfminer/)
+
+**Warning**: Starting from version 20191010, PDFMiner supports **Python 3 only**.
+For Python 2 support, check out
+<a href="https://github.com/pdfminer/pdfminer.six">pdfminer.six</a>.
+
+## Features:
+
+ * Pure Python (3.6 or above).
+ * Supports PDF-1.7. (well, almost)
+ * Obtains the exact location of text as well as other layout information (fonts, etc.).
+ * Performs automatic layout analysis.
+ * Can convert PDF into other formats (HTML/XML).
+ * Can extract an outline (TOC).
+ * Can extract tagged contents.
+ * Supports basic encryption (RC4 and AES).
+ * Supports various font types (Type1, TrueType, Type3, and CID).
+ * Supports CJK languages and vertical writing scripts.
+ * Has an extensible PDF parser that can be used for other purposes.
+
+
+## How to Use:
+
+ 1. `> pip install pdfminer`
+ 1. `> pdf2txt.py samples/simple1.pdf`
+
+
+## Command Line Syntax:
+
+### pdf2txt.py
+
+pdf2txt.py extracts all text that is rendered programmatically.
+It also extracts the corresponding location, font name, font size, and
+writing direction (horizontal or vertical) of each text segment. It
+does not recognize text in images. A password must be provided for
+restricted PDF documents.
+
+ > pdf2txt.py [-P password] [-o output] [-t text|html|xml|tag]
+ [-O output_dir] [-c encoding] [-s scale] [-R rotation]
+ [-Y normal|loose|exact] [-p pagenos] [-m maxpages]
+ [-S] [-C] [-n] [-A] [-V]
+ [-M char_margin] [-L line_margin] [-W word_margin]
+ [-F boxes_flow] [-d]
+ input.pdf ...
+
+ * `-P password` : PDF password.
+ * `-o output` : Output file name.
+ * `-t text|html|xml|tag` : Output type. (default: automatically inferred from the output file name.)
+ * `-O output_dir` : Output directory for extracted images.
+ * `-c encoding` : Output encoding. (default: utf-8)
+ * `-s scale` : Output scale.
+ * `-R rotation` : Rotates the page by the given angle in degrees.
+ * `-Y normal|loose|exact` : Specifies the layout mode. (only for HTML output.)
+ * `-p pagenos` : Processes certain pages only.
+ * `-m maxpages` : Limits the maximum number of pages to process.
+ * `-S` : Strips control characters.
+ * `-C` : Disables resource caching.
+ * `-n` : Disables layout analysis.
+ * `-A` : Applies layout analysis for all texts including figures.
+ * `-V` : Automatically detects vertical writing.
+ * `-M char_margin` : Specifies the char margin.
+ * `-W word_margin` : Specifies the word margin.
+ * `-L line_margin` : Specifies the line margin.
+ * `-F boxes_flow` : Specifies the box flow ratio.
+ * `-d` : Turns on debug output.
+
+### dumppdf.py
+
+dumppdf.py is used for debugging PDFs.
+It dumps all the internal contents in pseudo-XML format.
+
+ > dumppdf.py [-P password] [-a] [-p pageid] [-i objid]
+ [-o output] [-r|-b|-t] [-T] [-O directory] [-d]
+ input.pdf ...
+
+ * `-P password` : PDF password.
+ * `-a` : Extracts all objects.
+ * `-p pageid` : Extracts a Page object.
+ * `-i objid` : Extracts a certain object.
+ * `-o output` : Output file name.
+ * `-r` : Raw mode. Dumps the raw compressed/encoded streams.
+ * `-b` : Binary mode. Dumps the uncompressed/decoded streams.
+ * `-t` : Text mode. Dumps the streams in text format.
+ * `-T` : Tagged mode. Dumps the tagged contents.
+ * `-O output_dir` : Output directory for extracted streams.
+
+## TODO
+
+ * Replace STRICT variable with something better.
+ * Improve the debugging functions.
+ * Use logging module instead of sys.stderr.
+ * Proper test cases.
+ * PEP-8 and PEP-257 conformance.
+ * Better documentation.
+ * Crypto stream filter support.
+
+
+## Related Projects
+
+ * <a href="http://pybrary.net/pyPdf/">pyPdf</a>
+ * <a href="http://www.foolabs.com/xpdf/">xpdf</a>
+ * <a href="http://pdfbox.apache.org/">pdfbox</a>
+ * <a href="http://mupdf.com/">mupdf</a>
+
+%prep
+%autosetup -n pdfminer-20191125
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-pdfminer -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Thu Mar 09 2023 Python_Bot <Python_Bot@openeuler.org> - 20191125-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..c5dffcd
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+822eb51838a944154027b8ca42d439e3 pdfminer-20191125.tar.gz
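
Note on the `sources` entry above: the line pairs a hex digest with the tarball name. The digest is 32 hex characters long, which suggests MD5, though the commit does not say so explicitly. A minimal Python sketch of checking a downloaded tarball against that entry follows, under that MD5 assumption. The helper names `parse_sources_line` and `md5_matches` are hypothetical and are not part of any openEuler tooling.

```python
import hashlib
from pathlib import Path
from typing import Tuple


def parse_sources_line(line: str) -> Tuple[str, str]:
    """Split one 'digest filename' line into (digest, filename)."""
    digest, filename = line.split(maxsplit=1)
    return digest.lower(), filename.strip()


def md5_matches(path: Path, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's MD5 digest equals expected_hex.

    Assumption: the digest in `sources` is MD5 (inferred from its length).
    The file is read in chunks so large tarballs are not loaded whole.
    """
    h = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == expected_hex.lower()


if __name__ == "__main__":
    # The entry is copied from the `sources` file in this commit.
    entry = "822eb51838a944154027b8ca42d439e3 pdfminer-20191125.tar.gz"
    digest, filename = parse_sources_line(entry)
    tarball = Path(filename)
    if tarball.exists():
        print(filename, "OK" if md5_matches(tarball, digest) else "MISMATCH")
    else:
        print(filename, "not found; download Source0 first")
```

Run it from the directory that holds the tarball fetched from `Source0`. A mismatch would most likely mean a corrupted or different upstream archive.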