automatic import of python-pdf2image

author: CoprDistGit <infra@openeuler.org> 2023-04-10 11:59:05 +0000
committer: CoprDistGit <infra@openeuler.org> 2023-04-10 11:59:05 +0000
commit: d13154842b2e864636c34d1f83b4868c029271c0 (patch)
tree: 87c97bfd2f5d156b4f7480d12172cb8207419794
parent: fb90ce4219010182d678d21faddc94026d5e1aa8 (diff)
3 files changed, 387 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..d303c71 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/pdf2image-1.16.3.tar.gz
diff --git a/python-pdf2image.spec b/python-pdf2image.spec
new file mode 100644
index 0000000..c4e188c
--- /dev/null
+++ b/python-pdf2image.spec
@@ -0,0 +1,385 @@
+%global _empty_manifest_terminate_build 0
+Name:		python-pdf2image
+Version:	1.16.3
+Release:	1
+Summary:	A wrapper around the pdftoppm and pdftocairo command line tools to convert PDF to a PIL Image list.
+License:	MIT
+URL:		https://github.com/Belval/pdf2image
+Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/b3/96/ad8331752aab6af890151efeee9d35a47c019502a51be5ff3b4791083695/pdf2image-1.16.3.tar.gz
+BuildArch:	noarch
+
+Requires:	python3-pillow
+
+%description
+# pdf2image
+[![CircleCI](https://circleci.com/gh/Belval/pdf2image/tree/master.svg?style=svg)](https://circleci.com/gh/Belval/pdf2image/tree/master) [![PyPI version](https://badge.fury.io/py/pdf2image.svg)](https://badge.fury.io/py/pdf2image) [![codecov](https://codecov.io/gh/Belval/pdf2image/branch/master/graph/badge.svg)](https://codecov.io/gh/Belval/pdf2image) [![Downloads](https://pepy.tech/badge/pdf2image/month)](https://pepy.tech/project/pdf2image) [![GitHub CI](https://github.com/Belval/pdf2image/actions/workflows/documentation.yml/badge.svg)](https://belval.github.io/pdf2image)
+
+A Python (3.7+) module that wraps pdftoppm and pdftocairo to convert PDF documents to PIL Image objects.
+
+## How to install
+
+`pip install pdf2image`
+
+### Windows
+
+Windows users will have to build or download Poppler for Windows. I recommend [@oschwartz10612's version](https://github.com/oschwartz10612/poppler-windows/releases/), which is the most up-to-date. You will then have to add its `bin/` folder to [PATH](https://www.architectryan.com/2018/03/17/add-to-the-path-on-windows-10/) or pass `poppler_path=r"C:\path\to\poppler-xx\bin"` as an argument to `convert_from_path`.
+
+### Mac
+
+Mac users will have to install [poppler](https://poppler.freedesktop.org/).
+
+Installing using [Brew](https://brew.sh/):
+
+```
+brew install poppler
+```
+
+### Linux
+
+Most distros ship with `pdftoppm` and `pdftocairo`. If they are not installed, use your package manager to install `poppler-utils`.
+
+### Platform-independent (using `conda`)
+
+1. Install poppler: `conda install -c conda-forge poppler`
+2. Install pdf2image: `pip install pdf2image`
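+
+Whichever route you choose, pdf2image needs the Poppler command-line tools on your `PATH` (or passed via `poppler_path`). A minimal sketch for checking this up front with the standard library's `shutil.which`. The helper `find_poppler_tools` is a hypothetical name and not part of pdf2image:
+
+```py
+import shutil
+
+
+def find_poppler_tools(tools=("pdftoppm", "pdftocairo", "pdfinfo")):
+    """Map each Poppler CLI tool name to its full path, or None if it is not on PATH."""
+    return {tool: shutil.which(tool) for tool in tools}
+
+
+if __name__ == "__main__":
+    missing = [tool for tool, path in find_poppler_tools().items() if path is None]
+    if missing:
+        print("Missing Poppler tools (install poppler-utils):", ", ".join(missing))
+    else:
+        print("All Poppler tools found on PATH")
+```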
+
+## How does it work?
+
+`from pdf2image import convert_from_path, convert_from_bytes`
+
+```py
+from pdf2image.exceptions import (
+    PDFInfoNotInstalledError,
+    PDFPageCountError,
+    PDFSyntaxError
+)
+```
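+
+pdf2image signals problems through exceptions like these. The `timeout` parameter likewise raises `PDFPopplerTimeoutError` when a conversion runs too long. The sketch below shows one way such a timeout can be enforced around a CLI call with the standard `subprocess` module. It is an illustration, not pdf2image's code: `run_with_timeout` and `PopplerTimeoutError` are hypothetical names, and a sleeping Python child process stands in for a slow `pdftoppm` run.
+
+```py
+import subprocess
+import sys
+
+
+class PopplerTimeoutError(Exception):
+    """Hypothetical stand-in for pdf2image's PDFPopplerTimeoutError."""
+
+
+def run_with_timeout(args, timeout):
+    """Run a CLI tool and return its stdout, killing it if it exceeds `timeout` seconds."""
+    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+    try:
+        out, _err = proc.communicate(timeout=timeout)
+    except subprocess.TimeoutExpired:
+        proc.kill()
+        proc.communicate()  # reap the killed process and drain its pipes
+        raise PopplerTimeoutError(f"Run time exceeded {timeout} seconds")
+    return out
+
+
+if __name__ == "__main__":
+    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
+    try:
+        run_with_timeout(slow, timeout=0.5)
+    except PopplerTimeoutError as exc:
+        print("caught:", exc)
+```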
+
+Then simply do:
+
+```py
+images = convert_from_path('/home/belval/example.pdf')
+```
+
+OR
+
+```py
+with open('/home/belval/example.pdf', 'rb') as f:
+    images = convert_from_bytes(f.read())
+```
+
+OR better yet
+
+```py
+import tempfile
+
+with tempfile.TemporaryDirectory() as path:
+    images = convert_from_path('/home/belval/example.pdf', output_folder=path)
+    # Work with the images here, while the temporary folder still exists
+```
+
+`images` will be a list of PIL `Image` objects, one per page of the PDF document.
+
+Here are the definitions:
+
+`convert_from_path(pdf_path, dpi=200, output_folder=None, first_page=None, last_page=None, fmt='ppm', jpegopt=None, thread_count=1, userpw=None, use_cropbox=False, strict=False, transparent=False, single_file=False, output_file=str(uuid.uuid4()), poppler_path=None, grayscale=False, size=None, paths_only=False, use_pdftocairo=False, timeout=600, hide_attributes=False)`
+
+`convert_from_bytes(pdf_file, dpi=200, output_folder=None, first_page=None, last_page=None, fmt='ppm', jpegopt=None, thread_count=1, userpw=None, use_cropbox=False, strict=False, transparent=False, single_file=False, output_file=str(uuid.uuid4()), poppler_path=None, grayscale=False, size=None, paths_only=False, use_pdftocairo=False, timeout=600, hide_attributes=False)`
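+
+Since pdf2image is a wrapper, most of these parameters end up as `pdftoppm` flags. The "What's new" list already names `-scale-to`, `-gray` and `-jpegopt`. A simplified sketch of that translation for a handful of parameters follows. `build_pdftoppm_args` and `_jpegopt_string` are hypothetical helpers. The other flags (`-r`, `-f`, `-l`, `-jpeg`, `-png`, `-scale-to-x`, `-scale-to-y`, `-upw`) are taken from the pdftoppm man page. pdf2image's real command builder also handles `pdftocairo`, threads and output naming.
+
+```py
+def _jpegopt_string(jpegopt):
+    """Turn {"quality": 80, "progressive": True} into "quality=80,progressive=y"."""
+    def as_flag_value(value):
+        if value is True:
+            return "y"
+        if value is False:
+            return "n"
+        return str(value)
+    return ",".join(f"{key}={as_flag_value(value)}" for key, value in jpegopt.items())
+
+
+def build_pdftoppm_args(pdf_path, output_prefix, dpi=200, first_page=None,
+                        last_page=None, fmt="ppm", jpegopt=None,
+                        grayscale=False, size=None, userpw=None):
+    """Build a pdftoppm argv from a few convert_from_path-style keyword arguments."""
+    args = ["pdftoppm", "-r", str(dpi)]
+    if first_page is not None:
+        args += ["-f", str(first_page)]
+    if last_page is not None:
+        args += ["-l", str(last_page)]
+    if fmt == "jpeg":
+        args.append("-jpeg")
+        if jpegopt:
+            args += ["-jpegopt", _jpegopt_string(jpegopt)]
+    elif fmt == "png":
+        args.append("-png")
+    # With no format flag, pdftoppm writes PPM (its default).
+    if grayscale:
+        args.append("-gray")
+    if isinstance(size, int):
+        args += ["-scale-to", str(size)]  # fit inside a size x size box
+    elif isinstance(size, tuple):
+        width, height = size
+        # -1 tells pdftoppm to derive that dimension from the aspect ratio.
+        args += ["-scale-to-x", str(-1 if width is None else width)]
+        args += ["-scale-to-y", str(-1 if height is None else height)]
+    if userpw is not None:
+        args += ["-upw", userpw]
+    args += [pdf_path, output_prefix]  # input PDF, then output file prefix
+    return args
+```
+
+For example, `build_pdftoppm_args("example.pdf", "page", fmt="jpeg", jpegopt={"quality": 80, "progressive": True}, size=(400, None))` yields `["pdftoppm", "-r", "200", "-jpeg", "-jpegopt", "quality=80,progressive=y", "-scale-to-x", "400", "-scale-to-y", "-1", "example.pdf", "page"]`, which could then be run with `subprocess.run`.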
+
+## What's new?
+
+- Allow users to hide attributes when using pdftoppm with `hide_attributes` (Thank you @StaticRocket)
+- Fix console opening on Windows (Thank you @OhMyAgnes!)
+- Add `timeout` parameter which raises `PDFPopplerTimeoutError` after the given number of seconds.
+- Add `use_pdftocairo` parameter, which forces `pdf2image` to use `pdftocairo` and should improve performance.
+- Fixed a bug where using `pdf2image` with multiple threads (but not multiple processes) would cause an exception
+- `jpegopt` parameter allows for tuning of the output JPEG when using `fmt="jpeg"` (`-jpegopt` in pdftoppm CLI) (Thank you @abieler)
+- Add `pdfinfo_from_path` and `pdfinfo_from_bytes`, which expose the output of the pdfinfo CLI
+- `paths_only` parameter will return image paths instead of Image objects, to avoid running out of memory when converting a big PDF
+- `size` parameter allows you to define the shape of the resulting images (`-scale-to` in pdftoppm CLI)
+    - `size=400` will fit the image to a 400x400 box, preserving aspect ratio
+    - `size=(400, None)` will make the image 400 pixels wide, preserving aspect ratio
+    - `size=(500, 500)` will resize the image to 500x500 pixels, not preserving aspect ratio
+- `grayscale` parameter allows you to convert images to grayscale (`-gray` in pdftoppm CLI)
+- `single_file` parameter allows you to convert the first PDF page only, without adding digits at the end of the `output_file`
+- Allow the user to specify poppler's installation path with `poppler_path`
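+
+The three `size` forms above come down to simple pixel arithmetic. A minimal sketch, assuming the page has already been rendered to `page_w` x `page_h` pixels (for example 1700 x 2200 for a US Letter page at the default 200 DPI). `fit_size` is a hypothetical helper, and pdftoppm's own rounding may differ by a pixel:
+
+```py
+def fit_size(page_w, page_h, size):
+    """Return the (width, height) in pixels produced by a pdf2image-style `size` value."""
+    if isinstance(size, int):
+        # size=N: fit inside an N x N box, preserving the aspect ratio.
+        scale = min(size / page_w, size / page_h)
+        return round(page_w * scale), round(page_h * scale)
+    width, height = size
+    if height is None:
+        # size=(W, None): fixed width, height follows the aspect ratio.
+        return width, round(page_h * width / page_w)
+    if width is None:
+        # size=(None, H): assumed symmetric case, width follows the aspect ratio.
+        return round(page_w * height / page_h), height
+    # size=(W, H): exact dimensions, aspect ratio not preserved.
+    return width, height
+
+
+if __name__ == "__main__":
+    for size in (400, (400, None), (500, 500)):
+        print(size, fit_size(1700, 2200, size))
+```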
+
+## Performance tips
+
+- Using an output folder is significantly faster if you are using an SSD. Otherwise, I/O usually becomes the bottleneck.
+- Using multiple threads can give you some gains, but avoid more than 4, as this will cause an I/O bottleneck (even on my NVMe SSD!).
+- If I/O is your bottleneck, using the JPEG format can lead to significant gains.
+- The PNG format is fairly slow because of its compression.
+- If you want to know the best settings (most settings will be fine anyway) you can clone the project and run `python tests.py` to get timings.
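+
+With `thread_count` above 1, the page range can be shared out so that each worker runs its own `pdftoppm` over a contiguous slice. A minimal sketch of a balanced split, assuming chunks should differ by at most one page. `split_pages` is a hypothetical helper, and pdf2image's internal scheduling may differ:
+
+```py
+def split_pages(first_page, last_page, thread_count):
+    """Split the inclusive range [first_page, last_page] into contiguous (first, last) chunks."""
+    page_count = last_page - first_page + 1
+    # Never start more workers than there are pages.
+    thread_count = max(1, min(thread_count, page_count))
+    base, extra = divmod(page_count, thread_count)
+    chunks = []
+    start = first_page
+    for i in range(thread_count):
+        # The first `extra` chunks take one additional page each.
+        length = base + (1 if i < extra else 0)
+        chunks.append((start, start + length - 1))
+        start += length
+    return chunks
+```
+
+For instance, `split_pages(1, 10, 4)` returns `[(1, 3), (4, 6), (7, 8), (9, 10)]`. Each pair would become the `-f`/`-l` arguments of one `pdftoppm` process.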
+
+## Limitations / known issues
+
+- A relatively big PDF can use up all your memory and cause the process to be killed (unless you use an output folder).
+- Reading PDFs signed with DocuSign sometimes fails; see [Solution for DocuSign issue](docs/installation.md).
+
+
+%package -n python3-pdf2image
+Summary:	A wrapper around the pdftoppm and pdftocairo command line tools to convert PDF to a PIL Image list.
+Provides:	python-pdf2image
+BuildRequires:	python3-devel
+BuildRequires:	python3-setuptools
+BuildRequires:	python3-pip
+%description -n python3-pdf2image
+# pdf2image
+[![CircleCI](https://circleci.com/gh/Belval/pdf2image/tree/master.svg?style=svg)](https://circleci.com/gh/Belval/pdf2image/tree/master) [![PyPI version](https://badge.fury.io/py/pdf2image.svg)](https://badge.fury.io/py/pdf2image) [![codecov](https://codecov.io/gh/Belval/pdf2image/branch/master/graph/badge.svg)](https://codecov.io/gh/Belval/pdf2image) [![Downloads](https://pepy.tech/badge/pdf2image/month)](https://pepy.tech/project/pdf2image) [![GitHub CI](https://github.com/Belval/pdf2image/actions/workflows/documentation.yml/badge.svg)](https://belval.github.io/pdf2image)
+
+A Python (3.7+) module that wraps pdftoppm and pdftocairo to convert PDF documents to PIL Image objects.
+
+## How to install
+
+`pip install pdf2image`
+
+### Windows
+
+Windows users will have to build or download Poppler for Windows. I recommend [@oschwartz10612's version](https://github.com/oschwartz10612/poppler-windows/releases/), which is the most up-to-date. You will then have to add its `bin/` folder to [PATH](https://www.architectryan.com/2018/03/17/add-to-the-path-on-windows-10/) or pass `poppler_path=r"C:\path\to\poppler-xx\bin"` as an argument to `convert_from_path`.
+
+### Mac
+
+Mac users will have to install [poppler](https://poppler.freedesktop.org/).
+
+Installing using [Brew](https://brew.sh/):
+
+```
+brew install poppler
+```
+
+### Linux
+
+Most distros ship with `pdftoppm` and `pdftocairo`. If they are not installed, use your package manager to install `poppler-utils`.
+
+### Platform-independent (using `conda`)
+
+1. Install poppler: `conda install -c conda-forge poppler`
+2. Install pdf2image: `pip install pdf2image`
+
+## How does it work?
+
+`from pdf2image import convert_from_path, convert_from_bytes`
+
+```py
+from pdf2image.exceptions import (
+    PDFInfoNotInstalledError,
+    PDFPageCountError,
+    PDFSyntaxError
+)
+```
+
+Then simply do:
+
+```py
+images = convert_from_path('/home/belval/example.pdf')
+```
+
+OR
+
+```py
+with open('/home/belval/example.pdf', 'rb') as f:
+    images = convert_from_bytes(f.read())
+```
+
+OR better yet
+
+```py
+import tempfile
+
+with tempfile.TemporaryDirectory() as path:
+    images = convert_from_path('/home/belval/example.pdf', output_folder=path)
+    # Work with the images here, while the temporary folder still exists
+```
+
+`images` will be a list of PIL `Image` objects, one per page of the PDF document.
+
+Here are the definitions:
+
+`convert_from_path(pdf_path, dpi=200, output_folder=None, first_page=None, last_page=None, fmt='ppm', jpegopt=None, thread_count=1, userpw=None, use_cropbox=False, strict=False, transparent=False, single_file=False, output_file=str(uuid.uuid4()), poppler_path=None, grayscale=False, size=None, paths_only=False, use_pdftocairo=False, timeout=600, hide_attributes=False)`
+
+`convert_from_bytes(pdf_file, dpi=200, output_folder=None, first_page=None, last_page=None, fmt='ppm', jpegopt=None, thread_count=1, userpw=None, use_cropbox=False, strict=False, transparent=False, single_file=False, output_file=str(uuid.uuid4()), poppler_path=None, grayscale=False, size=None, paths_only=False, use_pdftocairo=False, timeout=600, hide_attributes=False)`
+
+## What's new?
+
+- Allow users to hide attributes when using pdftoppm with `hide_attributes` (Thank you @StaticRocket)
+- Fix console opening on Windows (Thank you @OhMyAgnes!)
+- Add `timeout` parameter which raises `PDFPopplerTimeoutError` after the given number of seconds.
+- Add `use_pdftocairo` parameter, which forces `pdf2image` to use `pdftocairo` and should improve performance.
+- Fixed a bug where using `pdf2image` with multiple threads (but not multiple processes) would cause an exception
+- `jpegopt` parameter allows for tuning of the output JPEG when using `fmt="jpeg"` (`-jpegopt` in pdftoppm CLI) (Thank you @abieler)
+- Add `pdfinfo_from_path` and `pdfinfo_from_bytes`, which expose the output of the pdfinfo CLI
+- `paths_only` parameter will return image paths instead of Image objects, to avoid running out of memory when converting a big PDF
+- `size` parameter allows you to define the shape of the resulting images (`-scale-to` in pdftoppm CLI)
+    - `size=400` will fit the image to a 400x400 box, preserving aspect ratio
+    - `size=(400, None)` will make the image 400 pixels wide, preserving aspect ratio
+    - `size=(500, 500)` will resize the image to 500x500 pixels, not preserving aspect ratio
+- `grayscale` parameter allows you to convert images to grayscale (`-gray` in pdftoppm CLI)
+- `single_file` parameter allows you to convert the first PDF page only, without adding digits at the end of the `output_file`
+- Allow the user to specify poppler's installation path with `poppler_path`
+
+## Performance tips
+
+- Using an output folder is significantly faster if you are using an SSD. Otherwise, I/O usually becomes the bottleneck.
+- Using multiple threads can give you some gains, but avoid more than 4, as this will cause an I/O bottleneck (even on my NVMe SSD!).
+- If I/O is your bottleneck, using the JPEG format can lead to significant gains.
+- The PNG format is fairly slow because of its compression.
+- If you want to know the best settings (most settings will be fine anyway) you can clone the project and run `python tests.py` to get timings.
+
+## Limitations / known issues
+
+- A relatively big PDF can use up all your memory and cause the process to be killed (unless you use an output folder).
+- Reading PDFs signed with DocuSign sometimes fails; see [Solution for DocuSign issue](docs/installation.md).
+
+
+%package help
+Summary:	Development documents and examples for pdf2image
+Provides:	python3-pdf2image-doc
+%description help
+# pdf2image
+[![CircleCI](https://circleci.com/gh/Belval/pdf2image/tree/master.svg?style=svg)](https://circleci.com/gh/Belval/pdf2image/tree/master) [![PyPI version](https://badge.fury.io/py/pdf2image.svg)](https://badge.fury.io/py/pdf2image) [![codecov](https://codecov.io/gh/Belval/pdf2image/branch/master/graph/badge.svg)](https://codecov.io/gh/Belval/pdf2image) [![Downloads](https://pepy.tech/badge/pdf2image/month)](https://pepy.tech/project/pdf2image) [![GitHub CI](https://github.com/Belval/pdf2image/actions/workflows/documentation.yml/badge.svg)](https://belval.github.io/pdf2image)
+
+A Python (3.7+) module that wraps pdftoppm and pdftocairo to convert PDF documents to PIL Image objects.
+
+## How to install
+
+`pip install pdf2image`
+
+### Windows
+
+Windows users will have to build or download Poppler for Windows. I recommend [@oschwartz10612's version](https://github.com/oschwartz10612/poppler-windows/releases/), which is the most up-to-date. You will then have to add its `bin/` folder to [PATH](https://www.architectryan.com/2018/03/17/add-to-the-path-on-windows-10/) or pass `poppler_path=r"C:\path\to\poppler-xx\bin"` as an argument to `convert_from_path`.
+
+### Mac
+
+Mac users will have to install [poppler](https://poppler.freedesktop.org/).
+
+Installing using [Brew](https://brew.sh/):
+
+```
+brew install poppler
+```
+
+### Linux
+
+Most distros ship with `pdftoppm` and `pdftocairo`. If they are not installed, use your package manager to install `poppler-utils`.
+
+### Platform-independent (using `conda`)
+
+1. Install poppler: `conda install -c conda-forge poppler`
+2. Install pdf2image: `pip install pdf2image`
+
+## How does it work?
+
+`from pdf2image import convert_from_path, convert_from_bytes`
+
+```py
+from pdf2image.exceptions import (
+    PDFInfoNotInstalledError,
+    PDFPageCountError,
+    PDFSyntaxError
+)
+```
+
+Then simply do:
+
+```py
+images = convert_from_path('/home/belval/example.pdf')
+```
+
+OR
+
+```py
+with open('/home/belval/example.pdf', 'rb') as f:
+    images = convert_from_bytes(f.read())
+```
+
+OR better yet
+
+```py
+import tempfile
+
+with tempfile.TemporaryDirectory() as path:
+    images = convert_from_path('/home/belval/example.pdf', output_folder=path)
+    # Work with the images here, while the temporary folder still exists
+```
+
+`images` will be a list of PIL `Image` objects, one per page of the PDF document.
+
+Here are the definitions:
+
+`convert_from_path(pdf_path, dpi=200, output_folder=None, first_page=None, last_page=None, fmt='ppm', jpegopt=None, thread_count=1, userpw=None, use_cropbox=False, strict=False, transparent=False, single_file=False, output_file=str(uuid.uuid4()), poppler_path=None, grayscale=False, size=None, paths_only=False, use_pdftocairo=False, timeout=600, hide_attributes=False)`
+
+`convert_from_bytes(pdf_file, dpi=200, output_folder=None, first_page=None, last_page=None, fmt='ppm', jpegopt=None, thread_count=1, userpw=None, use_cropbox=False, strict=False, transparent=False, single_file=False, output_file=str(uuid.uuid4()), poppler_path=None, grayscale=False, size=None, paths_only=False, use_pdftocairo=False, timeout=600, hide_attributes=False)`
+
+## What's new?
+
+- Allow users to hide attributes when using pdftoppm with `hide_attributes` (Thank you @StaticRocket)
+- Fix console opening on Windows (Thank you @OhMyAgnes!)
+- Add `timeout` parameter which raises `PDFPopplerTimeoutError` after the given number of seconds.
+- Add `use_pdftocairo` parameter, which forces `pdf2image` to use `pdftocairo` and should improve performance.
+- Fixed a bug where using `pdf2image` with multiple threads (but not multiple processes) would cause an exception
+- `jpegopt` parameter allows for tuning of the output JPEG when using `fmt="jpeg"` (`-jpegopt` in pdftoppm CLI) (Thank you @abieler)
+- Add `pdfinfo_from_path` and `pdfinfo_from_bytes`, which expose the output of the pdfinfo CLI
+- `paths_only` parameter will return image paths instead of Image objects, to avoid running out of memory when converting a big PDF
+- `size` parameter allows you to define the shape of the resulting images (`-scale-to` in pdftoppm CLI)
+    - `size=400` will fit the image to a 400x400 box, preserving aspect ratio
+    - `size=(400, None)` will make the image 400 pixels wide, preserving aspect ratio
+    - `size=(500, 500)` will resize the image to 500x500 pixels, not preserving aspect ratio
+- `grayscale` parameter allows you to convert images to grayscale (`-gray` in pdftoppm CLI)
+- `single_file` parameter allows you to convert the first PDF page only, without adding digits at the end of the `output_file`
+- Allow the user to specify poppler's installation path with `poppler_path`
+
+## Performance tips
+
+- Using an output folder is significantly faster if you are using an SSD. Otherwise, I/O usually becomes the bottleneck.
+- Using multiple threads can give you some gains, but avoid more than 4, as this will cause an I/O bottleneck (even on my NVMe SSD!).
+- If I/O is your bottleneck, using the JPEG format can lead to significant gains.
+- The PNG format is fairly slow because of its compression.
+- If you want to know the best settings (most settings will be fine anyway) you can clone the project and run `python tests.py` to get timings.
+
+## Limitations / known issues
+
+- A relatively big PDF can use up all your memory and cause the process to be killed (unless you use an output folder).
+- Reading PDFs signed with DocuSign sometimes fails; see [Solution for DocuSign issue](docs/installation.md).
+
+
+%prep
+%autosetup -n pdf2image-1.16.3
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-pdf2image -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Mon Apr 10 2023 Python_Bot <Python_Bot@openeuler.org> - 1.16.3-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..92af34c
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+2265fe570c2c01f4a5f5a7d16408eec7  pdf2image-1.16.3.tar.gz