automatic import of python-pdf2docx (openeuler20.03)

author: CoprDistGit <infra@openeuler.org> 2023-05-05 07:49:58 +0000
committer: CoprDistGit <infra@openeuler.org> 2023-05-05 07:49:58 +0000
commit: 5dc8a42d1d672ea7f02ad6488815724b10fc1c77
tree: 113a7f5613f262f2833235303165962bc2054685
parent: f318233c0429a8539eaf0ff207ffc02720cd9e4b
3 files changed, 290 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..0a679b6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/pdf2docx-0.5.6.tar.gz
diff --git a/python-pdf2docx.spec b/python-pdf2docx.spec
new file mode 100644
index 0000000..9f70f48
--- /dev/null
+++ b/python-pdf2docx.spec
@@ -0,0 +1,288 @@
+%global _empty_manifest_terminate_build 0
+Name:		python-pdf2docx
+Version:	0.5.6
+Release:	1
+Summary:	Open source Python library converting pdf to docx.
+License:	GPL v3
+URL:		https://github.com/dothinking/pdf2docx
+Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/05/8f/e50f748e72b812b6ff1af534f2c618c79ad58f0e50d09369665be36086a7/pdf2docx-0.5.6.tar.gz
+BuildArch:	noarch
+
+Requires:	python3-PyMuPDF
+Requires:	python3-docx
+Requires:	python3-fonttools
+Requires:	python3-numpy
+Requires:	python3-opencv-python
+Requires:	python3-fire
+
+%description
+English | [中文](README_CN.md)
+
+# pdf2docx 
+
+![python-version](https://img.shields.io/badge/python->=3.6-green.svg)
+[![codecov](https://codecov.io/gh/dothinking/pdf2docx/branch/master/graph/badge.svg)](https://codecov.io/gh/dothinking/pdf2docx)
+[![pypi-version](https://img.shields.io/pypi/v/pdf2docx.svg)](https://pypi.python.org/pypi/pdf2docx/)
+![license](https://img.shields.io/pypi/l/pdf2docx.svg)
+![pypi-downloads](https://img.shields.io/pypi/dm/pdf2docx)
+
+- Extract data from PDF with `PyMuPDF`, e.g. text, images and drawings 
+- Parse layout with rule, e.g. sections, paragraphs, images and tables
+- Generate docx with `python-docx`
+
+## Features
+
+- Parse and re-create page layout
+    - page margin
+    - section and column (1 or 2 columns only)
+    - page header and footer [TODO]
+
+- Parse and re-create paragraph
+    - OCR text [TODO]
+    - text in horizontal/vertical direction: from left to right, from bottom to top
+    - font style, e.g. font name, size, weight, italic and color
+    - text format, e.g. highlight, underline, strike-through
+    - list style [TODO]
+    - external hyper link
+    - paragraph horizontal alignment (left/right/center/justify) and vertical spacing
+    
+- Parse and re-create image
+	- in-line image
+    - image in Gray/RGB/CMYK mode
+    - transparent image
+    - floating image, i.e. picture behind text
+
+- Parse and re-create table
+    - border style, e.g. width, color
+    - shading style, i.e. background color
+    - merged cells
+    - vertical direction cell
+    - table with partly hidden borders
+    - nested tables
+
+- Parsing pages with multi-processing
+
+*It can also be used as a tool to extract table contents since both table content and format/style is parsed.*
+
+## Limitations
+
+- Text-based PDF file
+- Left to right language
+- Normal reading direction, no word transformation / rotation
+- Rule-based method can't 100% convert the PDF layout
+
+
+## Documentation
+
+- [Installation](https://dothinking.github.io/pdf2docx/installation.html)
+- [Quickstart](https://dothinking.github.io/pdf2docx/quickstart.html)
+    - [Convert PDF](https://dothinking.github.io/pdf2docx/quickstart.convert.html)
+    - [Extract table](https://dothinking.github.io/pdf2docx/quickstart.table.html)
+    - [Command Line Interface](https://dothinking.github.io/pdf2docx/quickstart.cli.html)
+    - [Graphic User Interface](https://dothinking.github.io/pdf2docx/quickstart.gui.html)
+- [Technical Documentation (In Chinese)](https://dothinking.github.io/pdf2docx/techdoc.html)
+- [API Documentation](https://dothinking.github.io/pdf2docx/modules.html)
+
+## Sample
+
+![sample_compare.png](https://s1.ax1x.com/2020/08/04/aDryx1.png)
+
+
+%package -n python3-pdf2docx
+Summary:	Open source Python library converting pdf to docx.
+Provides:	python-pdf2docx
+BuildRequires:	python3-devel
+BuildRequires:	python3-setuptools
+BuildRequires:	python3-pip
+%description -n python3-pdf2docx
+English | [中文](README_CN.md)
+
+# pdf2docx 
+
+![python-version](https://img.shields.io/badge/python->=3.6-green.svg)
+[![codecov](https://codecov.io/gh/dothinking/pdf2docx/branch/master/graph/badge.svg)](https://codecov.io/gh/dothinking/pdf2docx)
+[![pypi-version](https://img.shields.io/pypi/v/pdf2docx.svg)](https://pypi.python.org/pypi/pdf2docx/)
+![license](https://img.shields.io/pypi/l/pdf2docx.svg)
+![pypi-downloads](https://img.shields.io/pypi/dm/pdf2docx)
+
+- Extract data from PDF with `PyMuPDF`, e.g. text, images and drawings 
+- Parse layout with rule, e.g. sections, paragraphs, images and tables
+- Generate docx with `python-docx`
+
+## Features
+
+- Parse and re-create page layout
+    - page margin
+    - section and column (1 or 2 columns only)
+    - page header and footer [TODO]
+
+- Parse and re-create paragraph
+    - OCR text [TODO]
+    - text in horizontal/vertical direction: from left to right, from bottom to top
+    - font style, e.g. font name, size, weight, italic and color
+    - text format, e.g. highlight, underline, strike-through
+    - list style [TODO]
+    - external hyper link
+    - paragraph horizontal alignment (left/right/center/justify) and vertical spacing
+    
+- Parse and re-create image
+	- in-line image
+    - image in Gray/RGB/CMYK mode
+    - transparent image
+    - floating image, i.e. picture behind text
+
+- Parse and re-create table
+    - border style, e.g. width, color
+    - shading style, i.e. background color
+    - merged cells
+    - vertical direction cell
+    - table with partly hidden borders
+    - nested tables
+
+- Parsing pages with multi-processing
+
+*It can also be used as a tool to extract table contents since both table content and format/style is parsed.*
+
+## Limitations
+
+- Text-based PDF file
+- Left to right language
+- Normal reading direction, no word transformation / rotation
+- Rule-based method can't 100% convert the PDF layout
+
+
+## Documentation
+
+- [Installation](https://dothinking.github.io/pdf2docx/installation.html)
+- [Quickstart](https://dothinking.github.io/pdf2docx/quickstart.html)
+    - [Convert PDF](https://dothinking.github.io/pdf2docx/quickstart.convert.html)
+    - [Extract table](https://dothinking.github.io/pdf2docx/quickstart.table.html)
+    - [Command Line Interface](https://dothinking.github.io/pdf2docx/quickstart.cli.html)
+    - [Graphic User Interface](https://dothinking.github.io/pdf2docx/quickstart.gui.html)
+- [Technical Documentation (In Chinese)](https://dothinking.github.io/pdf2docx/techdoc.html)
+- [API Documentation](https://dothinking.github.io/pdf2docx/modules.html)
+
+## Sample
+
+![sample_compare.png](https://s1.ax1x.com/2020/08/04/aDryx1.png)
+
+
+%package help
+Summary:	Development documents and examples for pdf2docx
+Provides:	python3-pdf2docx-doc
+%description help
+English | [中文](README_CN.md)
+
+# pdf2docx 
+
+![python-version](https://img.shields.io/badge/python->=3.6-green.svg)
+[![codecov](https://codecov.io/gh/dothinking/pdf2docx/branch/master/graph/badge.svg)](https://codecov.io/gh/dothinking/pdf2docx)
+[![pypi-version](https://img.shields.io/pypi/v/pdf2docx.svg)](https://pypi.python.org/pypi/pdf2docx/)
+![license](https://img.shields.io/pypi/l/pdf2docx.svg)
+![pypi-downloads](https://img.shields.io/pypi/dm/pdf2docx)
+
+- Extract data from PDF with `PyMuPDF`, e.g. text, images and drawings 
+- Parse layout with rule, e.g. sections, paragraphs, images and tables
+- Generate docx with `python-docx`
+
+## Features
+
+- Parse and re-create page layout
+    - page margin
+    - section and column (1 or 2 columns only)
+    - page header and footer [TODO]
+
+- Parse and re-create paragraph
+    - OCR text [TODO]
+    - text in horizontal/vertical direction: from left to right, from bottom to top
+    - font style, e.g. font name, size, weight, italic and color
+    - text format, e.g. highlight, underline, strike-through
+    - list style [TODO]
+    - external hyper link
+    - paragraph horizontal alignment (left/right/center/justify) and vertical spacing
+    
+- Parse and re-create image
+	- in-line image
+    - image in Gray/RGB/CMYK mode
+    - transparent image
+    - floating image, i.e. picture behind text
+
+- Parse and re-create table
+    - border style, e.g. width, color
+    - shading style, i.e. background color
+    - merged cells
+    - vertical direction cell
+    - table with partly hidden borders
+    - nested tables
+
+- Parsing pages with multi-processing
+
+*It can also be used as a tool to extract table contents since both table content and format/style is parsed.*
+
+## Limitations
+
+- Text-based PDF file
+- Left to right language
+- Normal reading direction, no word transformation / rotation
+- Rule-based method can't 100% convert the PDF layout
+
+
+## Documentation
+
+- [Installation](https://dothinking.github.io/pdf2docx/installation.html)
+- [Quickstart](https://dothinking.github.io/pdf2docx/quickstart.html)
+    - [Convert PDF](https://dothinking.github.io/pdf2docx/quickstart.convert.html)
+    - [Extract table](https://dothinking.github.io/pdf2docx/quickstart.table.html)
+    - [Command Line Interface](https://dothinking.github.io/pdf2docx/quickstart.cli.html)
+    - [Graphic User Interface](https://dothinking.github.io/pdf2docx/quickstart.gui.html)
+- [Technical Documentation (In Chinese)](https://dothinking.github.io/pdf2docx/techdoc.html)
+- [API Documentation](https://dothinking.github.io/pdf2docx/modules.html)
+
+## Sample
+
+![sample_compare.png](https://s1.ax1x.com/2020/08/04/aDryx1.png)
+
+
+%prep
+%autosetup -n pdf2docx-0.5.6
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-pdf2docx -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 0.5.6-1
+- Package Spec generated
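
Note on the %install section of the spec above: it builds filelist.lst by running `find <dir> -type f -printf "/%h/%f\n"` from inside the buildroot for usr/lib, usr/lib64, usr/bin and usr/sbin, so every regular file is recorded as an absolute path. The following is a minimal Python sketch of that listing step, not the packaging tooling itself; the function name `list_installed_files` and the demo tree are hypothetical and chosen only for illustration.

```python
import os
import tempfile
from pathlib import Path

# Directories the spec scans, relative to the buildroot.
SCANNED_DIRS = ("usr/lib", "usr/lib64", "usr/bin", "usr/sbin")


def list_installed_files(buildroot, subdirs=SCANNED_DIRS):
    """Return '/'-prefixed paths of regular files under the given subdirs.

    Approximates `find <dir> -type f -printf "/%h/%f\\n"` run from the
    buildroot. Missing directories are skipped, as in the spec's `if [ -d ... ]`.
    """
    root = Path(buildroot)
    entries = []
    for sub in subdirs:
        base = root / sub
        if not base.is_dir():
            continue
        for dirpath, _dirnames, filenames in os.walk(base):
            for name in filenames:
                full = Path(dirpath) / name
                # `-type f` matches regular files only, so symlinks are skipped.
                if full.is_file() and not full.is_symlink():
                    entries.append("/" + full.relative_to(root).as_posix())
    return entries


if __name__ == "__main__":
    # Hypothetical buildroot layout, created only for this demonstration.
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "usr/lib/python3/site-packages/pdf2docx"
        pkg.mkdir(parents=True)
        (pkg / "__init__.py").write_text("")
        for path in sorted(list_installed_files(tmp)):
            print(path)
```

Files outside the four scanned directories (for example anything under usr/share) do not appear in the result, which matches the spec handling documentation through a separate doclist.lst.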
diff --git a/sources b/sources
new file mode 100644
index 0000000..73a179a
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+04e43f5cfa75449293a17902f3db9c16  pdf2docx-0.5.6.tar.gz
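
Note on the `sources` file above: its single entry pairs a 32-digit hexadecimal digest with the tarball name. The digest length is consistent with MD5, but the algorithm is an assumption here since the file does not name it. Under that assumption, a downloaded tarball could be checked against the entry roughly as follows; `parse_sources`, `md5_of` and `verify_sources` are hypothetical helper names, not part of any openEuler tool.

```python
import hashlib
import tempfile
from pathlib import Path


def parse_sources(text):
    """Parse lines of the form '<hex digest>  <filename>' into {filename: digest}."""
    entries = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            digest, name = parts
            entries[name.strip()] = digest.lower()
    return entries


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    hasher = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_sources(sources_path, directory):
    """Return {filename: True/False} for every entry in the sources file.

    A file that is missing from `directory` is reported as False.
    """
    results = {}
    entries = parse_sources(Path(sources_path).read_text())
    for name, expected in entries.items():
        target = Path(directory) / name
        results[name] = target.is_file() and md5_of(target) == expected
    return results


if __name__ == "__main__":
    # Self-contained demonstration with a stand-in file, not the real tarball.
    with tempfile.TemporaryDirectory() as tmp:
        fake = Path(tmp) / "example.tar.gz"
        fake.write_bytes(b"stand-in content")
        sources = Path(tmp) / "sources"
        sources.write_text(f"{md5_of(fake)}  example.tar.gz\n")
        print(verify_sources(sources, tmp))
```

In practice the check would be pointed at the directory holding pdf2docx-0.5.6.tar.gz and the `sources` file from this commit.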