%global _empty_manifest_terminate_build 0
Name:           python-pdf2docx
Version:        0.5.6
Release:        1
Summary:        Open source Python library converting PDF to DOCX
License:        GPL v3
URL:            https://github.com/dothinking/pdf2docx
Source0:        https://mirrors.nju.edu.cn/pypi/web/packages/05/8f/e50f748e72b812b6ff1af534f2c618c79ad58f0e50d09369665be36086a7/pdf2docx-0.5.6.tar.gz
BuildArch:      noarch

Requires:       python3-PyMuPDF
Requires:       python3-docx
Requires:       python3-fonttools
Requires:       python3-numpy
Requires:       python3-opencv-python
Requires:       python3-fire

%description
English | [Chinese](README_CN.md)

# pdf2docx

![python-version](https://img.shields.io/badge/python->=3.6-green.svg)
[![codecov](https://codecov.io/gh/dothinking/pdf2docx/branch/master/graph/badge.svg)](https://codecov.io/gh/dothinking/pdf2docx)
[![pypi-version](https://img.shields.io/pypi/v/pdf2docx.svg)](https://pypi.python.org/pypi/pdf2docx/)
![license](https://img.shields.io/pypi/l/pdf2docx.svg)
![pypi-downloads](https://img.shields.io/pypi/dm/pdf2docx)

- Extract data from PDF with `PyMuPDF`, e.g. text, images and drawings
- Parse layout with rules, e.g. sections, paragraphs, images and tables
- Generate docx with `python-docx`

## Features

- Parse and re-create page layout
    - page margin
    - section and column (1 or 2 columns only)
    - page header and footer [TODO]
- Parse and re-create paragraph
    - OCR text [TODO]
    - text in horizontal/vertical direction: from left to right, from bottom to top
    - font style, e.g. font name, size, weight, italic and color
    - text format, e.g. highlight, underline, strike-through
    - list style [TODO]
    - external hyperlink
    - paragraph horizontal alignment (left/right/center/justify) and vertical spacing
- Parse and re-create image
    - in-line image
    - image in Gray/RGB/CMYK mode
    - transparent image
    - floating image, i.e. picture behind text
- Parse and re-create table
    - border style, e.g. width, color
    - shading style, i.e. background color
    - merged cells
    - vertical direction cell
    - table with partly hidden borders
    - nested tables
- Parsing pages with multi-processing

*It can also be used as a tool to extract table contents since both table content and format/style are parsed.*

## Limitations

- Text-based PDF file
- Left-to-right language
- Normal reading direction, no word transformation / rotation
- Rule-based method can't 100% convert the PDF layout

## Documentation

- [Installation](https://dothinking.github.io/pdf2docx/installation.html)
- [Quickstart](https://dothinking.github.io/pdf2docx/quickstart.html)
    - [Convert PDF](https://dothinking.github.io/pdf2docx/quickstart.convert.html)
    - [Extract table](https://dothinking.github.io/pdf2docx/quickstart.table.html)
    - [Command Line Interface](https://dothinking.github.io/pdf2docx/quickstart.cli.html)
    - [Graphical User Interface](https://dothinking.github.io/pdf2docx/quickstart.gui.html)
- [Technical Documentation (In Chinese)](https://dothinking.github.io/pdf2docx/techdoc.html)
- [API Documentation](https://dothinking.github.io/pdf2docx/modules.html)

## Sample

![sample_compare.png](https://s1.ax1x.com/2020/08/04/aDryx1.png)

%package -n python3-pdf2docx
Summary:        Open source Python library converting PDF to DOCX
Provides:       python-pdf2docx
BuildRequires:  python3-devel
BuildRequires:  python3-setuptools
BuildRequires:  python3-pip

%description -n python3-pdf2docx
English | [Chinese](README_CN.md)

# pdf2docx

![python-version](https://img.shields.io/badge/python->=3.6-green.svg)
[![codecov](https://codecov.io/gh/dothinking/pdf2docx/branch/master/graph/badge.svg)](https://codecov.io/gh/dothinking/pdf2docx)
[![pypi-version](https://img.shields.io/pypi/v/pdf2docx.svg)](https://pypi.python.org/pypi/pdf2docx/)
![license](https://img.shields.io/pypi/l/pdf2docx.svg)
![pypi-downloads](https://img.shields.io/pypi/dm/pdf2docx)

- Extract data from PDF with `PyMuPDF`, e.g. text, images and drawings
- Parse layout with rules, e.g. sections, paragraphs, images and tables
- Generate docx with `python-docx`

## Features

- Parse and re-create page layout
    - page margin
    - section and column (1 or 2 columns only)
    - page header and footer [TODO]
- Parse and re-create paragraph
    - OCR text [TODO]
    - text in horizontal/vertical direction: from left to right, from bottom to top
    - font style, e.g. font name, size, weight, italic and color
    - text format, e.g. highlight, underline, strike-through
    - list style [TODO]
    - external hyperlink
    - paragraph horizontal alignment (left/right/center/justify) and vertical spacing
- Parse and re-create image
    - in-line image
    - image in Gray/RGB/CMYK mode
    - transparent image
    - floating image, i.e. picture behind text
- Parse and re-create table
    - border style, e.g. width, color
    - shading style, i.e. background color
    - merged cells
    - vertical direction cell
    - table with partly hidden borders
    - nested tables
- Parsing pages with multi-processing

*It can also be used as a tool to extract table contents since both table content and format/style are parsed.*

## Limitations

- Text-based PDF file
- Left-to-right language
- Normal reading direction, no word transformation / rotation
- Rule-based method can't 100% convert the PDF layout

## Documentation

- [Installation](https://dothinking.github.io/pdf2docx/installation.html)
- [Quickstart](https://dothinking.github.io/pdf2docx/quickstart.html)
    - [Convert PDF](https://dothinking.github.io/pdf2docx/quickstart.convert.html)
    - [Extract table](https://dothinking.github.io/pdf2docx/quickstart.table.html)
    - [Command Line Interface](https://dothinking.github.io/pdf2docx/quickstart.cli.html)
    - [Graphical User Interface](https://dothinking.github.io/pdf2docx/quickstart.gui.html)
- [Technical Documentation (In Chinese)](https://dothinking.github.io/pdf2docx/techdoc.html)
- [API Documentation](https://dothinking.github.io/pdf2docx/modules.html)

## Sample

![sample_compare.png](https://s1.ax1x.com/2020/08/04/aDryx1.png)

%package help
Summary:        Development documents and examples for pdf2docx
Provides:       python3-pdf2docx-doc

%description help
English | [Chinese](README_CN.md)

# pdf2docx

![python-version](https://img.shields.io/badge/python->=3.6-green.svg)
[![codecov](https://codecov.io/gh/dothinking/pdf2docx/branch/master/graph/badge.svg)](https://codecov.io/gh/dothinking/pdf2docx)
[![pypi-version](https://img.shields.io/pypi/v/pdf2docx.svg)](https://pypi.python.org/pypi/pdf2docx/)
![license](https://img.shields.io/pypi/l/pdf2docx.svg)
![pypi-downloads](https://img.shields.io/pypi/dm/pdf2docx)

- Extract data from PDF with `PyMuPDF`, e.g. text, images and drawings
- Parse layout with rules, e.g. sections, paragraphs, images and tables
- Generate docx with `python-docx`

## Features

- Parse and re-create page layout
    - page margin
    - section and column (1 or 2 columns only)
    - page header and footer [TODO]
- Parse and re-create paragraph
    - OCR text [TODO]
    - text in horizontal/vertical direction: from left to right, from bottom to top
    - font style, e.g. font name, size, weight, italic and color
    - text format, e.g. highlight, underline, strike-through
    - list style [TODO]
    - external hyperlink
    - paragraph horizontal alignment (left/right/center/justify) and vertical spacing
- Parse and re-create image
    - in-line image
    - image in Gray/RGB/CMYK mode
    - transparent image
    - floating image, i.e. picture behind text
- Parse and re-create table
    - border style, e.g. width, color
    - shading style, i.e. background color
    - merged cells
    - vertical direction cell
    - table with partly hidden borders
    - nested tables
- Parsing pages with multi-processing

*It can also be used as a tool to extract table contents since both table content and format/style are parsed.*

## Limitations

- Text-based PDF file
- Left-to-right language
- Normal reading direction, no word transformation / rotation
- Rule-based method can't 100% convert the PDF layout

## Documentation

- [Installation](https://dothinking.github.io/pdf2docx/installation.html)
- [Quickstart](https://dothinking.github.io/pdf2docx/quickstart.html)
    - [Convert PDF](https://dothinking.github.io/pdf2docx/quickstart.convert.html)
    - [Extract table](https://dothinking.github.io/pdf2docx/quickstart.table.html)
    - [Command Line Interface](https://dothinking.github.io/pdf2docx/quickstart.cli.html)
    - [Graphical User Interface](https://dothinking.github.io/pdf2docx/quickstart.gui.html)
- [Technical Documentation (In Chinese)](https://dothinking.github.io/pdf2docx/techdoc.html)
- [API Documentation](https://dothinking.github.io/pdf2docx/modules.html)

## Sample

![sample_compare.png](https://s1.ax1x.com/2020/08/04/aDryx1.png)

%prep
%autosetup -n pdf2docx-0.5.6

%build
%py3_build

%install
%py3_install
# Copy any bundled documentation or example directories into the package doc dir.
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
# Record every installed file, as an absolute path, in filelist.lst.
pushd %{buildroot}
if [ -d usr/lib ]; then
    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
# Record man pages in doclist.lst, listing each with a .gz suffix.
touch doclist.lst
if [ -d usr/share/man ]; then
    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
# Move both lists into the build directory, where the file sections read them.
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-pdf2docx -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Fri May 05 2023 Python_Bot - 0.5.6-1
- Package Spec generated