author CoprDistGit <infra@openeuler.org> 2023-05-05 04:57:25 +0000
committer CoprDistGit <infra@openeuler.org> 2023-05-05 04:57:25 +0000
commit b55bc4e1af65030532a8eb4b7adcaa4d65fb1370
tree 8cee5d78f3a5bff485387a11a4cae586d0c7f953
parent 512e25a71abb78080d1741d74ae06de037ae6de1
automatic import of python-textract-trp (openeuler20.03)
-rw-r--r-- .gitignore 1
-rw-r--r-- python-textract-trp.spec 288
-rw-r--r-- sources 1
3 files changed, 290 insertions, 0 deletions
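
Reviewer note: the `%description` text imported by this commit prints table cells with `end=","`, which leaves a trailing comma on every row and does not quote cells that themselves contain commas. A minimal sketch of an alternative using the standard `csv` module follows. It assumes only the attributes that appear in the description (`table.rows`, `row.cells`, `cell.text`); `table_to_csv`, `_cell`, and `demo_table` are hypothetical names, and the `SimpleNamespace` objects are stand-ins so the sketch runs without calling Textract.

```python
import csv
import io
from types import SimpleNamespace


def table_to_csv(table) -> str:
    """Render a table-like object as CSV text.

    Assumes only table.rows, row.cells and cell.text, as used in the
    package description; nothing else about the trp API is assumed.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in table.rows:
        # csv.writer quotes any cell that contains a comma or a quote.
        writer.writerow(cell.text.strip() for cell in row.cells)
    return buffer.getvalue()


def _cell(text):
    """Hypothetical stand-in for a parsed table cell."""
    return SimpleNamespace(text=text)


# Hypothetical stand-in for doc.pages[0].tables[0].
demo_table = SimpleNamespace(rows=[
    SimpleNamespace(cells=[_cell("Item "), _cell("Price ")]),
    SimpleNamespace(cells=[_cell("Widget, large "), _cell("9.50 ")]),
])

print(table_to_csv(demo_table), end="")
```

With a real parsed document, the same function could presumably be called as `table_to_csv(doc.pages[0].tables[0])`.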
diff --git a/.gitignore b/.gitignore
index e69de29..e8bb5e3 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/textract-trp-0.1.3.tar.gz
diff --git a/python-textract-trp.spec b/python-textract-trp.spec
new file mode 100644
index 0000000..4179cfd
--- /dev/null
+++ b/python-textract-trp.spec
@@ -0,0 +1,288 @@
+%global _empty_manifest_terminate_build 0
+Name: python-textract-trp
+Version: 0.1.3
+Release: 1
+Summary: Parser for Amazon Textract results.
+License: MIT
+URL: https://github.com/mludvig/amazon-textract-parser
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/13/61/d4dbf2ff0875a6bff33d99b7162f3d3843072af76c09cffe466171ade6b8/textract-trp-0.1.3.tar.gz
+BuildArch: noarch
+
+
+%description
+# Amazon Textract Results Parser - `textract-trp`
+
+Amazon *Textract Results Parser* or `trp` module packaged and improved for ease of use.
+
+## TL;DR
+
+```
+pip install textract-trp
+```
+
+Requires Python 3.6 or newer.
+
+## Usage
+
+```python
+import boto3
+import trp
+
+textract_client = boto3.client('textract')
+results = textract_client.analyze_document(... your file and other params ...)
+doc = trp.Document(results)
+```
+
+Now you can examine `doc.pages`. For example, print all the detected text on the page:
+
+```python
+print(doc.pages[0].text)
+```
+
+Or print out the detected tables in CSV format:
+
+```python
+for row in doc.pages[0].tables[0].rows:
+    for cell in row.cells:
+        print(cell.text.strip(), end=",")
+    print()
+```
+
+Or retrieve text from a given position on the page. For that we have to create
+a *Bounding Box* with the required coordinates relative to the page.
+
+```python
+# Coordinates are from top-left corner [0,0] to bottom-right [1,1]
+bbox = trp.BoundingBox(width=0.220, height=0.085, left=0.734, top=0.140)
+lines = doc.pages[0].getLinesInBoundingBox(bbox)
+
+# Print only the lines contained in the Bounding Box
+for line in lines:
+    print(line.text)
+```
+
+Refer to the [Textract blog post](https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/)
+and to the [amazon-textract-code-samples](https://github.com/aws-samples/amazon-textract-code-samples) GitHub repository for more details.
+
+## Background
+
+The [Amazon blog post about Textract](https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/)
+refers to a Python module `trp.py` which used to be quite hard to find. There
+are many posts on the internet from people looking for the module, often confused by
+the *"other trp module"* that's got nothing to do with Textract.
+
+Hence I decided to package and publish the `trp.py` module from the
+[aws-samples/amazon-textract-code-samples](https://github.com/aws-samples/amazon-textract-code-samples)
+repository. Fortunately its [MIT
+license](https://github.com/aws-samples/amazon-textract-code-samples/blob/master/LICENSE)
+permits that.
+
+Over time I have made some improvements to the module for ease of use.
+
+### Maintainer
+
+[Michael Ludvig](https://aws.nz)
+
+
+%package -n python3-textract-trp
+Summary: Parser for Amazon Textract results.
+Provides: python-textract-trp
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-textract-trp
+# Amazon Textract Results Parser - `textract-trp`
+
+Amazon *Textract Results Parser* or `trp` module packaged and improved for ease of use.
+
+## TL;DR
+
+```
+pip install textract-trp
+```
+
+Requires Python 3.6 or newer.
+
+## Usage
+
+```python
+import boto3
+import trp
+
+textract_client = boto3.client('textract')
+results = textract_client.analyze_document(... your file and other params ...)
+doc = trp.Document(results)
+```
+
+Now you can examine `doc.pages`. For example, print all the detected text on the page:
+
+```python
+print(doc.pages[0].text)
+```
+
+Or print out the detected tables in CSV format:
+
+```python
+for row in doc.pages[0].tables[0].rows:
+    for cell in row.cells:
+        print(cell.text.strip(), end=",")
+    print()
+```
+
+Or retrieve text from a given position on the page. For that we have to create
+a *Bounding Box* with the required coordinates relative to the page.
+
+```python
+# Coordinates are from top-left corner [0,0] to bottom-right [1,1]
+bbox = trp.BoundingBox(width=0.220, height=0.085, left=0.734, top=0.140)
+lines = doc.pages[0].getLinesInBoundingBox(bbox)
+
+# Print only the lines contained in the Bounding Box
+for line in lines:
+    print(line.text)
+```
+
+Refer to the [Textract blog post](https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/)
+and to the [amazon-textract-code-samples](https://github.com/aws-samples/amazon-textract-code-samples) GitHub repository for more details.
+
+## Background
+
+The [Amazon blog post about Textract](https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/)
+refers to a Python module `trp.py` which used to be quite hard to find. There
+are many posts on the internet from people looking for the module, often confused by
+the *"other trp module"* that's got nothing to do with Textract.
+
+Hence I decided to package and publish the `trp.py` module from the
+[aws-samples/amazon-textract-code-samples](https://github.com/aws-samples/amazon-textract-code-samples)
+repository. Fortunately its [MIT
+license](https://github.com/aws-samples/amazon-textract-code-samples/blob/master/LICENSE)
+permits that.
+
+Over time I have made some improvements to the module for ease of use.
+
+### Maintainer
+
+[Michael Ludvig](https://aws.nz)
+
+
+%package help
+Summary: Development documents and examples for textract-trp
+Provides: python3-textract-trp-doc
+%description help
+# Amazon Textract Results Parser - `textract-trp`
+
+Amazon *Textract Results Parser* or `trp` module packaged and improved for ease of use.
+
+## TL;DR
+
+```
+pip install textract-trp
+```
+
+Requires Python 3.6 or newer.
+
+## Usage
+
+```python
+import boto3
+import trp
+
+textract_client = boto3.client('textract')
+results = textract_client.analyze_document(... your file and other params ...)
+doc = trp.Document(results)
+```
+
+Now you can examine `doc.pages`. For example, print all the detected text on the page:
+
+```python
+print(doc.pages[0].text)
+```
+
+Or print out the detected tables in CSV format:
+
+```python
+for row in doc.pages[0].tables[0].rows:
+    for cell in row.cells:
+        print(cell.text.strip(), end=",")
+    print()
+```
+
+Or retrieve text from a given position on the page. For that we have to create
+a *Bounding Box* with the required coordinates relative to the page.
+
+```python
+# Coordinates are from top-left corner [0,0] to bottom-right [1,1]
+bbox = trp.BoundingBox(width=0.220, height=0.085, left=0.734, top=0.140)
+lines = doc.pages[0].getLinesInBoundingBox(bbox)
+
+# Print only the lines contained in the Bounding Box
+for line in lines:
+    print(line.text)
+```
+
+Refer to the [Textract blog post](https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/)
+and to the [amazon-textract-code-samples](https://github.com/aws-samples/amazon-textract-code-samples) GitHub repository for more details.
+
+## Background
+
+The [Amazon blog post about Textract](https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/)
+refers to a Python module `trp.py` which used to be quite hard to find. There
+are many posts on the internet from people looking for the module, often confused by
+the *"other trp module"* that's got nothing to do with Textract.
+
+Hence I decided to package and publish the `trp.py` module from the
+[aws-samples/amazon-textract-code-samples](https://github.com/aws-samples/amazon-textract-code-samples)
+repository. Fortunately its [MIT
+license](https://github.com/aws-samples/amazon-textract-code-samples/blob/master/LICENSE)
+permits that.
+
+Over time I have made some improvements to the module for ease of use.
+
+### Maintainer
+
+[Michael Ludvig](https://aws.nz)
+
+
+%prep
+%autosetup -n textract-trp-0.1.3
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-textract-trp -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 0.1.3-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..554e8f1
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+90e4e2f9069c0f67cd89e3979b0edc1a textract-trp-0.1.3.tar.gz
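
Reviewer note: the `sources` entry above pairs a 32-character hex digest with the tarball name. Assuming that digest is an MD5 checksum (the length is consistent with MD5, but the commit does not say so), a downloaded tarball could be checked against it roughly as sketched here. The helper names `parse_sources_line`, `md5_of_file`, and `verify_source` are hypothetical and not part of any packaging tool.

```python
import hashlib
from pathlib import Path
from typing import Tuple


def parse_sources_line(line: str) -> Tuple[str, str]:
    """Split a '<hex digest> <filename>' line into (digest, filename)."""
    digest, filename = line.split(maxsplit=1)
    return digest.lower(), filename.strip()


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    hasher = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_source(line: str, directory: Path) -> bool:
    """Check that the file named in the line matches its recorded digest.

    Assumption: the digest in the line is MD5.
    """
    expected, filename = parse_sources_line(line)
    return md5_of_file(directory / filename) == expected
```

For example, `verify_source("90e4e2f9069c0f67cd89e3979b0edc1a textract-trp-0.1.3.tar.gz", Path("."))` would return `True` only if a matching tarball is present in the current directory.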