| author | CoprDistGit <infra@openeuler.org> | 2023-05-05 04:57:25 +0000 |
|---|---|---|
| committer | CoprDistGit <infra@openeuler.org> | 2023-05-05 04:57:25 +0000 |
| commit | b55bc4e1af65030532a8eb4b7adcaa4d65fb1370 (patch) | |
| tree | 8cee5d78f3a5bff485387a11a4cae586d0c7f953 | |
| parent | 512e25a71abb78080d1741d74ae06de037ae6de1 (diff) | |
automatic import of python-textract-trp (openeuler20.03)
| -rw-r--r-- | .gitignore | 1 |
| -rw-r--r-- | python-textract-trp.spec | 288 |
| -rw-r--r-- | sources | 1 |
3 files changed, 290 insertions, 0 deletions
@@ -0,0 +1 @@
+/textract-trp-0.1.3.tar.gz
diff --git a/python-textract-trp.spec b/python-textract-trp.spec
new file mode 100644
index 0000000..4179cfd
--- /dev/null
+++ b/python-textract-trp.spec
@@ -0,0 +1,288 @@
+%global _empty_manifest_terminate_build 0
+Name: python-textract-trp
+Version: 0.1.3
+Release: 1
+Summary: Parser for Amazon Textract results.
+License: MIT
+URL: https://github.com/mludvig/amazon-textract-parser
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/13/61/d4dbf2ff0875a6bff33d99b7162f3d3843072af76c09cffe466171ade6b8/textract-trp-0.1.3.tar.gz
+BuildArch: noarch
+
+
+%description
+# Amazon Textract Results Parser - `textract-trp`
+
+Amazon *Textract Results Parser* or `trp` module packaged and improved for ease of use.
+
+## TL;DR
+
+```
+pip install textract-trp
+```
+
+Requires Python 3.6 or newer.
+
+## Usage
+
+```python
+import boto3
+import trp
+
+textract_client = boto3.client('textract')
+results = textract_client.analyze_document(... your file and other params ...)
+doc = trp.Document(results)
+```
+
+Now you can examine `doc.pages`. For example print all the detected on the page:
+
+```python
+print(doc.pages[0].text)
+```
+
+Or print out the detected tables in CSV format:
+
+```python
+for row in doc.pages[0].tables[0].rows:
+    for cell in row.cells:
+        print(cell.text.strip(), end=",")
+    print()
+```
+
+Or retrieve text from a given position on the page. For that we have to create
+*Bounding Box* with the required coordinates relative to the page.
+
+```python
+# Coordinates are from top-left corner [0,0] to bottom-right [1,1]
+bbox = trp.BoundingBox(width=0.220, height=0.085, left=0.734, top=0.140)
+lines = doc.pages[0].getLinesInBoundingBox(bbox)
+
+# Print only the lines contained in the Bounding Box
+for line in lines:
+    print(line.text)
+```
+
+Refer to the [Textract blog post](https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/)
+and to [amazon-textract-code-samples](https://github.com/aws-samples/amazon-textract-code-samples) GitHub repository for more details.
+
+## Background
+
+The [Amazon blog post about Textract](https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/)
+refers to a python module `trp.py` which used to be quite hard to find. There
+are many posts on the internet from people looking for the module, often confused by
+the *"other trp module"* that's got nothing to do with Textract.
+
+Hence I decided to package and publish the `trp.py` module from the
+[aws-samples/amazon-textract-code-samples](https://github.com/aws-samples/amazon-textract-code-samples)
+repository. Fortunately its [MIT
+license](https://github.com/aws-samples/amazon-textract-code-samples/blob/master/LICENSE)
+permits that.
+
+Over time I have made some improvements to the module for ease of use.
+
+### Maintainer
+
+[Michael Ludvig](https://aws.nz)
+
+
+%package -n python3-textract-trp
+Summary: Parser for Amazon Textract results.
+Provides: python-textract-trp
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-textract-trp
+# Amazon Textract Results Parser - `textract-trp`
+
+Amazon *Textract Results Parser* or `trp` module packaged and improved for ease of use.
+
+## TL;DR
+
+```
+pip install textract-trp
+```
+
+Requires Python 3.6 or newer.
+
+## Usage
+
+```python
+import boto3
+import trp
+
+textract_client = boto3.client('textract')
+results = textract_client.analyze_document(... your file and other params ...)
+doc = trp.Document(results)
+```
+
+Now you can examine `doc.pages`. For example print all the detected on the page:
+
+```python
+print(doc.pages[0].text)
+```
+
+Or print out the detected tables in CSV format:
+
+```python
+for row in doc.pages[0].tables[0].rows:
+    for cell in row.cells:
+        print(cell.text.strip(), end=",")
+    print()
+```
+
+Or retrieve text from a given position on the page. For that we have to create
+*Bounding Box* with the required coordinates relative to the page.
+
+```python
+# Coordinates are from top-left corner [0,0] to bottom-right [1,1]
+bbox = trp.BoundingBox(width=0.220, height=0.085, left=0.734, top=0.140)
+lines = doc.pages[0].getLinesInBoundingBox(bbox)
+
+# Print only the lines contained in the Bounding Box
+for line in lines:
+    print(line.text)
+```
+
+Refer to the [Textract blog post](https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/)
+and to [amazon-textract-code-samples](https://github.com/aws-samples/amazon-textract-code-samples) GitHub repository for more details.
+
+## Background
+
+The [Amazon blog post about Textract](https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/)
+refers to a python module `trp.py` which used to be quite hard to find. There
+are many posts on the internet from people looking for the module, often confused by
+the *"other trp module"* that's got nothing to do with Textract.
+
+Hence I decided to package and publish the `trp.py` module from the
+[aws-samples/amazon-textract-code-samples](https://github.com/aws-samples/amazon-textract-code-samples)
+repository. Fortunately its [MIT
+license](https://github.com/aws-samples/amazon-textract-code-samples/blob/master/LICENSE)
+permits that.
+
+Over time I have made some improvements to the module for ease of use.
+
+### Maintainer
+
+[Michael Ludvig](https://aws.nz)
+
+
+%package help
+Summary: Development documents and examples for textract-trp
+Provides: python3-textract-trp-doc
+%description help
+# Amazon Textract Results Parser - `textract-trp`
+
+Amazon *Textract Results Parser* or `trp` module packaged and improved for ease of use.
+
+## TL;DR
+
+```
+pip install textract-trp
+```
+
+Requires Python 3.6 or newer.
+
+## Usage
+
+```python
+import boto3
+import trp
+
+textract_client = boto3.client('textract')
+results = textract_client.analyze_document(... your file and other params ...)
+doc = trp.Document(results)
+```
+
+Now you can examine `doc.pages`. For example print all the detected on the page:
+
+```python
+print(doc.pages[0].text)
+```
+
+Or print out the detected tables in CSV format:
+
+```python
+for row in doc.pages[0].tables[0].rows:
+    for cell in row.cells:
+        print(cell.text.strip(), end=",")
+    print()
+```
+
+Or retrieve text from a given position on the page. For that we have to create
+*Bounding Box* with the required coordinates relative to the page.
+
+```python
+# Coordinates are from top-left corner [0,0] to bottom-right [1,1]
+bbox = trp.BoundingBox(width=0.220, height=0.085, left=0.734, top=0.140)
+lines = doc.pages[0].getLinesInBoundingBox(bbox)
+
+# Print only the lines contained in the Bounding Box
+for line in lines:
+    print(line.text)
+```
+
+Refer to the [Textract blog post](https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/)
+and to [amazon-textract-code-samples](https://github.com/aws-samples/amazon-textract-code-samples) GitHub repository for more details.
+
+## Background
+
+The [Amazon blog post about Textract](https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/)
+refers to a python module `trp.py` which used to be quite hard to find. There
+are many posts on the internet from people looking for the module, often confused by
+the *"other trp module"* that's got nothing to do with Textract.
+
+Hence I decided to package and publish the `trp.py` module from the
+[aws-samples/amazon-textract-code-samples](https://github.com/aws-samples/amazon-textract-code-samples)
+repository. Fortunately its [MIT
+license](https://github.com/aws-samples/amazon-textract-code-samples/blob/master/LICENSE)
+permits that.
+
+Over time I have made some improvements to the module for ease of use.
+
+### Maintainer
+
+[Michael Ludvig](https://aws.nz)
+
+
+%prep
+%autosetup -n textract-trp-0.1.3
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-textract-trp -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 0.1.3-1
+- Package Spec generated
@@ -0,0 +1 @@
+90e4e2f9069c0f67cd89e3979b0edc1a textract-trp-0.1.3.tar.gz
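
Note on the CSV example in the packaged description: printing each cell with `end=","` leaves a trailing comma on every row and does not quote cells that themselves contain commas. The following is a minimal sketch of an alternative using the standard-library `csv` module. It assumes only the attribute names that appear in the README (`rows`, `cells`, `text`); the helper `table_to_csv` and the `SimpleNamespace` stand-in objects are hypothetical and are used here so the example runs without `trp` or AWS credentials.

```python
import csv
import io
from types import SimpleNamespace


def table_to_csv(table):
    """Render a table-like object as CSV text.

    Hypothetical helper: relies only on the attributes used in the
    README example (table.rows, row.cells, cell.text).
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in table.rows:
        # csv.writer quotes cells containing commas and adds no trailing comma
        writer.writerow(cell.text.strip() for cell in row.cells)
    return buffer.getvalue()


def _cell(text):
    # Stand-in for a parsed table cell; not part of the trp module
    return SimpleNamespace(text=text)


# Stand-in table mimicking the shape of doc.pages[0].tables[0]
demo_table = SimpleNamespace(rows=[
    SimpleNamespace(cells=[_cell("Item "), _cell("Price ")]),
    SimpleNamespace(cells=[_cell("Widget, large "), _cell("9.50 ")]),
])

csv_text = table_to_csv(demo_table)
print(csv_text, end="")
```

With a real parsed document, the same helper would presumably be called as `table_to_csv(doc.pages[0].tables[0])`, provided the table object exposes the attributes shown in the README.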
