%global _empty_manifest_terminate_build 0
Name:           python-textract-trp
Version:        0.1.3
Release:        1
Summary:        Parser for Amazon Textract results.
License:        MIT
URL:            https://github.com/mludvig/amazon-textract-parser
Source0:        https://mirrors.nju.edu.cn/pypi/web/packages/13/61/d4dbf2ff0875a6bff33d99b7162f3d3843072af76c09cffe466171ade6b8/textract-trp-0.1.3.tar.gz
BuildArch:      noarch

%description
# Amazon Textract Results Parser - `textract-trp`

The Amazon *Textract Results Parser*, or `trp` module, packaged and improved for ease of use.

## TL;DR

```
pip install textract-trp
```

Requires Python 3.6 or newer.

## Usage

```python
import boto3
import trp

textract_client = boto3.client('textract')
results = textract_client.analyze_document(... your file and other params ...)
doc = trp.Document(results)
```

Now you can examine `doc.pages`. For example, print all the detected text on the page:

```python
print(doc.pages[0].text)
```

Or print out the first detected table in CSV format:

```python
for row in doc.pages[0].tables[0].rows:
    # Join the cell texts with commas so no trailing comma is printed
    print(",".join(cell.text.strip() for cell in row.cells))
```

Or retrieve text from a given position on the page. For that we have to create a *Bounding Box* with the required coordinates relative to the page.

```python
# Coordinates are from top-left corner [0,0] to bottom-right [1,1]
bbox = trp.BoundingBox(width=0.220, height=0.085, left=0.734, top=0.140)
lines = doc.pages[0].getLinesInBoundingBox(bbox)

# Print only the lines contained in the Bounding Box
for line in lines:
    print(line.text)
```

Refer to the [Textract blog post](https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/) and to the [amazon-textract-code-samples](https://github.com/aws-samples/amazon-textract-code-samples) GitHub repository for more details.
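
When table cells may themselves contain commas or quotes, joining the cell texts by hand produces ambiguous CSV. A minimal sketch of one way to handle that with the standard `csv` module follows. It assumes the rows have already been collected as lists of strings; `rows_to_csv` and `sample_rows` are hypothetical names used for illustration and are not part of `trp`.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialise rows (each a list of cell strings) to CSV text."""
    buffer = io.StringIO()
    # csv.writer quotes any field that contains a comma, quote or newline
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow([cell.strip() for cell in row])
    return buffer.getvalue()


# Stand-in data (an assumption for this sketch). With trp, the rows could
# be built from the attributes used earlier in this section:
#   [[cell.text for cell in row.cells] for row in doc.pages[0].tables[0].rows]
sample_rows = [["Item ", "Price"], ["Widget, large", "9.50 "]]
print(rows_to_csv(sample_rows), end="")
```

The second sample row comes out with its first field quoted, so a CSV reader still sees two columns.
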

## Background

The [Amazon blog post about Textract](https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/) refers to a Python module `trp.py` which used to be quite hard to find. There are many posts on the internet from people looking for the module, often confused by the *"other trp module"* that has nothing to do with Textract.

Hence I decided to package and publish the `trp.py` module from the [aws-samples/amazon-textract-code-samples](https://github.com/aws-samples/amazon-textract-code-samples) repository. Fortunately its [MIT license](https://github.com/aws-samples/amazon-textract-code-samples/blob/master/LICENSE) permits that.

Over time I have made some improvements to the module for ease of use.

### Maintainer

[Michael Ludvig](https://aws.nz)

%package -n python3-textract-trp
Summary:        Parser for Amazon Textract results.
Provides:       python-textract-trp
BuildRequires:  python3-devel
BuildRequires:  python3-setuptools
BuildRequires:  python3-pip

%description -n python3-textract-trp
# Amazon Textract Results Parser - `textract-trp`

The Amazon *Textract Results Parser*, or `trp` module, packaged and improved for ease of use.

## TL;DR

```
pip install textract-trp
```

Requires Python 3.6 or newer.

## Usage

```python
import boto3
import trp

textract_client = boto3.client('textract')
results = textract_client.analyze_document(... your file and other params ...)
doc = trp.Document(results)
```

Now you can examine `doc.pages`. For example, print all the detected text on the page:

```python
print(doc.pages[0].text)
```

Or print out the first detected table in CSV format:

```python
for row in doc.pages[0].tables[0].rows:
    # Join the cell texts with commas so no trailing comma is printed
    print(",".join(cell.text.strip() for cell in row.cells))
```

Or retrieve text from a given position on the page. For that we have to create a *Bounding Box* with the required coordinates relative to the page.

```python
# Coordinates are from top-left corner [0,0] to bottom-right [1,1]
bbox = trp.BoundingBox(width=0.220, height=0.085, left=0.734, top=0.140)
lines = doc.pages[0].getLinesInBoundingBox(bbox)

# Print only the lines contained in the Bounding Box
for line in lines:
    print(line.text)
```

Refer to the [Textract blog post](https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/) and to the [amazon-textract-code-samples](https://github.com/aws-samples/amazon-textract-code-samples) GitHub repository for more details.

## Background

The [Amazon blog post about Textract](https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/) refers to a Python module `trp.py` which used to be quite hard to find. There are many posts on the internet from people looking for the module, often confused by the *"other trp module"* that has nothing to do with Textract.

Hence I decided to package and publish the `trp.py` module from the [aws-samples/amazon-textract-code-samples](https://github.com/aws-samples/amazon-textract-code-samples) repository. Fortunately its [MIT license](https://github.com/aws-samples/amazon-textract-code-samples/blob/master/LICENSE) permits that.

Over time I have made some improvements to the module for ease of use.

### Maintainer

[Michael Ludvig](https://aws.nz)

%package help
Summary:        Development documents and examples for textract-trp
Provides:       python3-textract-trp-doc

%description help
# Amazon Textract Results Parser - `textract-trp`

The Amazon *Textract Results Parser*, or `trp` module, packaged and improved for ease of use.

## TL;DR

```
pip install textract-trp
```

Requires Python 3.6 or newer.

## Usage

```python
import boto3
import trp

textract_client = boto3.client('textract')
results = textract_client.analyze_document(... your file and other params ...)
doc = trp.Document(results)
```

Now you can examine `doc.pages`.
For example, print all the detected text on the page:

```python
print(doc.pages[0].text)
```

Or print out the first detected table in CSV format:

```python
for row in doc.pages[0].tables[0].rows:
    # Join the cell texts with commas so no trailing comma is printed
    print(",".join(cell.text.strip() for cell in row.cells))
```

Or retrieve text from a given position on the page. For that we have to create a *Bounding Box* with the required coordinates relative to the page.

```python
# Coordinates are from top-left corner [0,0] to bottom-right [1,1]
bbox = trp.BoundingBox(width=0.220, height=0.085, left=0.734, top=0.140)
lines = doc.pages[0].getLinesInBoundingBox(bbox)

# Print only the lines contained in the Bounding Box
for line in lines:
    print(line.text)
```

Refer to the [Textract blog post](https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/) and to the [amazon-textract-code-samples](https://github.com/aws-samples/amazon-textract-code-samples) GitHub repository for more details.

## Background

The [Amazon blog post about Textract](https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/) refers to a Python module `trp.py` which used to be quite hard to find. There are many posts on the internet from people looking for the module, often confused by the *"other trp module"* that has nothing to do with Textract.

Hence I decided to package and publish the `trp.py` module from the [aws-samples/amazon-textract-code-samples](https://github.com/aws-samples/amazon-textract-code-samples) repository. Fortunately its [MIT license](https://github.com/aws-samples/amazon-textract-code-samples/blob/master/LICENSE) permits that.

Over time I have made some improvements to the module for ease of use.

### Maintainer

[Michael Ludvig](https://aws.nz)

%prep
%autosetup -n textract-trp-0.1.3

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-textract-trp -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Fri May 05 2023 Python_Bot - 0.1.3-1
- Package Spec generated