%global _empty_manifest_terminate_build 0
Name:		python-textract-trp
Version:	0.1.3
Release:	1
Summary:	Parser for Amazon Textract results
License:	MIT
URL:		https://github.com/mludvig/amazon-textract-parser
Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/13/61/d4dbf2ff0875a6bff33d99b7162f3d3843072af76c09cffe466171ade6b8/textract-trp-0.1.3.tar.gz
BuildArch:	noarch


%description
# Amazon Textract Results Parser - `textract-trp`

Amazon *Textract Results Parser* or `trp` module packaged and improved for ease of use.

## TL;DR

```
pip install textract-trp
```

Requires Python 3.6 or newer.

## Usage

```python
import boto3
import trp

textract_client = boto3.client('textract')
results = textract_client.analyze_document(... your file and other params ...)
doc = trp.Document(results)
```

Now you can examine `doc.pages`. For example, print all the text detected on the first page:

```python
print(doc.pages[0].text)
```

Or print the first detected table in CSV format:

```python
for row in doc.pages[0].tables[0].rows:
    # Join the cells so each row is printed without a trailing comma
    print(",".join(cell.text.strip() for cell in row.cells))
```
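
The loop above does not quote cell text, so a cell that itself contains a comma would shift the columns. A minimal sketch of one way around that, using the standard library `csv` module, is shown below. The helper name `table_to_csv` and the `SimpleNamespace` stand-in data are assumptions made for illustration and are not part of `trp`; the sketch relies only on the `rows`, `cells` and `text` attributes used above.

```python
import csv
import io
from types import SimpleNamespace


def table_to_csv(table):
    """Render any object exposing .rows -> .cells -> .text as CSV text.

    Hypothetical helper, not provided by trp.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in table.rows:
        # csv.writer quotes fields that contain the delimiter
        writer.writerow(cell.text.strip() for cell in row.cells)
    return buffer.getvalue()


def _row(*texts):
    """Build a stand-in row so the sketch runs without AWS access."""
    return SimpleNamespace(cells=[SimpleNamespace(text=t) for t in texts])


# Stand-in for doc.pages[0].tables[0]
sample_table = SimpleNamespace(
    rows=[_row("Item ", "Price "), _row("Widget, large ", "9.99 ")]
)

print(table_to_csv(sample_table), end="")
```

With a real result, the same helper could be called as `table_to_csv(doc.pages[0].tables[0])`.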

Or retrieve text from a given position on the page. For that we have to create
a *Bounding Box* with the required coordinates relative to the page.

```python
# Coordinates are from top-left corner [0,0] to bottom-right [1,1]
bbox = trp.BoundingBox(width=0.220, height=0.085, left=0.734, top=0.140)
lines = doc.pages[0].getLinesInBoundingBox(bbox)

# Print only the lines contained in the Bounding Box
for line in lines:
    print(line.text)
```
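
To make the coordinate convention concrete, here is a small self-contained sketch of a containment test on page-relative boxes. It assumes a "fully inside" rule, which is not necessarily the rule `getLinesInBoundingBox` applies; the `Box` tuple and the `contains` function are hypothetical names for illustration and are not part of `trp`.

```python
from collections import namedtuple

# Hypothetical stand-in for a page-relative box; this is not trp.BoundingBox.
# All values are fractions of the page: [0,0] is top-left, [1,1] bottom-right.
Box = namedtuple("Box", ["width", "height", "left", "top"])


def contains(outer, inner):
    """Return True when inner lies entirely inside outer (edges may touch)."""
    return (
        outer.left <= inner.left
        and outer.top <= inner.top
        and inner.left + inner.width <= outer.left + outer.width
        and inner.top + inner.height <= outer.top + outer.height
    )


region = Box(width=0.220, height=0.085, left=0.734, top=0.140)
inside = Box(width=0.100, height=0.020, left=0.750, top=0.150)
outside = Box(width=0.100, height=0.020, left=0.100, top=0.150)

print(contains(region, inside))   # True
print(contains(region, outside))  # False
```

Because the coordinates are relative to the page, the same box selects the same region regardless of the resolution of the scanned image.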

Refer to the [Textract blog post](https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/)
and to the [amazon-textract-code-samples](https://github.com/aws-samples/amazon-textract-code-samples) GitHub repository for more details.

## Background

The [Amazon blog post about Textract](https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/)
refers to a Python module `trp.py` which used to be quite hard to find. There
are many posts on the internet from people looking for the module, often confused by
the *"other trp module"* that's got nothing to do with Textract.

Hence I decided to package and publish the `trp.py` module from the
[aws-samples/amazon-textract-code-samples](https://github.com/aws-samples/amazon-textract-code-samples)
repository. Fortunately its [MIT
license](https://github.com/aws-samples/amazon-textract-code-samples/blob/master/LICENSE)
permits that.

Over time I have made some improvements to the module for ease of use.

### Maintainer

[Michael Ludvig](https://aws.nz)


%package -n python3-textract-trp
Summary:	Parser for Amazon Textract results
Provides:	python-textract-trp
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip
%description -n python3-textract-trp
# Amazon Textract Results Parser - `textract-trp`

Amazon *Textract Results Parser* or `trp` module packaged and improved for ease of use.

## TL;DR

```
pip install textract-trp
```

Requires Python 3.6 or newer.

## Usage

```python
import boto3
import trp

textract_client = boto3.client('textract')
results = textract_client.analyze_document(... your file and other params ...)
doc = trp.Document(results)
```

Now you can examine `doc.pages`. For example, print all the text detected on the first page:

```python
print(doc.pages[0].text)
```

Or print the first detected table in CSV format:

```python
for row in doc.pages[0].tables[0].rows:
    # Join the cells so each row is printed without a trailing comma
    print(",".join(cell.text.strip() for cell in row.cells))
```

Or retrieve text from a given position on the page. For that we have to create
a *Bounding Box* with the required coordinates relative to the page.

```python
# Coordinates are from top-left corner [0,0] to bottom-right [1,1]
bbox = trp.BoundingBox(width=0.220, height=0.085, left=0.734, top=0.140)
lines = doc.pages[0].getLinesInBoundingBox(bbox)

# Print only the lines contained in the Bounding Box
for line in lines:
    print(line.text)
```

Refer to the [Textract blog post](https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/)
and to the [amazon-textract-code-samples](https://github.com/aws-samples/amazon-textract-code-samples) GitHub repository for more details.

## Background

The [Amazon blog post about Textract](https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/)
refers to a Python module `trp.py` which used to be quite hard to find. There
are many posts on the internet from people looking for the module, often confused by
the *"other trp module"* that's got nothing to do with Textract.

Hence I decided to package and publish the `trp.py` module from the
[aws-samples/amazon-textract-code-samples](https://github.com/aws-samples/amazon-textract-code-samples)
repository. Fortunately its [MIT
license](https://github.com/aws-samples/amazon-textract-code-samples/blob/master/LICENSE)
permits that.

Over time I have made some improvements to the module for ease of use.

### Maintainer

[Michael Ludvig](https://aws.nz)


%package help
Summary:	Development documents and examples for textract-trp
Provides:	python3-textract-trp-doc
%description help
# Amazon Textract Results Parser - `textract-trp`

Amazon *Textract Results Parser* or `trp` module packaged and improved for ease of use.

## TL;DR

```
pip install textract-trp
```

Requires Python 3.6 or newer.

## Usage

```python
import boto3
import trp

textract_client = boto3.client('textract')
results = textract_client.analyze_document(... your file and other params ...)
doc = trp.Document(results)
```

Now you can examine `doc.pages`. For example, print all the text detected on the first page:

```python
print(doc.pages[0].text)
```

Or print the first detected table in CSV format:

```python
for row in doc.pages[0].tables[0].rows:
    # Join the cells so each row is printed without a trailing comma
    print(",".join(cell.text.strip() for cell in row.cells))
```

Or retrieve text from a given position on the page. For that we have to create
a *Bounding Box* with the required coordinates relative to the page.

```python
# Coordinates are from top-left corner [0,0] to bottom-right [1,1]
bbox = trp.BoundingBox(width=0.220, height=0.085, left=0.734, top=0.140)
lines = doc.pages[0].getLinesInBoundingBox(bbox)

# Print only the lines contained in the Bounding Box
for line in lines:
    print(line.text)
```

Refer to the [Textract blog post](https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/)
and to the [amazon-textract-code-samples](https://github.com/aws-samples/amazon-textract-code-samples) GitHub repository for more details.

## Background

The [Amazon blog post about Textract](https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/)
refers to a Python module `trp.py` which used to be quite hard to find. There
are many posts on the internet from people looking for the module, often confused by
the *"other trp module"* that's got nothing to do with Textract.

Hence I decided to package and publish the `trp.py` module from the
[aws-samples/amazon-textract-code-samples](https://github.com/aws-samples/amazon-textract-code-samples)
repository. Fortunately its [MIT
license](https://github.com/aws-samples/amazon-textract-code-samples/blob/master/LICENSE)
permits that.

Over time I have made some improvements to the module for ease of use.

### Maintainer

[Michael Ludvig](https://aws.nz)


%prep
%autosetup -n textract-trp-0.1.3

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-textract-trp -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 0.1.3-1
- Package Spec generated