| -rw-r--r-- | .gitignore | 1 |
| -rw-r--r-- | python-amazon-textract-helper.spec | 656 |
| -rw-r--r-- | sources | 1 |
3 files changed, 658 insertions, 0 deletions
@@ -0,0 +1 @@ +/amazon-textract-helper-0.0.34.tar.gz diff --git a/python-amazon-textract-helper.spec b/python-amazon-textract-helper.spec new file mode 100644 index 0000000..159b4bf --- /dev/null +++ b/python-amazon-textract-helper.spec @@ -0,0 +1,656 @@ +%global _empty_manifest_terminate_build 0 +Name: python-amazon-textract-helper +Version: 0.0.34 +Release: 1 +Summary: Amazon Textract Helper tools +License: Apache License Version 2.0 +URL: https://github.com/aws-samples/amazon-textract-textractor/tree/master/helper +Source0: https://mirrors.nju.edu.cn/pypi/web/packages/55/d3/c219329b180317f1e119f8da28cfba26e71ef0f4278b5e89f3700247df21/amazon-textract-helper-0.0.34.tar.gz +BuildArch: noarch + +Requires: python3-boto3 +Requires: python3-botocore +Requires: python3-amazon-textract-response-parser +Requires: python3-amazon-textract-caller +Requires: python3-amazon-textract-overlayer +Requires: python3-amazon-textract-prettyprinter +Requires: python3-Pillow +Requires: python3-PyPDF2 + +%description +# Textractor-Textract-Helper + +amazon-textract-helper provides a collection of ready to use functions and sample implementations to speed up the evaluation and development for any project using Amazon Textract. +It installs a command line tool called ```amazon-textract``` + + +# Install + +```bash +> python -m pip install amazon-textract-helper +``` + +Make sure your environment is setup with AWS credentials through configuration files or environment variables or an attached role. 
(https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html) + +# Test + +```bash +> amazon-textract --help +usage: amazon-textract [-h] (--input-document INPUT_DOCUMENT | --example | --stdin) [--features {FORMS,TABLES} [{FORMS,TABLES} ...]] + [--pretty-print {WORDS,LINES,FORMS,TABLES} [{WORDS,LINES,FORMS,TABLES} ...]] + [--pretty-print-table-format {csv,plain,simple,github,grid,fancy_grid,pipe,orgtbl,jira,presto,pretty,psql,rst,medi +awiki,moinmoin,youtrack,html,unsafehtml,latex,latex_raw,latex_booktabs,latex_longtable,textile,tsv}] + [--overlay {WORD,LINE,FORM,KEY,VALUE,TABLE,CELL} [{WORD,LINE,FORM,KEY,VALUE,TABLE,CELL} ...]] + [--pop-up-overlay-output] [--overlay-output-folder OVERLAY_OUTPUT_FOLDER] [--version] [--no-stdout] [-v | -vv] + +optional arguments: + -h, --help show this help message and exit + --input-document INPUT_DOCUMENT + s3 object (s3://) or file from local filesystem + --example using the example document to call Textract + --stdin receive JSON from stdin + --features {FORMS,TABLES} [{FORMS,TABLES} ...] + features to call Textract with. Will trigger call to AnalyzeDocument instead of DetectDocumentText + --pretty-print {WORDS,LINES,FORMS,TABLES} [{WORDS,LINES,FORMS,TABLES} ...] + --pretty-print-table-format {csv,plain,simple,github,grid,fancy_grid,pipe,orgtbl,jira,presto,pretty,psql,rst,mediawiki,moinmoin,youtrac +k,html,unsafehtml,latex,latex_raw,latex_booktabs,latex_longtable,textile,tsv} + which format to output the pretty print information to. Only effects FORMS and TABLES + --overlay {WORD,LINE,FORM,KEY,VALUE,TABLE,CELL} [{WORD,LINE,FORM,KEY,VALUE,TABLE,CELL} ...] + defines what bounding boxes to draw on the output + --pop-up-overlay-output + shows image with overlay + --overlay-text shows image with WORD or LINE text overlay. 
When both WORD and LINE overlay are specified, WORD text will be overlayed + --overlay-confidence shows image with confidence overlay + --overlay-output-folder OVERLAY_OUTPUT_FOLDER + output with bounding boxes to folder + --version print version information + --no-stdout no output to stdout + -v >=INFO level logging output to stderr + -vv >=DEBUG level logging output to stderr +``` + +# Sample Commands + +## Easy Start + +```bash +> amazon-textract --example +``` + +this will run the examples document using the DetectDocumentText API. +Output will be printed to stdout and look similar to this: + +```json +{"DocumentMetadata": {"Pages": 1}, "Blocks": [{"BlockType": "PAGE", "Geometry": {"BoundingBox": {"Width": 1.0, "Height": 1.0, "Left": 0.0 +, "Top": 0.0}, "Polygon": [{"X": 9.33321120033382e-17, "Y": 0.0}, {"X": 1.0, "Y": 1.6069064689339292e-16}, {"X": 1.0, "Y": 1.0}], +"HTTPHeaders": {"x-amzn-requestid": "12345678-1234-1234-1234-123456789012", "content-type": "a +pplication/x-amz-json-1.1", "content-length": "48177", "date": "Thu, 01 Apr 2021 21:50:29 GMT"}, "RetryAttempts": 0}} +``` + +It is working. + +## Call with document on S3 + +```bash +> amazon-textract --input-document "s3://somebucket/someprefix/someobjectname.png" +``` + +Output similar to Easy Start + +## Call with document on local file system + +```bash +> amazon-textract --input-document "./somepath/somefilename.png" +``` + +Output similar to Easy Start + +We will continue to use the ```--example``` parameter to keep it simple and easy to reproduce. S3 and local files work the same way, just instead of --example use --input-document <location>. 
+ +## Call with STDIN + +```bash +# first create JSON +amazon-textract --example > example.json +# now use the stored JSON with the amazon-textract command +cat example.json | amazon-textract --stdin --pretty-print LINES +``` + +## Call with FORMS and TABLES + +```bash +> amazon-textract --example --features FORMS TABLES +``` + +This will call the [AnalyzeDocument API](https://docs.aws.amazon.com/textract/latest/dg/API_AnalyzeDocument.html). +Output will look similar to "Easy Start" but will include FORMS and TABLES information. + +## Pretty print the output + +Pretty print outputs nicely formatted information for words, lines, forms or tables. + +For example, to print the tables identified by Amazon Textract to stdout, use + +```bash +> amazon-textract --example --features TABLES --pretty-print TABLES +``` + +Output will look like this: + +```text +|------------|-----------|---------------------|-----------------|-----------------------| +| | | Previous Employment | History | | +| Start Date | End Date | Employer Name | Position Held | Reason for leaving | +| 1/15/2009 | 6/30/2011 | Any Company | Assistant Baker | Family relocated | +| 7/1/2011 | 8/10/2013 | Best Corp. | Baker | Better opportunity | +| 8/15/2013 | present | Example Corp. | Head Baker | N/A, current employer | + +``` + +To pretty print both FORMS and TABLES: + +```bash +> amazon-textract --example --features FORMS TABLES --pretty-print FORMS TABLES +``` + +will output + +```text +Phone Number:: 555-0100 +Home Address:: 123 Any Street, Any Town, USA +Full Name:: Jane Doe +Mailing Address:: same as home address +|------------|-----------|---------------------|-----------------|-----------------------| +| | | Previous Employment | History | | +| Start Date | End Date | Employer Name | Position Held | Reason for leaving | +| 1/15/2009 | 6/30/2011 | Any Company | Assistant Baker | Family relocated | +| 7/1/2011 | 8/10/2013 | Best Corp.
| Baker | Better opportunity | +| 8/15/2013 | present | Example Corp. | Head Baker | N/A, current employer | +``` + +## Overlay + +**At the moment overlay only works with images; we will add support for PDF soon.** + +The following command runs DetectDocumentText, pretty prints the WORDS in the document to stdout, draws bounding boxes around each WORD, displays the result in a popup window, and stores it to a folder called 'overlay-output-folder-name'. + +```bash +amazon-textract --example --pretty-print WORDS --overlay WORD --pop-up-overlay-output --overlay-output-folder overlay-output-folder-name +``` + +<img src="https://github.com/aws-samples/amazon-textract-textractor/blob/master/helper/docs/employmentapp_boxed_WORD_.png" alt="Sample overlay WORD" width="50%" height="50%" border="1"> + + +The following command runs AnalyzeDocument for FORMS and TABLES, pretty prints FORMS and TABLES to stdout, draws bounding boxes around each TABLE-CELL and FORM KEY/VALUE, displays the result in a popup window, and stores it to the folder '../mywonderfuloutputfolderfordocs/'. + +```bash +> amazon-textract --example --features TABLES FORMS --pretty-print FORMS TABLES --overlay FORM CELL --pop-up-overlay-output --overlay-output-folder ../mywonderfuloutputfolderfordocs/ +``` + + +<img src="https://github.com/aws-samples/amazon-textract-textractor/blob/master/helper/docs/employmentapp_boxed_FORM_CELL_.png" alt="Sample overlay FORM CELL" width="50%" height="50%" border="1"> + + +The following command draws bounding boxes around each WORD, overlays the detected WORD text, displays the result in a popup window, and stores it to a folder called 'overlay-output-folder-name'.
+ +```bash +> amazon-textract --example --overlay WORD --overlay-text --pop-up-overlay-output --overlay-output-folder overlay-output-folder-name +``` + + +<img src="https://github.com/aws-samples/amazon-textract-textractor/blob/master/helper/docs/employmentapp_boxed_WORD_TEXT_OVERLAY.png" alt="Sample overlay LINE with overlay text and confidence percentage" width="50%" height="50%" border="1"> + + +The following command draws bounding boxes around each LINE, overlays LINE text along with percentage confidence of the detected LINE text, and displays the result in a popup window and stores it to a folder called 'overlay-output-folder-name'. + +```bash +> amazon-textract --example --overlay LINE --overlay-text --overlay-confidence --pop-up-overlay-output --overlay-output-folder overlay-output-folder-name +``` + + +<img src="https://github.com/aws-samples/amazon-textract-textractor/blob/master/helper/docs/employmentapp_boxed_LINE_TEXT_OVERLAY.png" alt="Sample overlay LINE with overlay text and confidence percentage" width="50%" height="50%" border="1"> + + + +%package -n python3-amazon-textract-helper +Summary: Amazon Textract Helper tools +Provides: python-amazon-textract-helper +BuildRequires: python3-devel +BuildRequires: python3-setuptools +BuildRequires: python3-pip +%description -n python3-amazon-textract-helper +# Textractor-Textract-Helper + +amazon-textract-helper provides a collection of ready to use functions and sample implementations to speed up the evaluation and development for any project using Amazon Textract. +It installs a command line tool called ```amazon-textract``` + + +# Install + +```bash +> python -m pip install amazon-textract-helper +``` + +Make sure your environment is setup with AWS credentials through configuration files or environment variables or an attached role. 
(https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html) + +# Test + +```bash +> amazon-textract --help +usage: amazon-textract [-h] (--input-document INPUT_DOCUMENT | --example | --stdin) [--features {FORMS,TABLES} [{FORMS,TABLES} ...]] + [--pretty-print {WORDS,LINES,FORMS,TABLES} [{WORDS,LINES,FORMS,TABLES} ...]] + [--pretty-print-table-format {csv,plain,simple,github,grid,fancy_grid,pipe,orgtbl,jira,presto,pretty,psql,rst,medi +awiki,moinmoin,youtrack,html,unsafehtml,latex,latex_raw,latex_booktabs,latex_longtable,textile,tsv}] + [--overlay {WORD,LINE,FORM,KEY,VALUE,TABLE,CELL} [{WORD,LINE,FORM,KEY,VALUE,TABLE,CELL} ...]] + [--pop-up-overlay-output] [--overlay-output-folder OVERLAY_OUTPUT_FOLDER] [--version] [--no-stdout] [-v | -vv] + +optional arguments: + -h, --help show this help message and exit + --input-document INPUT_DOCUMENT + s3 object (s3://) or file from local filesystem + --example using the example document to call Textract + --stdin receive JSON from stdin + --features {FORMS,TABLES} [{FORMS,TABLES} ...] + features to call Textract with. Will trigger call to AnalyzeDocument instead of DetectDocumentText + --pretty-print {WORDS,LINES,FORMS,TABLES} [{WORDS,LINES,FORMS,TABLES} ...] + --pretty-print-table-format {csv,plain,simple,github,grid,fancy_grid,pipe,orgtbl,jira,presto,pretty,psql,rst,mediawiki,moinmoin,youtrac +k,html,unsafehtml,latex,latex_raw,latex_booktabs,latex_longtable,textile,tsv} + which format to output the pretty print information to. Only effects FORMS and TABLES + --overlay {WORD,LINE,FORM,KEY,VALUE,TABLE,CELL} [{WORD,LINE,FORM,KEY,VALUE,TABLE,CELL} ...] + defines what bounding boxes to draw on the output + --pop-up-overlay-output + shows image with overlay + --overlay-text shows image with WORD or LINE text overlay. 
When both WORD and LINE overlay are specified, WORD text will be overlayed + --overlay-confidence shows image with confidence overlay + --overlay-output-folder OVERLAY_OUTPUT_FOLDER + output with bounding boxes to folder + --version print version information + --no-stdout no output to stdout + -v >=INFO level logging output to stderr + -vv >=DEBUG level logging output to stderr +``` + +# Sample Commands + +## Easy Start + +```bash +> amazon-textract --example +``` + +this will run the examples document using the DetectDocumentText API. +Output will be printed to stdout and look similar to this: + +```json +{"DocumentMetadata": {"Pages": 1}, "Blocks": [{"BlockType": "PAGE", "Geometry": {"BoundingBox": {"Width": 1.0, "Height": 1.0, "Left": 0.0 +, "Top": 0.0}, "Polygon": [{"X": 9.33321120033382e-17, "Y": 0.0}, {"X": 1.0, "Y": 1.6069064689339292e-16}, {"X": 1.0, "Y": 1.0}], +"HTTPHeaders": {"x-amzn-requestid": "12345678-1234-1234-1234-123456789012", "content-type": "a +pplication/x-amz-json-1.1", "content-length": "48177", "date": "Thu, 01 Apr 2021 21:50:29 GMT"}, "RetryAttempts": 0}} +``` + +It is working. + +## Call with document on S3 + +```bash +> amazon-textract --input-document "s3://somebucket/someprefix/someobjectname.png" +``` + +Output similar to Easy Start + +## Call with document on local file system + +```bash +> amazon-textract --input-document "./somepath/somefilename.png" +``` + +Output similar to Easy Start + +We will continue to use the ```--example``` parameter to keep it simple and easy to reproduce. S3 and local files work the same way, just instead of --example use --input-document <location>. 
+ +## Call with STDIN + +```bash +# first create JSON +amazon-textract --example > example.json +# now use the stored JSON with the amazon-textract command +cat example.json | amazon-textract --stdin --pretty-print LINES +``` + +## Call with FORMS and TABLES + +```bash +> amazon-textract --example --features FORMS TABLES +``` + +This will call the [AnalyzeDocument API](https://docs.aws.amazon.com/textract/latest/dg/API_AnalyzeDocument.html). +Output will look similar to "Easy Start" but will include FORMS and TABLES information. + +## Pretty print the output + +Pretty print outputs nicely formatted information for words, lines, forms or tables. + +For example, to print the tables identified by Amazon Textract to stdout, use + +```bash +> amazon-textract --example --features TABLES --pretty-print TABLES +``` + +Output will look like this: + +```text +|------------|-----------|---------------------|-----------------|-----------------------| +| | | Previous Employment | History | | +| Start Date | End Date | Employer Name | Position Held | Reason for leaving | +| 1/15/2009 | 6/30/2011 | Any Company | Assistant Baker | Family relocated | +| 7/1/2011 | 8/10/2013 | Best Corp. | Baker | Better opportunity | +| 8/15/2013 | present | Example Corp. | Head Baker | N/A, current employer | + +``` + +To pretty print both FORMS and TABLES: + +```bash +> amazon-textract --example --features FORMS TABLES --pretty-print FORMS TABLES +``` + +will output + +```text +Phone Number:: 555-0100 +Home Address:: 123 Any Street, Any Town, USA +Full Name:: Jane Doe +Mailing Address:: same as home address +|------------|-----------|---------------------|-----------------|-----------------------| +| | | Previous Employment | History | | +| Start Date | End Date | Employer Name | Position Held | Reason for leaving | +| 1/15/2009 | 6/30/2011 | Any Company | Assistant Baker | Family relocated | +| 7/1/2011 | 8/10/2013 | Best Corp.
| Baker | Better opportunity | +| 8/15/2013 | present | Example Corp. | Head Baker | N/A, current employer | +``` + +## Overlay + +**At the moment overlay only works with images; we will add support for PDF soon.** + +The following command runs DetectDocumentText, pretty prints the WORDS in the document to stdout, draws bounding boxes around each WORD, displays the result in a popup window, and stores it to a folder called 'overlay-output-folder-name'. + +```bash +amazon-textract --example --pretty-print WORDS --overlay WORD --pop-up-overlay-output --overlay-output-folder overlay-output-folder-name +``` + +<img src="https://github.com/aws-samples/amazon-textract-textractor/blob/master/helper/docs/employmentapp_boxed_WORD_.png" alt="Sample overlay WORD" width="50%" height="50%" border="1"> + + +The following command runs AnalyzeDocument for FORMS and TABLES, pretty prints FORMS and TABLES to stdout, draws bounding boxes around each TABLE-CELL and FORM KEY/VALUE, displays the result in a popup window, and stores it to the folder '../mywonderfuloutputfolderfordocs/'. + +```bash +> amazon-textract --example --features TABLES FORMS --pretty-print FORMS TABLES --overlay FORM CELL --pop-up-overlay-output --overlay-output-folder ../mywonderfuloutputfolderfordocs/ +``` + + +<img src="https://github.com/aws-samples/amazon-textract-textractor/blob/master/helper/docs/employmentapp_boxed_FORM_CELL_.png" alt="Sample overlay FORM CELL" width="50%" height="50%" border="1"> + + +The following command draws bounding boxes around each WORD, overlays the detected WORD text, displays the result in a popup window, and stores it to a folder called 'overlay-output-folder-name'.
+ +```bash +> amazon-textract --example --overlay WORD --overlay-text --pop-up-overlay-output --overlay-output-folder overlay-output-folder-name +``` + + +<img src="https://github.com/aws-samples/amazon-textract-textractor/blob/master/helper/docs/employmentapp_boxed_WORD_TEXT_OVERLAY.png" alt="Sample overlay LINE with overlay text and confidence percentage" width="50%" height="50%" border="1"> + + +The following command draws bounding boxes around each LINE, overlays LINE text along with percentage confidence of the detected LINE text, and displays the result in a popup window and stores it to a folder called 'overlay-output-folder-name'. + +```bash +> amazon-textract --example --overlay LINE --overlay-text --overlay-confidence --pop-up-overlay-output --overlay-output-folder overlay-output-folder-name +``` + + +<img src="https://github.com/aws-samples/amazon-textract-textractor/blob/master/helper/docs/employmentapp_boxed_LINE_TEXT_OVERLAY.png" alt="Sample overlay LINE with overlay text and confidence percentage" width="50%" height="50%" border="1"> + + + +%package help +Summary: Development documents and examples for amazon-textract-helper +Provides: python3-amazon-textract-helper-doc +%description help +# Textractor-Textract-Helper + +amazon-textract-helper provides a collection of ready to use functions and sample implementations to speed up the evaluation and development for any project using Amazon Textract. +It installs a command line tool called ```amazon-textract``` + + +# Install + +```bash +> python -m pip install amazon-textract-helper +``` + +Make sure your environment is setup with AWS credentials through configuration files or environment variables or an attached role. 
(https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html) + +# Test + +```bash +> amazon-textract --help +usage: amazon-textract [-h] (--input-document INPUT_DOCUMENT | --example | --stdin) [--features {FORMS,TABLES} [{FORMS,TABLES} ...]] + [--pretty-print {WORDS,LINES,FORMS,TABLES} [{WORDS,LINES,FORMS,TABLES} ...]] + [--pretty-print-table-format {csv,plain,simple,github,grid,fancy_grid,pipe,orgtbl,jira,presto,pretty,psql,rst,medi +awiki,moinmoin,youtrack,html,unsafehtml,latex,latex_raw,latex_booktabs,latex_longtable,textile,tsv}] + [--overlay {WORD,LINE,FORM,KEY,VALUE,TABLE,CELL} [{WORD,LINE,FORM,KEY,VALUE,TABLE,CELL} ...]] + [--pop-up-overlay-output] [--overlay-output-folder OVERLAY_OUTPUT_FOLDER] [--version] [--no-stdout] [-v | -vv] + +optional arguments: + -h, --help show this help message and exit + --input-document INPUT_DOCUMENT + s3 object (s3://) or file from local filesystem + --example using the example document to call Textract + --stdin receive JSON from stdin + --features {FORMS,TABLES} [{FORMS,TABLES} ...] + features to call Textract with. Will trigger call to AnalyzeDocument instead of DetectDocumentText + --pretty-print {WORDS,LINES,FORMS,TABLES} [{WORDS,LINES,FORMS,TABLES} ...] + --pretty-print-table-format {csv,plain,simple,github,grid,fancy_grid,pipe,orgtbl,jira,presto,pretty,psql,rst,mediawiki,moinmoin,youtrac +k,html,unsafehtml,latex,latex_raw,latex_booktabs,latex_longtable,textile,tsv} + which format to output the pretty print information to. Only effects FORMS and TABLES + --overlay {WORD,LINE,FORM,KEY,VALUE,TABLE,CELL} [{WORD,LINE,FORM,KEY,VALUE,TABLE,CELL} ...] + defines what bounding boxes to draw on the output + --pop-up-overlay-output + shows image with overlay + --overlay-text shows image with WORD or LINE text overlay. 
When both WORD and LINE overlay are specified, WORD text will be overlayed + --overlay-confidence shows image with confidence overlay + --overlay-output-folder OVERLAY_OUTPUT_FOLDER + output with bounding boxes to folder + --version print version information + --no-stdout no output to stdout + -v >=INFO level logging output to stderr + -vv >=DEBUG level logging output to stderr +``` + +# Sample Commands + +## Easy Start + +```bash +> amazon-textract --example +``` + +this will run the examples document using the DetectDocumentText API. +Output will be printed to stdout and look similar to this: + +```json +{"DocumentMetadata": {"Pages": 1}, "Blocks": [{"BlockType": "PAGE", "Geometry": {"BoundingBox": {"Width": 1.0, "Height": 1.0, "Left": 0.0 +, "Top": 0.0}, "Polygon": [{"X": 9.33321120033382e-17, "Y": 0.0}, {"X": 1.0, "Y": 1.6069064689339292e-16}, {"X": 1.0, "Y": 1.0}], +"HTTPHeaders": {"x-amzn-requestid": "12345678-1234-1234-1234-123456789012", "content-type": "a +pplication/x-amz-json-1.1", "content-length": "48177", "date": "Thu, 01 Apr 2021 21:50:29 GMT"}, "RetryAttempts": 0}} +``` + +It is working. + +## Call with document on S3 + +```bash +> amazon-textract --input-document "s3://somebucket/someprefix/someobjectname.png" +``` + +Output similar to Easy Start + +## Call with document on local file system + +```bash +> amazon-textract --input-document "./somepath/somefilename.png" +``` + +Output similar to Easy Start + +We will continue to use the ```--example``` parameter to keep it simple and easy to reproduce. S3 and local files work the same way, just instead of --example use --input-document <location>. 
+ +## Call with STDIN + +```bash +# first create JSON +amazon-textract --example > example.json +# now use the stored JSON with the amazon-textract command +cat example.json | amazon-textract --stdin --pretty-print LINES +``` + +## Call with FORMS and TABLES + +```bash +> amazon-textract --example --features FORMS TABLES +``` + +This will call the [AnalyzeDocument API](https://docs.aws.amazon.com/textract/latest/dg/API_AnalyzeDocument.html). +Output will look similar to "Easy Start" but will include FORMS and TABLES information. + +## Pretty print the output + +Pretty print outputs nicely formatted information for words, lines, forms or tables. + +For example, to print the tables identified by Amazon Textract to stdout, use + +```bash +> amazon-textract --example --features TABLES --pretty-print TABLES +``` + +Output will look like this: + +```text +|------------|-----------|---------------------|-----------------|-----------------------| +| | | Previous Employment | History | | +| Start Date | End Date | Employer Name | Position Held | Reason for leaving | +| 1/15/2009 | 6/30/2011 | Any Company | Assistant Baker | Family relocated | +| 7/1/2011 | 8/10/2013 | Best Corp. | Baker | Better opportunity | +| 8/15/2013 | present | Example Corp. | Head Baker | N/A, current employer | + +``` + +To pretty print both FORMS and TABLES: + +```bash +> amazon-textract --example --features FORMS TABLES --pretty-print FORMS TABLES +``` + +will output + +```text +Phone Number:: 555-0100 +Home Address:: 123 Any Street, Any Town, USA +Full Name:: Jane Doe +Mailing Address:: same as home address +|------------|-----------|---------------------|-----------------|-----------------------| +| | | Previous Employment | History | | +| Start Date | End Date | Employer Name | Position Held | Reason for leaving | +| 1/15/2009 | 6/30/2011 | Any Company | Assistant Baker | Family relocated | +| 7/1/2011 | 8/10/2013 | Best Corp.
| Baker | Better opportunity | +| 8/15/2013 | present | Example Corp. | Head Baker | N/A, current employer | +``` + +## Overlay + +**At the moment overlay only works with images; we will add support for PDF soon.** + +The following command runs DetectDocumentText, pretty prints the WORDS in the document to stdout, draws bounding boxes around each WORD, displays the result in a popup window, and stores it to a folder called 'overlay-output-folder-name'. + +```bash +amazon-textract --example --pretty-print WORDS --overlay WORD --pop-up-overlay-output --overlay-output-folder overlay-output-folder-name +``` + +<img src="https://github.com/aws-samples/amazon-textract-textractor/blob/master/helper/docs/employmentapp_boxed_WORD_.png" alt="Sample overlay WORD" width="50%" height="50%" border="1"> + + +The following command runs AnalyzeDocument for FORMS and TABLES, pretty prints FORMS and TABLES to stdout, draws bounding boxes around each TABLE-CELL and FORM KEY/VALUE, displays the result in a popup window, and stores it to the folder '../mywonderfuloutputfolderfordocs/'. + +```bash +> amazon-textract --example --features TABLES FORMS --pretty-print FORMS TABLES --overlay FORM CELL --pop-up-overlay-output --overlay-output-folder ../mywonderfuloutputfolderfordocs/ +``` + + +<img src="https://github.com/aws-samples/amazon-textract-textractor/blob/master/helper/docs/employmentapp_boxed_FORM_CELL_.png" alt="Sample overlay FORM CELL" width="50%" height="50%" border="1"> + + +The following command draws bounding boxes around each WORD, overlays the detected WORD text, displays the result in a popup window, and stores it to a folder called 'overlay-output-folder-name'.
+ +```bash +> amazon-textract --example --overlay WORD --overlay-text --pop-up-overlay-output --overlay-output-folder overlay-output-folder-name +``` + + +<img src="https://github.com/aws-samples/amazon-textract-textractor/blob/master/helper/docs/employmentapp_boxed_WORD_TEXT_OVERLAY.png" alt="Sample overlay LINE with overlay text and confidence percentage" width="50%" height="50%" border="1"> + + +The following command draws bounding boxes around each LINE, overlays LINE text along with percentage confidence of the detected LINE text, and displays the result in a popup window and stores it to a folder called 'overlay-output-folder-name'. + +```bash +> amazon-textract --example --overlay LINE --overlay-text --overlay-confidence --pop-up-overlay-output --overlay-output-folder overlay-output-folder-name +``` + + +<img src="https://github.com/aws-samples/amazon-textract-textractor/blob/master/helper/docs/employmentapp_boxed_LINE_TEXT_OVERLAY.png" alt="Sample overlay LINE with overlay text and confidence percentage" width="50%" height="50%" border="1"> + + + +%prep +%autosetup -n amazon-textract-helper-0.0.34 + +%build +%py3_build + +%install +%py3_install +install -d -m755 %{buildroot}/%{_pkgdocdir} +if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi +if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi +if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi +if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi +pushd %{buildroot} +if [ -d usr/lib ]; then + find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/lib64 ]; then + find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/bin ]; then + find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/sbin ]; then + find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst +fi +touch doclist.lst +if [ -d usr/share/man ]; then + find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst +fi +popd +mv 
%{buildroot}/filelist.lst . +mv %{buildroot}/doclist.lst . + +%files -n python3-amazon-textract-helper -f filelist.lst +%dir %{python3_sitelib}/* + +%files help -f doclist.lst +%{_docdir}/* + +%changelog +* Mon May 15 2023 Python_Bot <Python_Bot@openeuler.org> - 0.0.34-1 +- Package Spec generated @@ -0,0 +1 @@ +720e0c787e2981489d7be03ca7bf320c amazon-textract-helper-0.0.34.tar.gz |
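Reviewer note on the `%install` scriptlet in the spec above: it builds `filelist.lst` and `doclist.lst` by running `find ... -type f -printf "/%h/%f\n"` from inside `%{buildroot}`, guarded by `if [ -d ... ]` checks. The Python sketch below approximates that step so the resulting lists can be inspected outside an RPM build. The function name `list_installed_files`, the `suffix` parameter, and the sample file names are hypothetical and not part of the spec; entries are sorted here for reproducibility, whereas `find` returns them in directory order.

```python
import os
import tempfile


def list_installed_files(buildroot, subdirs, suffix=""):
    """Approximate the spec's per-directory `find <subdir> -type f` listing.

    Each regular file under buildroot/<subdir> is reported as an absolute
    path relative to buildroot (for example "/usr/bin/tool"), optionally
    followed by `suffix`.
    """
    entries = []
    for subdir in subdirs:
        top = os.path.join(buildroot, subdir)
        if not os.path.isdir(top):
            continue  # mirrors the `if [ -d usr/... ]` guards in the spec
        for dirpath, dirnames, filenames in os.walk(top):
            dirnames.sort()  # deterministic traversal (find does not sort)
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                # `find -type f` matches regular files only, not symlinks
                if os.path.islink(full) or not os.path.isfile(full):
                    continue
                rel = os.path.relpath(full, buildroot).replace(os.sep, "/")
                entries.append("/" + rel + suffix)
    return entries


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as buildroot:
        # Hypothetical payload; the real package layout may differ.
        for rel in (
            "usr/bin/amazon-textract",
            "usr/lib/python3/site-packages/example_pkg/__init__.py",
            "usr/share/man/man1/example.1",
        ):
            path = os.path.join(buildroot, rel)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "w").close()

        filelist = list_installed_files(
            buildroot, ("usr/lib", "usr/lib64", "usr/bin", "usr/sbin")
        )
        doclist = list_installed_files(buildroot, ("usr/share/man",), suffix=".gz")
        print("\n".join(filelist + doclist))
```

The `.gz` suffix on the man-page entries follows the spec's `"/%h/%f.gz\n"` format string, presumably because man pages are compressed after `%install` finishes. That explanation is an assumption about the build environment, not something the diff states.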
