automatic import of python-textspitter

author: CoprDistGit <infra@openeuler.org> 2023-05-29 11:42:31 +0000
committer: CoprDistGit <infra@openeuler.org> 2023-05-29 11:42:31 +0000
commit: 4acdfe2666d264e997058cb3c094b67a99aca162 (patch)
tree: 8b2a31b140794b5c17a9764668440c010169c5a2
parent: 3ec3752bc6c12d54bef58b312901a3ae5c1f0252 (diff)
3 files changed, 208 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..4b9e1e9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/TextSpitter-0.3.6.tar.gz
diff --git a/python-textspitter.spec b/python-textspitter.spec
new file mode 100644
index 0000000..168a7e3
--- /dev/null
+++ b/python-textspitter.spec
@@ -0,0 +1,206 @@
+%global _empty_manifest_terminate_build 0
+Name:		python-TextSpitter
+Version:	0.3.6
+Release:	1
+Summary:	Python package that spits out text from your document files!
+License:	MIT License
+URL:		https://github.com/fsecada01/TextSpitter
+Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/9a/85/a8f8829ed4b17609fddef9ce240c63b601600c54a81464a488cfc85ec0b8/TextSpitter-0.3.6.tar.gz
+BuildArch:	noarch
+
+Requires:	python3-docx
+Requires:	python3-PyPDF2
+
+%description
+# THANK YOU FOR USING TEXTSPITTER!! #
+
+I created this little app to help me process documents from folder sets and batches. Instead of trying to determine each file's type up front and process it accordingly, I thought it would be more prudent to read the file names and route each file to the matching text-extraction function. Also, I was having a really difficult time getting textract/pdftotext to work **because of damn Poppler**. So instead of troubleshooting that whole process after 6+ hours, I figured this was more time-efficient.
+
+This is my first Python module, so I hope I did this well!
+
+## Installation ##
+* Type `pip install TextSpitter`
+* **OPTIONAL** type `pip install PyMuPDF` to install PyMuPDF, the Python bindings for the MuPDF engine, for better fidelity in text extraction (e.g., preserving correct whitespace)
+	* You will need to follow the PyMuPDF instructions to make sure its dependencies are installed on your system. Wheels and binaries are available for Windows, Linux, and macOS, but if you're on something less common like NetBSD/FreeBSD or a specialty Linux distro, you may be SOL. Fortunately, package managers like yum, pkgin, apt-get and so forth should have packages available straight from the terminal.
+	* For detailed instructions, see https://github.com/rk700/PyMuPDF, and maybe give those folks some kudos, because they worked their tails off.
+
+## Directions ##
+This module is designed to be as simple as possible to run. Just pass the file path as the `filename` argument and get the extracted text back.
+
+```python
+from TextSpitter import TextSpitter as TS
+folder_loc = 'foo/bar/'
+
+docx_file = folder_loc + 'file_thing.docx'
+pdf_file = folder_loc + 'file_thing.pdf'
+text_file = folder_loc + 'file_thing.txt'
+
+doc_tup = (docx_file, pdf_file, text_file)
+
+raw_text_payload = [TS(filename=ele) for ele in doc_tup]
+text = '\n'.join(raw_text_payload)
+print(text)
+```
+
+## TO DOs ##
+* [x] spruce up documentation
+* [x] Add stream functionality for S3-based file reading
+* [ ] expand functionality to other file types
+* [ ] TBD
+
+## WANT TO CONTRIBUTE!? ##
+_*OH MY GOD, PLEASE DO.*_
+
+Just make a pull request and add whatever you want (or fix whatever you want). I'll review it and approve if everything looks good.
+
+Thanks, everyone!
+
+
+
+
+%package -n python3-TextSpitter
+Summary:	Python package that spits out text from your document files!
+Provides:	python-TextSpitter
+BuildRequires:	python3-devel
+BuildRequires:	python3-setuptools
+BuildRequires:	python3-pip
+%description -n python3-TextSpitter
+# THANK YOU FOR USING TEXTSPITTER!! #
+
+I created this little app to help me process documents from folder sets and batches. Instead of trying to determine each file's type up front and process it accordingly, I thought it would be more prudent to read the file names and route each file to the matching text-extraction function. Also, I was having a really difficult time getting textract/pdftotext to work **because of damn Poppler**. So instead of troubleshooting that whole process after 6+ hours, I figured this was more time-efficient.
+
+This is my first Python module, so I hope I did this well!
+
+## Installation ##
+* Type `pip install TextSpitter`
+* **OPTIONAL** type `pip install PyMuPDF` to install PyMuPDF, the Python bindings for the MuPDF engine, for better fidelity in text extraction (e.g., preserving correct whitespace)
+	* You will need to follow the PyMuPDF instructions to make sure its dependencies are installed on your system. Wheels and binaries are available for Windows, Linux, and macOS, but if you're on something less common like NetBSD/FreeBSD or a specialty Linux distro, you may be SOL. Fortunately, package managers like yum, pkgin, apt-get and so forth should have packages available straight from the terminal.
+	* For detailed instructions, see https://github.com/rk700/PyMuPDF, and maybe give those folks some kudos, because they worked their tails off.
+
+## Directions ##
+This module is designed to be as simple as possible to run. Just pass the file path as the `filename` argument and get the extracted text back.
+
+```python
+from TextSpitter import TextSpitter as TS
+folder_loc = 'foo/bar/'
+
+docx_file = folder_loc + 'file_thing.docx'
+pdf_file = folder_loc + 'file_thing.pdf'
+text_file = folder_loc + 'file_thing.txt'
+
+doc_tup = (docx_file, pdf_file, text_file)
+
+raw_text_payload = [TS(filename=ele) for ele in doc_tup]
+text = '\n'.join(raw_text_payload)
+print(text)
+```
+
+## TO DOs ##
+* [x] spruce up documentation
+* [x] Add stream functionality for S3-based file reading
+* [ ] expand functionality to other file types
+* [ ] TBD
+
+## WANT TO CONTRIBUTE!? ##
+_*OH MY GOD, PLEASE DO.*_
+
+Just make a pull request and add whatever you want (or fix whatever you want). I'll review it and approve if everything looks good.
+
+Thanks, everyone!
+
+
+
+
+%package help
+Summary:	Development documents and examples for TextSpitter
+Provides:	python3-TextSpitter-doc
+%description help
+# THANK YOU FOR USING TEXTSPITTER!! #
+
+I created this little app to help me process documents from folder sets and batches. Instead of trying to determine each file's type up front and process it accordingly, I thought it would be more prudent to read the file names and route each file to the matching text-extraction function. Also, I was having a really difficult time getting textract/pdftotext to work **because of damn Poppler**. So instead of troubleshooting that whole process after 6+ hours, I figured this was more time-efficient.
+
+This is my first Python module, so I hope I did this well!
+
+## Installation ##
+* Type `pip install TextSpitter`
+* **OPTIONAL** type `pip install PyMuPDF` to install PyMuPDF, the Python bindings for the MuPDF engine, for better fidelity in text extraction (e.g., preserving correct whitespace)
+	* You will need to follow the PyMuPDF instructions to make sure its dependencies are installed on your system. Wheels and binaries are available for Windows, Linux, and macOS, but if you're on something less common like NetBSD/FreeBSD or a specialty Linux distro, you may be SOL. Fortunately, package managers like yum, pkgin, apt-get and so forth should have packages available straight from the terminal.
+	* For detailed instructions, see https://github.com/rk700/PyMuPDF, and maybe give those folks some kudos, because they worked their tails off.
+
+## Directions ##
+This module is designed to be as simple as possible to run. Just pass the file path as the `filename` argument and get the extracted text back.
+
+```python
+from TextSpitter import TextSpitter as TS
+folder_loc = 'foo/bar/'
+
+docx_file = folder_loc + 'file_thing.docx'
+pdf_file = folder_loc + 'file_thing.pdf'
+text_file = folder_loc + 'file_thing.txt'
+
+doc_tup = (docx_file, pdf_file, text_file)
+
+raw_text_payload = [TS(filename=ele) for ele in doc_tup]
+text = '\n'.join(raw_text_payload)
+print(text)
+```
+
+## TO DOs ##
+* [x] spruce up documentation
+* [x] Add stream functionality for S3-based file reading
+* [ ] expand functionality to other file types
+* [ ] TBD
+
+## WANT TO CONTRIBUTE!? ##
+_*OH MY GOD, PLEASE DO.*_
+
+Just make a pull request and add whatever you want (or fix whatever you want). I'll review it and approve if everything looks good.
+
+Thanks, everyone!
+
+
+
+
+%prep
+%autosetup -n TextSpitter-0.3.6
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-TextSpitter -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Mon May 29 2023 Python_Bot <Python_Bot@openeuler.org> - 0.3.6-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..4a36fd2
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+af1db993e8f8e51d5207973eb613b8ba  TextSpitter-0.3.6.tar.gz
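The `sources` file pairs the upstream tarball with a 32-hex-digit checksum, which matches the layout `md5sum` produces (`<digest>  <filename>`). The following is a minimal Python sketch of checking a downloaded `TextSpitter-0.3.6.tar.gz` against that entry. It assumes the file uses the `md5sum` layout, and the helper names `parse_sources`, `md5_of`, and `verify_sources` are hypothetical, not part of any openEuler or rpm tooling.

```python
import hashlib
from pathlib import Path


def parse_sources(text):
    """Parse md5sum-style lines ('<md5-hex>  <filename>') into {filename: digest}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"malformed sources line: {line!r}")
        digest, name = parts
        # md5sum marks binary mode with a leading '*' on the filename
        entries[name.strip().lstrip("*")] = digest.lower()
    return entries


def md5_of(path, chunk_size=1 << 16):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_sources(sources_path, source_dir="."):
    """Return {filename: bool} telling whether each listed file matches its digest."""
    entries = parse_sources(Path(sources_path).read_text())
    return {
        name: md5_of(Path(source_dir) / name) == digest
        for name, digest in entries.items()
    }


if __name__ == "__main__":
    # Assumes the tarball sits next to the `sources` file in the current directory.
    print(verify_sources("sources"))
```

MD5 here only guards against corrupted or mismatched downloads; it is not a defense against deliberate tampering.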