Diffstat (limited to 'python-textpipe.spec')
-rw-r--r--	python-textpipe.spec	642
1 file changed, 642 insertions, 0 deletions
diff --git a/python-textpipe.spec b/python-textpipe.spec
new file mode 100644
index 0000000..1150c37
--- /dev/null
+++ b/python-textpipe.spec
@@ -0,0 +1,642 @@
+%global _empty_manifest_terminate_build 0
+Name: python-textpipe
+Version: 0.12.2
+Release: 1
+Summary: textpipe: clean and extract metadata from text
+License: MIT License
+URL: https://github.com/textpipe/textpipe
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/65/b1/5b5544ce361dd1c440f0538b5ee0821a5a3d1c74983bd742d23e720c22ab/textpipe-0.12.2.tar.gz
+BuildArch: noarch
+
+
+%description
+# textpipe: clean and extract metadata from text
+
+[![Build Status](https://travis-ci.com/textpipe/textpipe.svg?branch=master)](https://travis-ci.com/textpipe/textpipe)
+
+![The textpipe logo](https://avatars3.githubusercontent.com/u/40492530?s=400&u=c65c2c8274cbdcd05b1942d1963d7aa2800e6d7f&v=4)
+
+`textpipe` is a Python package for converting raw text into clean, readable text
+and extracting metadata from that text. Its functionality includes transforming
+raw text into readable text by removing HTML tags, and extracting
+metadata such as the number of words and named entities from the text.
+
+## Vision: the zen of textpipe
+
+- Designed for use in production pipelines without adult supervision.
+- Rechargeable batteries included: provide sane defaults and clear examples to adapt.
+- A uniform interface with thin wrappers around state-of-the-art NLP packages.
+- As language-agnostic as possible.
+- Bring your own models.
+
+## Features
+
+- Clean raw text by removing `HTML` and other unreadable constructs
+- Identify the language of text
+- Extract the number of words, number of sentences, named entities from a text
+- Calculate the complexity of a text
+- Obtain text metadata by specifying a pipeline containing all desired elements
+- Obtain sentiment (polarity and a subjectivity score)
+- Generate word counts
+- Compute a minhash for cheap similarity estimation of documents
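
The minhash bullet above can be illustrated with a small self-contained sketch. This is a toy using salted `hashlib` hashes for illustration only; textpipe itself relies on the `datasketch` library (see the changelog below), and the function names here are hypothetical:

```python
import hashlib

def minhash_signature(tokens, num_hashes=64):
    # For each of num_hashes salted hash functions, keep the minimum
    # hash value observed over all tokens.
    return [
        min(int(hashlib.md5(f"{seed}:{tok}".encode()).hexdigest(), 16)
            for tok in tokens)
        for seed in range(num_hashes)
    ]

def estimate_jaccard(sig_a, sig_b):
    # The fraction of matching signature slots estimates the Jaccard
    # similarity of the two token sets.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

doc_a = "the quick brown fox jumps over the lazy dog".split()
doc_b = "the quick brown fox leaps over a lazy dog".split()
print(estimate_jaccard(minhash_signature(doc_a), minhash_signature(doc_b)))
```

Two signatures can be compared in time proportional to `num_hashes`, regardless of document length, which is what makes the similarity estimate cheap.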
+
+## Installation
+
+It is recommended that you install textpipe using a virtual environment.
+
+- First, create your virtual environment using [virtualenv](https://virtualenv.pypa.io/en/stable/) or [virtualenvwrapper](https://virtualenvwrapper.readthedocs.io/en/latest/).
+
+- Using `venv`, if your default interpreter is Python 3.6:
+
+```bash
+python3 -m venv .venv
+```
+
+- Using `virtualenv`:
+
+```bash
+virtualenv venv -p python3.6
+```
+
+- Using `virtualenvwrapper`:
+
+```bash
+mkvirtualenv textpipe -p python3.6
+```
+
+- Install textpipe using pip.
+
+```bash
+pip install textpipe
+```
+
+- Install the required packages using *requirements.txt*.
+
+```bash
+pip install -r requirements.txt
+```
+
+### A note on spaCy download model requirement
+
+While the requirements.txt file that comes with the package calls for spaCy's en_core_web_sm model, this can be changed depending on the model and language you require for your intended use. See [spaCy.io's page on their different models](https://spacy.io/usage/models) for more information.
+
+## Usage example
+
+```python
+>>> from textpipe import doc, pipeline
+>>> sample_text = 'Sample text! <!DOCTYPE>'
+>>> document = doc.Doc(sample_text)
+>>> print(document.clean)
+Sample text!
+>>> print(document.language)
+en
+>>> print(document.nwords)
+2
+
+>>> pipe = pipeline.Pipeline(['CleanText', 'NWords'])
+>>> print(pipe(sample_text))
+{'CleanText': 'Sample text!', 'NWords': 3}
+```
+
+To extend the existing textpipe operations with your own custom operations:
+
+```python
+from textpipe import pipeline
+
+test_pipe = pipeline.Pipeline(['CleanText', 'NWords'])
+
+def custom_op(doc, context=None, settings=None, **kwargs):
+    return 1
+
+custom_argument = {'argument': 1}
+test_pipe.register_operation('CUSTOM_STEP', custom_op)
+test_pipe.steps.append(('CUSTOM_STEP', custom_argument))
+```
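
The register/append pattern above can be modelled without installing textpipe. The following minimal registry is a hypothetical sketch of the idea (a dict mapping step names to callables, executed in order), not textpipe's real `Pipeline` class:

```python
class MiniPipeline:
    # Toy model of a step registry: names map to callables, and steps
    # are (name, settings) pairs executed in order.
    def __init__(self, steps):
        self.registry = {}
        self.steps = [(name, None) for name in steps]

    def register_operation(self, name, func):
        self.registry[name] = func

    def __call__(self, text):
        return {name: self.registry[name](text, settings=settings)
                for name, settings in self.steps}

pipe = MiniPipeline([])
pipe.register_operation('CUSTOM_STEP', lambda doc, settings=None, **kw: 1)
pipe.steps.append(('CUSTOM_STEP', {'argument': 1}))
print(pipe("Sample text!"))  # {'CUSTOM_STEP': 1}
```

Keeping the registry as a dict (as textpipe does since 0.7.0, per the changelog below) lets steps be looked up by name and re-ordered freely.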
+
+## Contributing
+
+See [CONTRIBUTING](CONTRIBUTING.md) for guidelines for contributors.
+
+## Changes
+
+0.12.1
+
+- Bumps redis, tqdm, pylint
+
+0.12.0
+
+- Bumps versions of many dependencies including textacy. Results for keyterm extraction changed.
+
+0.11.9
+
+- Exposes arbitrary SpaCy `ents` properties
+
+0.11.8
+
+- Exposes SpaCy's `cats` attribute
+
+0.11.7
+
+- Bumps spaCy and redis versions
+
+0.11.6
+
+- Fixes bug where gensim model is not cached in pipeline
+
+0.11.5
+
+- Raise TextpipeMissingModelException instead of KeyError
+
+0.11.4
+
+- Bumps spaCy and datasketch dependencies
+
+0.11.1
+
+- Replaces codacy with pylint on CI
+- Fixes pylint issues
+
+0.11.0
+
+- Adds wrapper around Gensim keyed vectors to construct document embeddings from Redis cache
+
+0.9.0
+
+- Adds functionality to compute document embeddings using a Gensim word2vec model
+
+0.8.6
+
+- Removes non standard utf chars before detecting language
+
+0.8.5
+
+- Bump spaCy to 2.1.3
+
+0.8.4
+
+- Fix broken install command
+
+0.8.3
+
+- Fix broken install command
+
+0.8.2
+
+- Fix copy-paste error in word vector aggregation ([#118](https://github.com/textpipe/textpipe/issues/118))
+
+0.8.1
+
+- Fixes bugs in several operations that didn't accept kwargs
+
+0.8.0
+
+- Bumps Spacy to 2.1
+
+0.7.2
+
+- Pins Spacy and Pattern versions (with pinned lxml)
+
+0.7.0
+
+- Changes the operation registry from a list to a dict
+- Makes global pipeline data available across operations via the `context` kwarg
+- Loads custom operations via `register_operation` on the pipeline
+- Supports custom steps (operations) with arguments
+
+%package -n python3-textpipe
+Summary: textpipe: clean and extract metadata from text
+Provides: python-textpipe
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-textpipe
+`textpipe` is a Python package for converting raw text into clean, readable
+text and extracting metadata (language, word counts, named entities, sentiment)
+from that text. See the base package description above for full details.
+
+%package help
+Summary: Development documents and examples for textpipe
+Provides: python3-textpipe-doc
+%description help
+Development documents and examples for `textpipe`, a Python package for
+converting raw text into clean, readable text and extracting metadata from it.
+See the base package description above for full details.
+
+%prep
+%autosetup -n textpipe-0.12.2
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-textpipe -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Mon May 29 2023 Python_Bot <Python_Bot@openeuler.org> - 0.12.2-1
+- Package Spec generated