 .gitignore             |   1 +
 python-word-forms.spec | 488 ++++++++++++++++++++++++++++++++++++++++++++++++
 sources                |   1 +
 3 files changed, 490 insertions(+), 0 deletions(-)
diff --git a/.gitignore b/.gitignore
index e69de29..656040d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/word_forms-2.1.0.tar.gz
diff --git a/python-word-forms.spec b/python-word-forms.spec
new file mode 100644
index 0000000..8d167e8
--- /dev/null
+++ b/python-word-forms.spec
@@ -0,0 +1,488 @@
+%global _empty_manifest_terminate_build 0
+Name: python-word-forms
+Version: 2.1.0
+Release: 1
+Summary: Generate all possible forms of an English word.
+License: MIT License
+URL: https://github.com/gutfeeling/word_forms
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/31/39/e0f24b7c3f228561b346ae8c046817ff3d3929d77b0c3ca14a12e4d106b2/word_forms-2.1.0.tar.gz
+BuildArch: noarch
+
+Requires: python3-inflect
+Requires: python3-nltk
+
+%description
+<img src="https://github.com/gutfeeling/word_forms/raw/master/logo.png" alt="word forms logo" width="500">
+
+## Accurately generate all possible forms of an English word
+
+Word Forms can accurately generate all possible forms of an English word. It can conjugate verbs. It can connect different
+parts of speech, e.g. noun to adjective, adjective to adverb, noun to verb, etc. It can pluralize singular nouns. It does all of this in one function. Enjoy!
+
+## Examples
+
+Some very timely examples :-P
+
+```python
+>>> from word_forms.word_forms import get_word_forms
+>>> get_word_forms("president")
+>>> {'n': {'presidents', 'presidentships', 'presidencies', 'presidentship', 'president', 'presidency'},
+ 'a': {'presidential'},
+ 'v': {'preside', 'presided', 'presiding', 'presides'},
+ 'r': {'presidentially'}}
+>>> get_word_forms("elect")
+>>> {'n': {'elects', 'electives', 'electors', 'elect', 'eligibilities', 'electorates', 'eligibility', 'elector', 'election', 'elections', 'electorate', 'elective'},
+ 'a': {'eligible', 'electoral', 'elective', 'elect'},
+ 'v': {'electing', 'elects', 'elected', 'elect'},
+ 'r': set()}
+>>> get_word_forms("politician")
+>>> {'n': {'politician', 'politics', 'politicians'},
+ 'a': {'political'},
+ 'v': set(),
+ 'r': {'politically'}}
+>>> get_word_forms("am")
+>>> {'n': {'being', 'beings'},
+ 'a': set(),
+ 'v': {'was', 'be', "weren't", 'am', "wasn't", "aren't", 'being', 'were', 'is', "isn't", 'been', 'are', 'am not'},
+ 'r': set()}
+>>> get_word_forms("ran")
+>>> {'n': {'run', 'runniness', 'runner', 'runninesses', 'running', 'runners', 'runnings', 'runs'},
+ 'a': {'running', 'runny'},
+ 'v': {'running', 'run', 'ran', 'runs'},
+ 'r': set()}
+>>> get_word_forms('continent', 0.8) # with configurable similarity threshold
+>>> {'n': {'continents', 'continency', 'continences', 'continent', 'continencies', 'continence'},
+ 'a': {'continental', 'continent'},
+ 'v': set(),
+ 'r': set()}
+```
+As you can see, the output is a dictionary with four keys. "r" stands for adverb, "a" for adjective, "n" for noun
+and "v" for verb. Don't ask me why "r" stands for adverb. This is what WordNet uses, so this is why I use it too :-)
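As the output shows, each value is a plain Python `set`, so the four part-of-speech buckets can be flattened into a single set of forms. A small sketch, using a trimmed copy of the "president" output above rather than a live `get_word_forms` call:

```python
# Trimmed example output in the same shape get_word_forms returns.
forms = {
    "n": {"president", "presidents"},
    "a": {"presidential"},
    "v": {"preside"},
    "r": {"presidentially"},
}

# Union the four part-of-speech buckets into one flat set.
all_forms = set().union(*forms.values())
print(sorted(all_forms))
```

This is handy when the part of speech does not matter, e.g. when building a search-term expansion list.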
+
+Help can be obtained at any time by typing the following:
+
+```python
+>>> help(get_word_forms)
+```
+
+## Why?
+In Natural Language Processing and Search, one often needs to treat words like "run" and "ran", "love" and "lovable"
+or "politician" and "politics" as the same word. This is usually done by algorithmically reducing each word into a
+base word and then comparing the base words. The process is called Stemming.
+For example, the [Porter Stemmer](http://text-processing.com/demo/stem/) reduces both "love" and "lovely"
+into the base word "love".
+
+Stemmers have several shortcomings. Firstly, the base word produced by the Stemmer is not always a valid English word.
+For example, the Porter Stemmer reduces the word "operation" to "oper". Secondly, Stemmers have a high false-negative rate:
+"run" is reduced to "run" and "ran" is reduced to "ran", so the two forms are never matched. This happens because Stemmers use a set of
+rational rules for finding the base words, and as we all know, the English language does not always behave very rationally.
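The rule-based behaviour described above can be illustrated with a minimal suffix-stripping stemmer (a hypothetical sketch for illustration only, not the actual Porter algorithm):

```python
def naive_stem(word):
    """Strip a common suffix by rule; a crude stand-in for a real stemmer."""
    for suffix in ("ations", "ation", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Rule-based stripping can yield a non-word base...
print(naive_stem("operation"))               # oper
# ...and it cannot connect irregular forms at all.
print(naive_stem("run"), naive_stem("ran"))  # run ran
```

Both failure modes fall out of the same cause: the rules only see surface spelling, never the dictionary.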
+
+Lemmatizers are more accurate than Stemmers because they produce a base form that is present in the dictionary (also called the Lemma), so the reduced word is always a valid English word. However, Lemmatizers also have false negatives because they are not very good at connecting words across different parts of speech. The [WordNet Lemmatizer](http://textanalysisonline.com/nltk-wordnet-lemmatizer) included with NLTK fails at almost all such examples: "operations" is reduced to "operation" while "operate" stays "operate", so the two are never matched.
+
+Word Forms tries to solve this problem by finding all possible forms of a given English word. It can perform verb conjugations; connect noun forms to verb, adjective, and adverb forms; pluralize singular forms; and more.
+
+## Bonus: A simple lemmatizer
+
+We also offer a very simple lemmatizer based on ``word_forms``. Here is how to use it.
+
+```python
+>>> from word_forms.lemmatizer import lemmatize
+>>> lemmatize("operations")
+'operant'
+>>> lemmatize("operate")
+'operant'
+```
+
+Enjoy!
+
+## Compatibility
+
+Tested on Python 3
+
+## Installation
+
+Using `pip`:
+
+```
+pip install -U word_forms
+```
+
+### From source
+Or you can install it from source:
+
+1. Clone the repository:
+
+```
+git clone https://github.com/gutfeeling/word_forms.git
+```
+
+2. Install it using `pip` or `setup.py`
+
+```
+pip install -e word_forms
+# or
+cd word_forms
+python setup.py install
+```
+
+## Acknowledgement
+
+1. [The XTAG project](http://www.cis.upenn.edu/~xtag/) for information on [verb conjugations](word_forms/en-verbs.txt).
+2. [WordNet](http://wordnet.princeton.edu/)
+
+## Maintainer
+
+Hi, I am Dibya and I maintain this repository. I would love to hear from you. Feel free to get in touch with me
+at dibyachakravorty@gmail.com.
+
+## Contributors
+
+- Tom Aarsen @CubieDev is a major contributor and is singlehandedly responsible for v2.0.0.
+- Sajal Sharma @sajal2692 is a major contributor.
+
+## Contributions
+
+Word Forms is not perfect. In particular, one aspect can be improved:
+
+1. It sometimes generates non-dictionary words like "runninesses" because the pluralization/singularization algorithm is
+not perfect. At the moment, I am using [inflect](https://pypi.python.org/pypi/inflect) for it.
+
+If you like this package, feel free to contribute. Your pull requests are most welcome.
+
+
+
+
+%package -n python3-word-forms
+Summary: Generate all possible forms of an English word.
+Provides: python-word-forms
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-word-forms
+<img src="https://github.com/gutfeeling/word_forms/raw/master/logo.png" alt="word forms logo" width="500">
+
+## Accurately generate all possible forms of an English word
+
+Word Forms can accurately generate all possible forms of an English word. It can conjugate verbs. It can connect different
+parts of speech, e.g. noun to adjective, adjective to adverb, noun to verb, etc. It can pluralize singular nouns. It does all of this in one function. Enjoy!
+
+## Examples
+
+Some very timely examples :-P
+
+```python
+>>> from word_forms.word_forms import get_word_forms
+>>> get_word_forms("president")
+>>> {'n': {'presidents', 'presidentships', 'presidencies', 'presidentship', 'president', 'presidency'},
+ 'a': {'presidential'},
+ 'v': {'preside', 'presided', 'presiding', 'presides'},
+ 'r': {'presidentially'}}
+>>> get_word_forms("elect")
+>>> {'n': {'elects', 'electives', 'electors', 'elect', 'eligibilities', 'electorates', 'eligibility', 'elector', 'election', 'elections', 'electorate', 'elective'},
+ 'a': {'eligible', 'electoral', 'elective', 'elect'},
+ 'v': {'electing', 'elects', 'elected', 'elect'},
+ 'r': set()}
+>>> get_word_forms("politician")
+>>> {'n': {'politician', 'politics', 'politicians'},
+ 'a': {'political'},
+ 'v': set(),
+ 'r': {'politically'}}
+>>> get_word_forms("am")
+>>> {'n': {'being', 'beings'},
+ 'a': set(),
+ 'v': {'was', 'be', "weren't", 'am', "wasn't", "aren't", 'being', 'were', 'is', "isn't", 'been', 'are', 'am not'},
+ 'r': set()}
+>>> get_word_forms("ran")
+>>> {'n': {'run', 'runniness', 'runner', 'runninesses', 'running', 'runners', 'runnings', 'runs'},
+ 'a': {'running', 'runny'},
+ 'v': {'running', 'run', 'ran', 'runs'},
+ 'r': set()}
+>>> get_word_forms('continent', 0.8) # with configurable similarity threshold
+>>> {'n': {'continents', 'continency', 'continences', 'continent', 'continencies', 'continence'},
+ 'a': {'continental', 'continent'},
+ 'v': set(),
+ 'r': set()}
+```
+As you can see, the output is a dictionary with four keys. "r" stands for adverb, "a" for adjective, "n" for noun
+and "v" for verb. Don't ask me why "r" stands for adverb. This is what WordNet uses, so this is why I use it too :-)
+
+Help can be obtained at any time by typing the following:
+
+```python
+>>> help(get_word_forms)
+```
+
+## Why?
+In Natural Language Processing and Search, one often needs to treat words like "run" and "ran", "love" and "lovable"
+or "politician" and "politics" as the same word. This is usually done by algorithmically reducing each word into a
+base word and then comparing the base words. The process is called Stemming.
+For example, the [Porter Stemmer](http://text-processing.com/demo/stem/) reduces both "love" and "lovely"
+into the base word "love".
+
+Stemmers have several shortcomings. Firstly, the base word produced by the Stemmer is not always a valid English word.
+For example, the Porter Stemmer reduces the word "operation" to "oper". Secondly, Stemmers have a high false-negative rate:
+"run" is reduced to "run" and "ran" is reduced to "ran", so the two forms are never matched. This happens because Stemmers use a set of
+rational rules for finding the base words, and as we all know, the English language does not always behave very rationally.
+
+Lemmatizers are more accurate than Stemmers because they produce a base form that is present in the dictionary (also called the Lemma), so the reduced word is always a valid English word. However, Lemmatizers also have false negatives because they are not very good at connecting words across different parts of speech. The [WordNet Lemmatizer](http://textanalysisonline.com/nltk-wordnet-lemmatizer) included with NLTK fails at almost all such examples: "operations" is reduced to "operation" while "operate" stays "operate", so the two are never matched.
+
+Word Forms tries to solve this problem by finding all possible forms of a given English word. It can perform verb conjugations; connect noun forms to verb, adjective, and adverb forms; pluralize singular forms; and more.
+
+## Bonus: A simple lemmatizer
+
+We also offer a very simple lemmatizer based on ``word_forms``. Here is how to use it.
+
+```python
+>>> from word_forms.lemmatizer import lemmatize
+>>> lemmatize("operations")
+'operant'
+>>> lemmatize("operate")
+'operant'
+```
+
+Enjoy!
+
+## Compatibility
+
+Tested on Python 3
+
+## Installation
+
+Using `pip`:
+
+```
+pip install -U word_forms
+```
+
+### From source
+Or you can install it from source:
+
+1. Clone the repository:
+
+```
+git clone https://github.com/gutfeeling/word_forms.git
+```
+
+2. Install it using `pip` or `setup.py`
+
+```
+pip install -e word_forms
+# or
+cd word_forms
+python setup.py install
+```
+
+## Acknowledgement
+
+1. [The XTAG project](http://www.cis.upenn.edu/~xtag/) for information on [verb conjugations](word_forms/en-verbs.txt).
+2. [WordNet](http://wordnet.princeton.edu/)
+
+## Maintainer
+
+Hi, I am Dibya and I maintain this repository. I would love to hear from you. Feel free to get in touch with me
+at dibyachakravorty@gmail.com.
+
+## Contributors
+
+- Tom Aarsen @CubieDev is a major contributor and is singlehandedly responsible for v2.0.0.
+- Sajal Sharma @sajal2692 is a major contributor.
+
+## Contributions
+
+Word Forms is not perfect. In particular, one aspect can be improved:
+
+1. It sometimes generates non-dictionary words like "runninesses" because the pluralization/singularization algorithm is
+not perfect. At the moment, I am using [inflect](https://pypi.python.org/pypi/inflect) for it.
+
+If you like this package, feel free to contribute. Your pull requests are most welcome.
+
+
+
+
+%package help
+Summary: Development documents and examples for word-forms
+Provides: python3-word-forms-doc
+%description help
+<img src="https://github.com/gutfeeling/word_forms/raw/master/logo.png" alt="word forms logo" width="500">
+
+## Accurately generate all possible forms of an English word
+
+Word Forms can accurately generate all possible forms of an English word. It can conjugate verbs. It can connect different
+parts of speech, e.g. noun to adjective, adjective to adverb, noun to verb, etc. It can pluralize singular nouns. It does all of this in one function. Enjoy!
+
+## Examples
+
+Some very timely examples :-P
+
+```python
+>>> from word_forms.word_forms import get_word_forms
+>>> get_word_forms("president")
+>>> {'n': {'presidents', 'presidentships', 'presidencies', 'presidentship', 'president', 'presidency'},
+ 'a': {'presidential'},
+ 'v': {'preside', 'presided', 'presiding', 'presides'},
+ 'r': {'presidentially'}}
+>>> get_word_forms("elect")
+>>> {'n': {'elects', 'electives', 'electors', 'elect', 'eligibilities', 'electorates', 'eligibility', 'elector', 'election', 'elections', 'electorate', 'elective'},
+ 'a': {'eligible', 'electoral', 'elective', 'elect'},
+ 'v': {'electing', 'elects', 'elected', 'elect'},
+ 'r': set()}
+>>> get_word_forms("politician")
+>>> {'n': {'politician', 'politics', 'politicians'},
+ 'a': {'political'},
+ 'v': set(),
+ 'r': {'politically'}}
+>>> get_word_forms("am")
+>>> {'n': {'being', 'beings'},
+ 'a': set(),
+ 'v': {'was', 'be', "weren't", 'am', "wasn't", "aren't", 'being', 'were', 'is', "isn't", 'been', 'are', 'am not'},
+ 'r': set()}
+>>> get_word_forms("ran")
+>>> {'n': {'run', 'runniness', 'runner', 'runninesses', 'running', 'runners', 'runnings', 'runs'},
+ 'a': {'running', 'runny'},
+ 'v': {'running', 'run', 'ran', 'runs'},
+ 'r': set()}
+>>> get_word_forms('continent', 0.8) # with configurable similarity threshold
+>>> {'n': {'continents', 'continency', 'continences', 'continent', 'continencies', 'continence'},
+ 'a': {'continental', 'continent'},
+ 'v': set(),
+ 'r': set()}
+```
+As you can see, the output is a dictionary with four keys. "r" stands for adverb, "a" for adjective, "n" for noun
+and "v" for verb. Don't ask me why "r" stands for adverb. This is what WordNet uses, so this is why I use it too :-)
+
+Help can be obtained at any time by typing the following:
+
+```python
+>>> help(get_word_forms)
+```
+
+## Why?
+In Natural Language Processing and Search, one often needs to treat words like "run" and "ran", "love" and "lovable"
+or "politician" and "politics" as the same word. This is usually done by algorithmically reducing each word into a
+base word and then comparing the base words. The process is called Stemming.
+For example, the [Porter Stemmer](http://text-processing.com/demo/stem/) reduces both "love" and "lovely"
+into the base word "love".
+
+Stemmers have several shortcomings. Firstly, the base word produced by the Stemmer is not always a valid English word.
+For example, the Porter Stemmer reduces the word "operation" to "oper". Secondly, Stemmers have a high false-negative rate:
+"run" is reduced to "run" and "ran" is reduced to "ran", so the two forms are never matched. This happens because Stemmers use a set of
+rational rules for finding the base words, and as we all know, the English language does not always behave very rationally.
+
+Lemmatizers are more accurate than Stemmers because they produce a base form that is present in the dictionary (also called the Lemma), so the reduced word is always a valid English word. However, Lemmatizers also have false negatives because they are not very good at connecting words across different parts of speech. The [WordNet Lemmatizer](http://textanalysisonline.com/nltk-wordnet-lemmatizer) included with NLTK fails at almost all such examples: "operations" is reduced to "operation" while "operate" stays "operate", so the two are never matched.
+
+Word Forms tries to solve this problem by finding all possible forms of a given English word. It can perform verb conjugations; connect noun forms to verb, adjective, and adverb forms; pluralize singular forms; and more.
+
+## Bonus: A simple lemmatizer
+
+We also offer a very simple lemmatizer based on ``word_forms``. Here is how to use it.
+
+```python
+>>> from word_forms.lemmatizer import lemmatize
+>>> lemmatize("operations")
+'operant'
+>>> lemmatize("operate")
+'operant'
+```
+
+Enjoy!
+
+## Compatibility
+
+Tested on Python 3
+
+## Installation
+
+Using `pip`:
+
+```
+pip install -U word_forms
+```
+
+### From source
+Or you can install it from source:
+
+1. Clone the repository:
+
+```
+git clone https://github.com/gutfeeling/word_forms.git
+```
+
+2. Install it using `pip` or `setup.py`
+
+```
+pip install -e word_forms
+# or
+cd word_forms
+python setup.py install
+```
+
+## Acknowledgement
+
+1. [The XTAG project](http://www.cis.upenn.edu/~xtag/) for information on [verb conjugations](word_forms/en-verbs.txt).
+2. [WordNet](http://wordnet.princeton.edu/)
+
+## Maintainer
+
+Hi, I am Dibya and I maintain this repository. I would love to hear from you. Feel free to get in touch with me
+at dibyachakravorty@gmail.com.
+
+## Contributors
+
+- Tom Aarsen @CubieDev is a major contributor and is singlehandedly responsible for v2.0.0.
+- Sajal Sharma @sajal2692 is a major contributor.
+
+## Contributions
+
+Word Forms is not perfect. In particular, one aspect can be improved:
+
+1. It sometimes generates non-dictionary words like "runninesses" because the pluralization/singularization algorithm is
+not perfect. At the moment, I am using [inflect](https://pypi.python.org/pypi/inflect) for it.
+
+If you like this package, feel free to contribute. Your pull requests are most welcome.
+
+
+
+
+%prep
+%autosetup -n word-forms-2.1.0
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-word-forms -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed May 10 2023 Python_Bot <Python_Bot@openeuler.org> - 2.1.0-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..075f916
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+6801c9a327ebdbdda03463254b1e2c23 word_forms-2.1.0.tar.gz