| -rw-r--r-- | .gitignore | 1 |
| -rw-r--r-- | python-word-forms.spec | 488 |
| -rw-r--r-- | sources | 1 |
3 files changed, 490 insertions, 0 deletions
@@ -0,0 +1 @@
+/word_forms-2.1.0.tar.gz
diff --git a/python-word-forms.spec b/python-word-forms.spec
new file mode 100644
index 0000000..8d167e8
--- /dev/null
+++ b/python-word-forms.spec
@@ -0,0 +1,488 @@
+%global _empty_manifest_terminate_build 0
+Name: python-word-forms
+Version: 2.1.0
+Release: 1
+Summary: Generate all possible forms of an English word.
+License: MIT License
+URL: https://github.com/gutfeeling/word_forms
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/31/39/e0f24b7c3f228561b346ae8c046817ff3d3929d77b0c3ca14a12e4d106b2/word_forms-2.1.0.tar.gz
+BuildArch: noarch
+
+Requires: python3-inflect
+Requires: python3-nltk
+
+%description
+<img src="https://github.com/gutfeeling/word_forms/raw/master/logo.png" alt="word forms logo" width="500">
+
+## Accurately generate all possible forms of an English word
+
+Word forms can accurately generate all possible forms of an English word. It can conjugate verbs. It can connect different
+parts of speech, e.g. noun to adjective, adjective to adverb, noun to verb, etc. It can pluralize singular nouns. It does this all in one function. Enjoy!
+
+## Examples
+
+Some very timely examples :-P
+
+```python
+>>> from word_forms.word_forms import get_word_forms
+>>> get_word_forms("president")
+{'n': {'presidents', 'presidentships', 'presidencies', 'presidentship', 'president', 'presidency'},
+ 'a': {'presidential'},
+ 'v': {'preside', 'presided', 'presiding', 'presides'},
+ 'r': {'presidentially'}}
+>>> get_word_forms("elect")
+{'n': {'elects', 'electives', 'electors', 'elect', 'eligibilities', 'electorates', 'eligibility', 'elector', 'election', 'elections', 'electorate', 'elective'},
+ 'a': {'eligible', 'electoral', 'elective', 'elect'},
+ 'v': {'electing', 'elects', 'elected', 'elect'},
+ 'r': set()}
+>>> get_word_forms("politician")
+{'n': {'politician', 'politics', 'politicians'},
+ 'a': {'political'},
+ 'v': set(),
+ 'r': {'politically'}}
+>>> get_word_forms("am")
+{'n': {'being', 'beings'},
+ 'a': set(),
+ 'v': {'was', 'be', "weren't", 'am', "wasn't", "aren't", 'being', 'were', 'is', "isn't", 'been', 'are', 'am not'},
+ 'r': set()}
+>>> get_word_forms("ran")
+{'n': {'run', 'runniness', 'runner', 'runninesses', 'running', 'runners', 'runnings', 'runs'},
+ 'a': {'running', 'runny'},
+ 'v': {'running', 'run', 'ran', 'runs'},
+ 'r': set()}
+>>> get_word_forms('continent', 0.8) # with configurable similarity threshold
+{'n': {'continents', 'continency', 'continences', 'continent', 'continencies', 'continence'},
+ 'a': {'continental', 'continent'},
+ 'v': set(),
+ 'r': set()}
+```
+As you can see, the output is a dictionary with four keys. "r" stands for adverb, "a" for adjective, "n" for noun
+and "v" for verb. Don't ask me why "r" stands for adverb. This is what WordNet uses, so this is why I use it too :-)
+
+Help can be obtained at any time by typing the following:
+
+```python
+>>> help(get_word_forms)
+```
+
+## Why?
+
+In Natural Language Processing and Search, one often needs to treat words like "run" and "ran", "love" and "lovable"
+or "politician" and "politics" as the same word. This is usually done by algorithmically reducing each word to a
+base word and then comparing the base words. The process is called Stemming.
+For example, the [Porter Stemmer](http://text-processing.com/demo/stem/) reduces both "love" and "lovely"
+to the base word "love".
+
+Stemmers have several shortcomings. Firstly, the base word produced by the Stemmer is not always a valid English word.
+For example, the Porter Stemmer reduces the word "operation" to "oper". Secondly, the Stemmers have a high false negative rate.
+For example, "run" is reduced to "run" and "ran" is reduced to "ran", so the two are not recognized as the same word. This happens because the Stemmers use a set of
+rational rules for finding the base words, and as we all know, the English language does not always behave very rationally.
+
+Lemmatizers are more accurate than Stemmers because they produce a base form that is present in the dictionary (also called the Lemma). So the reduced word is always a valid English word. However, Lemmatizers also have false negatives because they are not very good at connecting words across different parts of speech. The [WordNet Lemmatizer](http://textanalysisonline.com/nltk-wordnet-lemmatizer) included with NLTK fails at almost all such examples: "operations" is reduced to "operation" and "operate" is reduced to "operate".
+
+Word Forms tries to solve this problem by finding all possible forms of a given English word. It can perform verb conjugations, connect noun forms to verb forms, adjective forms and adverb forms, pluralize singular forms, etc.
+
+## Bonus: A simple lemmatizer
+
+We also offer a very simple lemmatizer based on ``word_forms``. Here is how to use it.
+
+```python
+>>> from word_forms.lemmatizer import lemmatize
+>>> lemmatize("operations")
+'operant'
+>>> lemmatize("operate")
+'operant'
+```
+
+Enjoy!
+
+## Compatibility
+
+Tested on Python 3
+
+## Installation
+
+Using `pip`:
+
+```
+pip install -U word_forms
+```
+
+### From source
+Or you can install it from source:
+
+1. Clone the repository:
+
+```
+git clone https://github.com/gutfeeling/word_forms.git
+```
+
+2. Install it using `pip` or `setup.py`:
+
+```
+pip install -e word_forms
+# or
+cd word_forms
+python setup.py install
+```
+
+## Acknowledgement
+
+1. [The XTAG project](http://www.cis.upenn.edu/~xtag/) for information on [verb conjugations](word_forms/en-verbs.txt).
+2. [WordNet](http://wordnet.princeton.edu/)
+
+## Maintainer
+
+Hi, I am Dibya and I maintain this repository. I would love to hear from you. Feel free to get in touch with me
+at dibyachakravorty@gmail.com.
+
+## Contributors
+
+- Tom Aarsen @CubieDev is a major contributor and is singlehandedly responsible for v2.0.0.
+- Sajal Sharma @sajal2692 is a major contributor.
+
+## Contributions
+
+Word Forms is not perfect. In particular, the following can be improved:
+
+1. It sometimes generates non-dictionary words like "runninesses" because the pluralization/singularization algorithm is
+not perfect. At the moment, I am using [inflect](https://pypi.python.org/pypi/inflect) for it.
+
+If you like this package, feel free to contribute. Your pull requests are most welcome.
+
+
+
+
+%package -n python3-word-forms
+Summary: Generate all possible forms of an English word.
+Provides: python-word-forms
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-word-forms
+<img src="https://github.com/gutfeeling/word_forms/raw/master/logo.png" alt="word forms logo" width="500">
+
+## Accurately generate all possible forms of an English word
+
+Word forms can accurately generate all possible forms of an English word. It can conjugate verbs. It can connect different
+parts of speech, e.g. noun to adjective, adjective to adverb, noun to verb, etc. It can pluralize singular nouns. It does this all in one function. Enjoy!
+
+## Examples
+
+Some very timely examples :-P
+
+```python
+>>> from word_forms.word_forms import get_word_forms
+>>> get_word_forms("president")
+{'n': {'presidents', 'presidentships', 'presidencies', 'presidentship', 'president', 'presidency'},
+ 'a': {'presidential'},
+ 'v': {'preside', 'presided', 'presiding', 'presides'},
+ 'r': {'presidentially'}}
+>>> get_word_forms("elect")
+{'n': {'elects', 'electives', 'electors', 'elect', 'eligibilities', 'electorates', 'eligibility', 'elector', 'election', 'elections', 'electorate', 'elective'},
+ 'a': {'eligible', 'electoral', 'elective', 'elect'},
+ 'v': {'electing', 'elects', 'elected', 'elect'},
+ 'r': set()}
+>>> get_word_forms("politician")
+{'n': {'politician', 'politics', 'politicians'},
+ 'a': {'political'},
+ 'v': set(),
+ 'r': {'politically'}}
+>>> get_word_forms("am")
+{'n': {'being', 'beings'},
+ 'a': set(),
+ 'v': {'was', 'be', "weren't", 'am', "wasn't", "aren't", 'being', 'were', 'is', "isn't", 'been', 'are', 'am not'},
+ 'r': set()}
+>>> get_word_forms("ran")
+{'n': {'run', 'runniness', 'runner', 'runninesses', 'running', 'runners', 'runnings', 'runs'},
+ 'a': {'running', 'runny'},
+ 'v': {'running', 'run', 'ran', 'runs'},
+ 'r': set()}
+>>> get_word_forms('continent', 0.8) # with configurable similarity threshold
+{'n': {'continents', 'continency', 'continences', 'continent', 'continencies', 'continence'},
+ 'a': {'continental', 'continent'},
+ 'v': set(),
+ 'r': set()}
+```
+As you can see, the output is a dictionary with four keys. "r" stands for adverb, "a" for adjective, "n" for noun
+and "v" for verb. Don't ask me why "r" stands for adverb. This is what WordNet uses, so this is why I use it too :-)
+
+Help can be obtained at any time by typing the following:
+
+```python
+>>> help(get_word_forms)
+```
+
+## Why?
+
+In Natural Language Processing and Search, one often needs to treat words like "run" and "ran", "love" and "lovable"
+or "politician" and "politics" as the same word. This is usually done by algorithmically reducing each word to a
+base word and then comparing the base words. The process is called Stemming.
+For example, the [Porter Stemmer](http://text-processing.com/demo/stem/) reduces both "love" and "lovely"
+to the base word "love".
+
+Stemmers have several shortcomings. Firstly, the base word produced by the Stemmer is not always a valid English word.
+For example, the Porter Stemmer reduces the word "operation" to "oper". Secondly, the Stemmers have a high false negative rate.
+For example, "run" is reduced to "run" and "ran" is reduced to "ran", so the two are not recognized as the same word. This happens because the Stemmers use a set of
+rational rules for finding the base words, and as we all know, the English language does not always behave very rationally.
+
+Lemmatizers are more accurate than Stemmers because they produce a base form that is present in the dictionary (also called the Lemma). So the reduced word is always a valid English word. However, Lemmatizers also have false negatives because they are not very good at connecting words across different parts of speech. The [WordNet Lemmatizer](http://textanalysisonline.com/nltk-wordnet-lemmatizer) included with NLTK fails at almost all such examples: "operations" is reduced to "operation" and "operate" is reduced to "operate".
+
+Word Forms tries to solve this problem by finding all possible forms of a given English word. It can perform verb conjugations, connect noun forms to verb forms, adjective forms and adverb forms, pluralize singular forms, etc.
+
+## Bonus: A simple lemmatizer
+
+We also offer a very simple lemmatizer based on ``word_forms``. Here is how to use it.
+
+```python
+>>> from word_forms.lemmatizer import lemmatize
+>>> lemmatize("operations")
+'operant'
+>>> lemmatize("operate")
+'operant'
+```
+
+Enjoy!
+
+## Compatibility
+
+Tested on Python 3
+
+## Installation
+
+Using `pip`:
+
+```
+pip install -U word_forms
+```
+
+### From source
+Or you can install it from source:
+
+1. Clone the repository:
+
+```
+git clone https://github.com/gutfeeling/word_forms.git
+```
+
+2. Install it using `pip` or `setup.py`:
+
+```
+pip install -e word_forms
+# or
+cd word_forms
+python setup.py install
+```
+
+## Acknowledgement
+
+1. [The XTAG project](http://www.cis.upenn.edu/~xtag/) for information on [verb conjugations](word_forms/en-verbs.txt).
+2. [WordNet](http://wordnet.princeton.edu/)
+
+## Maintainer
+
+Hi, I am Dibya and I maintain this repository. I would love to hear from you. Feel free to get in touch with me
+at dibyachakravorty@gmail.com.
+
+## Contributors
+
+- Tom Aarsen @CubieDev is a major contributor and is singlehandedly responsible for v2.0.0.
+- Sajal Sharma @sajal2692 is a major contributor.
+
+## Contributions
+
+Word Forms is not perfect. In particular, the following can be improved:
+
+1. It sometimes generates non-dictionary words like "runninesses" because the pluralization/singularization algorithm is
+not perfect. At the moment, I am using [inflect](https://pypi.python.org/pypi/inflect) for it.
+
+If you like this package, feel free to contribute. Your pull requests are most welcome.
+
+
+
+
+%package help
+Summary: Development documents and examples for word-forms
+Provides: python3-word-forms-doc
+%description help
+<img src="https://github.com/gutfeeling/word_forms/raw/master/logo.png" alt="word forms logo" width="500">
+
+## Accurately generate all possible forms of an English word
+
+Word forms can accurately generate all possible forms of an English word. It can conjugate verbs. It can connect different
+parts of speech, e.g. noun to adjective, adjective to adverb, noun to verb, etc. It can pluralize singular nouns. It does this all in one function. Enjoy!
+
+## Examples
+
+Some very timely examples :-P
+
+```python
+>>> from word_forms.word_forms import get_word_forms
+>>> get_word_forms("president")
+{'n': {'presidents', 'presidentships', 'presidencies', 'presidentship', 'president', 'presidency'},
+ 'a': {'presidential'},
+ 'v': {'preside', 'presided', 'presiding', 'presides'},
+ 'r': {'presidentially'}}
+>>> get_word_forms("elect")
+{'n': {'elects', 'electives', 'electors', 'elect', 'eligibilities', 'electorates', 'eligibility', 'elector', 'election', 'elections', 'electorate', 'elective'},
+ 'a': {'eligible', 'electoral', 'elective', 'elect'},
+ 'v': {'electing', 'elects', 'elected', 'elect'},
+ 'r': set()}
+>>> get_word_forms("politician")
+{'n': {'politician', 'politics', 'politicians'},
+ 'a': {'political'},
+ 'v': set(),
+ 'r': {'politically'}}
+>>> get_word_forms("am")
+{'n': {'being', 'beings'},
+ 'a': set(),
+ 'v': {'was', 'be', "weren't", 'am', "wasn't", "aren't", 'being', 'were', 'is', "isn't", 'been', 'are', 'am not'},
+ 'r': set()}
+>>> get_word_forms("ran")
+{'n': {'run', 'runniness', 'runner', 'runninesses', 'running', 'runners', 'runnings', 'runs'},
+ 'a': {'running', 'runny'},
+ 'v': {'running', 'run', 'ran', 'runs'},
+ 'r': set()}
+>>> get_word_forms('continent', 0.8) # with configurable similarity threshold
+{'n': {'continents', 'continency', 'continences', 'continent', 'continencies', 'continence'},
+ 'a': {'continental', 'continent'},
+ 'v': set(),
+ 'r': set()}
+```
+As you can see, the output is a dictionary with four keys. "r" stands for adverb, "a" for adjective, "n" for noun
+and "v" for verb. Don't ask me why "r" stands for adverb. This is what WordNet uses, so this is why I use it too :-)
+
+Help can be obtained at any time by typing the following:
+
+```python
+>>> help(get_word_forms)
+```
+
+## Why?
+
+In Natural Language Processing and Search, one often needs to treat words like "run" and "ran", "love" and "lovable"
+or "politician" and "politics" as the same word. This is usually done by algorithmically reducing each word to a
+base word and then comparing the base words. The process is called Stemming.
+For example, the [Porter Stemmer](http://text-processing.com/demo/stem/) reduces both "love" and "lovely"
+to the base word "love".
+
+Stemmers have several shortcomings. Firstly, the base word produced by the Stemmer is not always a valid English word.
+For example, the Porter Stemmer reduces the word "operation" to "oper". Secondly, the Stemmers have a high false negative rate.
+For example, "run" is reduced to "run" and "ran" is reduced to "ran", so the two are not recognized as the same word. This happens because the Stemmers use a set of
+rational rules for finding the base words, and as we all know, the English language does not always behave very rationally.
+
+Lemmatizers are more accurate than Stemmers because they produce a base form that is present in the dictionary (also called the Lemma). So the reduced word is always a valid English word. However, Lemmatizers also have false negatives because they are not very good at connecting words across different parts of speech. The [WordNet Lemmatizer](http://textanalysisonline.com/nltk-wordnet-lemmatizer) included with NLTK fails at almost all such examples: "operations" is reduced to "operation" and "operate" is reduced to "operate".
+
+Word Forms tries to solve this problem by finding all possible forms of a given English word. It can perform verb conjugations, connect noun forms to verb forms, adjective forms and adverb forms, pluralize singular forms, etc.
+
+## Bonus: A simple lemmatizer
+
+We also offer a very simple lemmatizer based on ``word_forms``. Here is how to use it.
+
+```python
+>>> from word_forms.lemmatizer import lemmatize
+>>> lemmatize("operations")
+'operant'
+>>> lemmatize("operate")
+'operant'
+```
+
+Enjoy!
+
+## Compatibility
+
+Tested on Python 3
+
+## Installation
+
+Using `pip`:
+
+```
+pip install -U word_forms
+```
+
+### From source
+Or you can install it from source:
+
+1. Clone the repository:
+
+```
+git clone https://github.com/gutfeeling/word_forms.git
+```
+
+2. Install it using `pip` or `setup.py`:
+
+```
+pip install -e word_forms
+# or
+cd word_forms
+python setup.py install
+```
+
+## Acknowledgement
+
+1. [The XTAG project](http://www.cis.upenn.edu/~xtag/) for information on [verb conjugations](word_forms/en-verbs.txt).
+2. [WordNet](http://wordnet.princeton.edu/)
+
+## Maintainer
+
+Hi, I am Dibya and I maintain this repository. I would love to hear from you. Feel free to get in touch with me
+at dibyachakravorty@gmail.com.
+
+## Contributors
+
+- Tom Aarsen @CubieDev is a major contributor and is singlehandedly responsible for v2.0.0.
+- Sajal Sharma @sajal2692 is a major contributor.
+
+## Contributions
+
+Word Forms is not perfect. In particular, the following can be improved:
+
+1. It sometimes generates non-dictionary words like "runninesses" because the pluralization/singularization algorithm is
+not perfect. At the moment, I am using [inflect](https://pypi.python.org/pypi/inflect) for it.
+
+If you like this package, feel free to contribute. Your pull requests are most welcome.
+
+
+
+
+%prep
+%autosetup -n word-forms-2.1.0
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-word-forms -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed May 10 2023 Python_Bot <Python_Bot@openeuler.org> - 2.1.0-1
+- Package Spec generated
@@ -0,0 +1 @@
+6801c9a327ebdbdda03463254b1e2c23 word_forms-2.1.0.tar.gz
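The packaged description explains that `get_word_forms` returns a dictionary keyed by the WordNet part-of-speech tags "n", "v", "a" and "r". Below is a minimal sketch, assuming that output shape, of relabelling those keys with readable names. `WORDNET_POS_NAMES` and `readable_forms` are hypothetical helpers, not part of the word_forms API, and the sample data is copied from the "politician" example rather than produced by calling the library.

```python
from typing import Dict, Set

# Hypothetical lookup table (not part of word_forms): the WordNet POS tags
# used as keys of a get_word_forms() result, mapped to readable names.
WORDNET_POS_NAMES: Dict[str, str] = {
    "n": "noun",
    "v": "verb",
    "a": "adjective",
    "r": "adverb",
}


def readable_forms(forms: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return a copy of a get_word_forms()-style dict keyed by readable POS names.

    Tags missing from WORDNET_POS_NAMES are kept as-is instead of being dropped.
    """
    return {WORDNET_POS_NAMES.get(tag, tag): set(words) for tag, words in forms.items()}


# Sample output copied from the "politician" example in the description;
# the real library is not called here.
politician = {
    "n": {"politician", "politics", "politicians"},
    "a": {"political"},
    "v": set(),
    "r": {"politically"},
}

print(sorted(readable_forms(politician)))  # ['adjective', 'adverb', 'noun', 'verb']
```

Each set is copied, so modifying the relabelled dictionary leaves the original result untouched.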

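The "Why?" section motivates the package with the need to treat words such as "run" and "ran" as the same word. One way to use the returned dictionary for that is sketched below, assuming only that the lookup function returns the dict-of-sets shape shown in the examples. It merges every part of speech into one set and tests membership in both directions, because the description does not promise that the relation is symmetric. `all_forms`, `same_word` and `sample_lookup` are hypothetical names. With the real package you would pass `get_word_forms` (imported from `word_forms.word_forms`, as in the description) as `lookup`; this sketch does not exercise that call.

```python
from typing import Callable, Dict, Set

# Any function shaped like get_word_forms(word) -> {"n": {...}, "a": {...}, ...}.
FormsLookup = Callable[[str], Dict[str, Set[str]]]


def all_forms(word: str, lookup: FormsLookup) -> Set[str]:
    """Union of every form returned for `word`, across all parts of speech."""
    return set().union(*lookup(word).values())


def same_word(a: str, b: str, lookup: FormsLookup) -> bool:
    """Treat `a` and `b` as the same word if either appears among the other's forms."""
    return b in all_forms(a, lookup) or a in all_forms(b, lookup)


# Stand-in for get_word_forms, using the "ran" output copied from the description.
SAMPLE: Dict[str, Dict[str, Set[str]]] = {
    "ran": {
        "n": {"run", "runniness", "runner", "runninesses", "running", "runners", "runnings", "runs"},
        "a": {"running", "runny"},
        "v": {"running", "run", "ran", "runs"},
        "r": set(),
    },
}


def sample_lookup(word: str) -> Dict[str, Set[str]]:
    # Words outside the sample get empty sets for every part of speech.
    return SAMPLE.get(word, {"n": set(), "a": set(), "v": set(), "r": set()})


print(same_word("run", "ran", sample_lookup))      # True
print(same_word("run", "operate", sample_lookup))  # False
```

Unlike a stemmer, this never produces a shared base word. It only asks whether two words are forms of one another, and it inherits any non-dictionary forms (such as "runninesses") that the lookup returns.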