summaryrefslogtreecommitdiff
path: root/python-indic-nlp-library.spec
diff options
context:
space:
mode:
authorCoprDistGit <infra@openeuler.org>2023-05-31 03:26:48 +0000
committerCoprDistGit <infra@openeuler.org>2023-05-31 03:26:48 +0000
commit9187efdd2defca42258ee99ad6c76f2abaab9c2d (patch)
tree8273ef2fdb88a4be1b7ba71c02111be0e113840f /python-indic-nlp-library.spec
parentfe4aa9723f0416adda52887273b472b5c710ea93 (diff)
automatic import of python-indic-nlp-library
Diffstat (limited to 'python-indic-nlp-library.spec')
-rw-r--r--python-indic-nlp-library.spec503
1 files changed, 503 insertions, 0 deletions
diff --git a/python-indic-nlp-library.spec b/python-indic-nlp-library.spec
new file mode 100644
index 0000000..9656637
--- /dev/null
+++ b/python-indic-nlp-library.spec
@@ -0,0 +1,503 @@
+%global _empty_manifest_terminate_build 0
+Name: python-indic-nlp-library
+Version: 0.92
+Release: 1
+Summary: The goal of the Indic NLP Library is to build Python based libraries for common text processing and Natural Language Processing in Indian languages.
+License: MIT
+URL: https://github.com/anoopkunchukuttan/indic_nlp_library
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/80/f6/bcd6b5c49351d5261ee3873f78d66fd7aa11d410f5c12e4cc1a4ee85c8f9/indic_nlp_library-0.92.tar.gz
+BuildArch: noarch
+
+Requires: python3-sphinx-argparse
+Requires: python3-sphinx-rtd-theme
+Requires: python3-morfessor
+Requires: python3-pandas
+Requires: python3-numpy
+
+%description
+# Indic NLP Library
+
+The goal of the Indic NLP Library is to build Python based libraries for common text processing and Natural Language Processing in Indian languages. Indian languages share a lot of similarity in terms of script, phonology, language syntax, etc. and this library is an attempt to provide a general solution to very commonly required toolsets for Indian language text.
+
+The library provides the following functionalities:
+
+- Text Normalization
+- Script Information
+- Word Tokenization and Detokenization
+- Sentence Splitting
+- Word Segmentation
+- Syllabification
+- Script Conversion
+- Romanization
+- Indicization
+- Transliteration
+- Translation
+
+The data resources required by the Indic NLP Library are hosted in a different repository. These resources are required for some modules. You can download from the [Indic NLP Resources](https://github.com/anoopkunchukuttan/indic_nlp_resources) project.
+
+**If you are interested in Indian language NLP resources, you should check the [Indic NLP Catalog](https://github.com/indicnlpweb/indicnlp_catalog) for pointers.**
+
+## Pre-requisites
+
+- Python 3.x
+ - (For Python 2.x version check the tag `PYTHON_2.7_FINAL_JAN_2019`. Not actively supporting Python 2.x anymore, but will try to maintain as much compatibility as possible)
+- [Indic NLP Resources](https://github.com/anoopkunchukuttan/indic_nlp_resources)
+- [Urduhack](https://github.com/urduhack/urduhack): Needed only if Urdu normalization is required. It has other dependencies like Tensorflow.
+- Other dependencies are listed in setup.py
+
+
+## Configuration
+
+- Installation from pip:
+
+ `pip install indic-nlp-library`
+
+- If you want to use the project from the github repo, add the project to the Python Path:
+
+ - Clone this repository
+ - Install dependencies: `pip install -r requirements.txt`
+ - Run: `export PYTHONPATH=$PYTHONPATH:<project base directory>`
+
+- In either case, export the path to the _Indic NLP Resources_ directory
+
+ Run: `export INDIC_RESOURCES_PATH=<path to Indic NLP resources>`
+
+## Usage
+
+You can use the Python API to access all the features of the library. Many of the most common operations are also accessible via a unified commandline API.
+
+### Getting Started
+
+Check [this IPython Notebook](http://nbviewer.ipython.org/url/anoopkunchukuttan.github.io/indic_nlp_library/doc/indic_nlp_examples.ipynb) for examples to use the Python API.
+ - You can find the Python 2.x Notebook [here](http://nbviewer.ipython.org/url/anoopkunchukuttan.github.io/indic_nlp_library/doc/indic_nlp_examples_2_7.ipynb)
+
+### Documentation
+
+You can find detailed documentation [HERE](https://indic-nlp-library.readthedocs.io/en/latest)
+
+This documents the Python API as well as the commandline reference.
+
+## Citing
+
+If you use this library, please include the following citation:
+
+```
+@misc{kunchukuttan2020indicnlp,
+author = "Anoop Kunchukuttan",
+title = "{The IndicNLP Library}",
+year = "2020",
+howpublished={\url{https://github.com/anoopkunchukuttan/indic_nlp_library/blob/master/docs/indicnlp.pdf}}
+}
+```
+You can find the document [HERE](docs/indicnlp.pdf)
+
+## Website
+
+`http://anoopkunchukuttan.github.io/indic_nlp_library`
+
+## Author
+Anoop Kunchukuttan ([anoop.kunchukuttan@gmail.com](anoop.kunchukuttan@gmail.com))
+
+## Companies, Organizations, Projects using IndicNLP Library
+
+- [AI4Bharat-IndicNLPSuite](https://indicnlp.ai4bharat.org)
+- [The Classical Language Toolkit](http://cltk.org)
+- [Microsoft NLP Recipes](https://github.com/microsoft/nlp-recipes)
+- [Facebook M2M-100](https://github.com/pytorch/fairseq/tree/master/examples/m2m_100)
+
+## Revision Log
+
+
+0.81 : 26 May 2021
+
+ - Bug fix in version number extraction
+
+0.80 : 24 May 2021
+
+ - Improved sentence splitting
+ - Bug fixes
+ - Support for Urdu Normalizer
+
+0.71 : 03 Sep 2020
+
+ - Improved documentation
+ - Bug fixes
+
+0.7 : 02 Apr 2020:
+
+ - Unified commandline
+ - Improved documentation
+ - Added setup.py
+
+0.6 : 16 Dec 2019:
+
+ - New romanizer and indicizer
+ - Script Unifiers
+ - Improved script normalizers
+ - Added contrib directory for sample uses
+ - changed to MIT license
+
+0.5 : 03 Jun 2019:
+
+ - Improved word tokenizer to handle dates and numbers.
+ - Added sentence splitter that can handle common prefixes/honorofics and uses some heuristics.
+ - Added detokenizer
+ - Added acronym transliterator that can convert English acronyms to Brahmi-derived scripts
+
+0.4 : 28 Jan 2019: Ported to Python 3, and lots of feature additions since last release; primarily around script information, script similarity and syllabification.
+
+0.3 : 21 Oct 2014: Supports morph-analysis between Indian languages
+
+0.2 : 13 Jun 2014: Supports transliteration between Indian languages and tokenization of Indian languages
+
+0.1 : 12 Mar 2014: Initial version. Supports text normalization.
+
+## LICENSE
+
+Indic NLP Library is released under the MIT license
+
+
+
+
+%package -n python3-indic-nlp-library
+Summary: The goal of the Indic NLP Library is to build Python based libraries for common text processing and Natural Language Processing in Indian languages.
+Provides: python-indic-nlp-library
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-indic-nlp-library
+# Indic NLP Library
+
+The goal of the Indic NLP Library is to build Python based libraries for common text processing and Natural Language Processing in Indian languages. Indian languages share a lot of similarity in terms of script, phonology, language syntax, etc. and this library is an attempt to provide a general solution to very commonly required toolsets for Indian language text.
+
+The library provides the following functionalities:
+
+- Text Normalization
+- Script Information
+- Word Tokenization and Detokenization
+- Sentence Splitting
+- Word Segmentation
+- Syllabification
+- Script Conversion
+- Romanization
+- Indicization
+- Transliteration
+- Translation
+
+The data resources required by the Indic NLP Library are hosted in a different repository. These resources are required for some modules. You can download from the [Indic NLP Resources](https://github.com/anoopkunchukuttan/indic_nlp_resources) project.
+
+**If you are interested in Indian language NLP resources, you should check the [Indic NLP Catalog](https://github.com/indicnlpweb/indicnlp_catalog) for pointers.**
+
+## Pre-requisites
+
+- Python 3.x
+ - (For Python 2.x version check the tag `PYTHON_2.7_FINAL_JAN_2019`. Not actively supporting Python 2.x anymore, but will try to maintain as much compatibility as possible)
+- [Indic NLP Resources](https://github.com/anoopkunchukuttan/indic_nlp_resources)
+- [Urduhack](https://github.com/urduhack/urduhack): Needed only if Urdu normalization is required. It has other dependencies like Tensorflow.
+- Other dependencies are listed in setup.py
+
+
+## Configuration
+
+- Installation from pip:
+
+ `pip install indic-nlp-library`
+
+- If you want to use the project from the github repo, add the project to the Python Path:
+
+ - Clone this repository
+ - Install dependencies: `pip install -r requirements.txt`
+ - Run: `export PYTHONPATH=$PYTHONPATH:<project base directory>`
+
+- In either case, export the path to the _Indic NLP Resources_ directory
+
+ Run: `export INDIC_RESOURCES_PATH=<path to Indic NLP resources>`
+
+## Usage
+
+You can use the Python API to access all the features of the library. Many of the most common operations are also accessible via a unified commandline API.
+
+### Getting Started
+
+Check [this IPython Notebook](http://nbviewer.ipython.org/url/anoopkunchukuttan.github.io/indic_nlp_library/doc/indic_nlp_examples.ipynb) for examples to use the Python API.
+ - You can find the Python 2.x Notebook [here](http://nbviewer.ipython.org/url/anoopkunchukuttan.github.io/indic_nlp_library/doc/indic_nlp_examples_2_7.ipynb)
+
+### Documentation
+
+You can find detailed documentation [HERE](https://indic-nlp-library.readthedocs.io/en/latest)
+
+This documents the Python API as well as the commandline reference.
+
+## Citing
+
+If you use this library, please include the following citation:
+
+```
+@misc{kunchukuttan2020indicnlp,
+author = "Anoop Kunchukuttan",
+title = "{The IndicNLP Library}",
+year = "2020",
+howpublished={\url{https://github.com/anoopkunchukuttan/indic_nlp_library/blob/master/docs/indicnlp.pdf}}
+}
+```
+You can find the document [HERE](docs/indicnlp.pdf)
+
+## Website
+
+`http://anoopkunchukuttan.github.io/indic_nlp_library`
+
+## Author
+Anoop Kunchukuttan ([anoop.kunchukuttan@gmail.com](anoop.kunchukuttan@gmail.com))
+
+## Companies, Organizations, Projects using IndicNLP Library
+
+- [AI4Bharat-IndicNLPSuite](https://indicnlp.ai4bharat.org)
+- [The Classical Language Toolkit](http://cltk.org)
+- [Microsoft NLP Recipes](https://github.com/microsoft/nlp-recipes)
+- [Facebook M2M-100](https://github.com/pytorch/fairseq/tree/master/examples/m2m_100)
+
+## Revision Log
+
+
+0.81 : 26 May 2021
+
+ - Bug fix in version number extraction
+
+0.80 : 24 May 2021
+
+ - Improved sentence splitting
+ - Bug fixes
+ - Support for Urdu Normalizer
+
+0.71 : 03 Sep 2020
+
+ - Improved documentation
+ - Bug fixes
+
+0.7 : 02 Apr 2020:
+
+ - Unified commandline
+ - Improved documentation
+ - Added setup.py
+
+0.6 : 16 Dec 2019:
+
+ - New romanizer and indicizer
+ - Script Unifiers
+ - Improved script normalizers
+ - Added contrib directory for sample uses
+ - changed to MIT license
+
+0.5 : 03 Jun 2019:
+
+ - Improved word tokenizer to handle dates and numbers.
+ - Added sentence splitter that can handle common prefixes/honorofics and uses some heuristics.
+ - Added detokenizer
+ - Added acronym transliterator that can convert English acronyms to Brahmi-derived scripts
+
+0.4 : 28 Jan 2019: Ported to Python 3, and lots of feature additions since last release; primarily around script information, script similarity and syllabification.
+
+0.3 : 21 Oct 2014: Supports morph-analysis between Indian languages
+
+0.2 : 13 Jun 2014: Supports transliteration between Indian languages and tokenization of Indian languages
+
+0.1 : 12 Mar 2014: Initial version. Supports text normalization.
+
+## LICENSE
+
+Indic NLP Library is released under the MIT license
+
+
+
+
+%package help
+Summary: Development documents and examples for indic-nlp-library
+Provides: python3-indic-nlp-library-doc
+%description help
+# Indic NLP Library
+
+The goal of the Indic NLP Library is to build Python based libraries for common text processing and Natural Language Processing in Indian languages. Indian languages share a lot of similarity in terms of script, phonology, language syntax, etc. and this library is an attempt to provide a general solution to very commonly required toolsets for Indian language text.
+
+The library provides the following functionalities:
+
+- Text Normalization
+- Script Information
+- Word Tokenization and Detokenization
+- Sentence Splitting
+- Word Segmentation
+- Syllabification
+- Script Conversion
+- Romanization
+- Indicization
+- Transliteration
+- Translation
+
+The data resources required by the Indic NLP Library are hosted in a different repository. These resources are required for some modules. You can download from the [Indic NLP Resources](https://github.com/anoopkunchukuttan/indic_nlp_resources) project.
+
+**If you are interested in Indian language NLP resources, you should check the [Indic NLP Catalog](https://github.com/indicnlpweb/indicnlp_catalog) for pointers.**
+
+## Pre-requisites
+
+- Python 3.x
+ - (For Python 2.x version check the tag `PYTHON_2.7_FINAL_JAN_2019`. Not actively supporting Python 2.x anymore, but will try to maintain as much compatibility as possible)
+- [Indic NLP Resources](https://github.com/anoopkunchukuttan/indic_nlp_resources)
+- [Urduhack](https://github.com/urduhack/urduhack): Needed only if Urdu normalization is required. It has other dependencies like Tensorflow.
+- Other dependencies are listed in setup.py
+
+
+## Configuration
+
+- Installation from pip:
+
+ `pip install indic-nlp-library`
+
+- If you want to use the project from the github repo, add the project to the Python Path:
+
+ - Clone this repository
+ - Install dependencies: `pip install -r requirements.txt`
+ - Run: `export PYTHONPATH=$PYTHONPATH:<project base directory>`
+
+- In either case, export the path to the _Indic NLP Resources_ directory
+
+ Run: `export INDIC_RESOURCES_PATH=<path to Indic NLP resources>`
+
+## Usage
+
+You can use the Python API to access all the features of the library. Many of the most common operations are also accessible via a unified commandline API.
+
+### Getting Started
+
+Check [this IPython Notebook](http://nbviewer.ipython.org/url/anoopkunchukuttan.github.io/indic_nlp_library/doc/indic_nlp_examples.ipynb) for examples to use the Python API.
+ - You can find the Python 2.x Notebook [here](http://nbviewer.ipython.org/url/anoopkunchukuttan.github.io/indic_nlp_library/doc/indic_nlp_examples_2_7.ipynb)
+
+### Documentation
+
+You can find detailed documentation [HERE](https://indic-nlp-library.readthedocs.io/en/latest)
+
+This documents the Python API as well as the commandline reference.
+
+## Citing
+
+If you use this library, please include the following citation:
+
+```
+@misc{kunchukuttan2020indicnlp,
+author = "Anoop Kunchukuttan",
+title = "{The IndicNLP Library}",
+year = "2020",
+howpublished={\url{https://github.com/anoopkunchukuttan/indic_nlp_library/blob/master/docs/indicnlp.pdf}}
+}
+```
+You can find the document [HERE](docs/indicnlp.pdf)
+
+## Website
+
+`http://anoopkunchukuttan.github.io/indic_nlp_library`
+
+## Author
+Anoop Kunchukuttan ([anoop.kunchukuttan@gmail.com](anoop.kunchukuttan@gmail.com))
+
+## Companies, Organizations, Projects using IndicNLP Library
+
+- [AI4Bharat-IndicNLPSuite](https://indicnlp.ai4bharat.org)
+- [The Classical Language Toolkit](http://cltk.org)
+- [Microsoft NLP Recipes](https://github.com/microsoft/nlp-recipes)
+- [Facebook M2M-100](https://github.com/pytorch/fairseq/tree/master/examples/m2m_100)
+
+## Revision Log
+
+
+0.81 : 26 May 2021
+
+ - Bug fix in version number extraction
+
+0.80 : 24 May 2021
+
+ - Improved sentence splitting
+ - Bug fixes
+ - Support for Urdu Normalizer
+
+0.71 : 03 Sep 2020
+
+ - Improved documentation
+ - Bug fixes
+
+0.7 : 02 Apr 2020:
+
+ - Unified commandline
+ - Improved documentation
+ - Added setup.py
+
+0.6 : 16 Dec 2019:
+
+ - New romanizer and indicizer
+ - Script Unifiers
+ - Improved script normalizers
+ - Added contrib directory for sample uses
+ - changed to MIT license
+
+0.5 : 03 Jun 2019:
+
+ - Improved word tokenizer to handle dates and numbers.
+ - Added sentence splitter that can handle common prefixes/honorofics and uses some heuristics.
+ - Added detokenizer
+ - Added acronym transliterator that can convert English acronyms to Brahmi-derived scripts
+
+0.4 : 28 Jan 2019: Ported to Python 3, and lots of feature additions since last release; primarily around script information, script similarity and syllabification.
+
+0.3 : 21 Oct 2014: Supports morph-analysis between Indian languages
+
+0.2 : 13 Jun 2014: Supports transliteration between Indian languages and tokenization of Indian languages
+
+0.1 : 12 Mar 2014: Initial version. Supports text normalization.
+
+## LICENSE
+
+Indic NLP Library is released under the MIT license
+
+
+
+
+%prep
+%autosetup -n indic-nlp-library-0.92
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-indic-nlp-library -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed May 31 2023 Python_Bot <Python_Bot@openeuler.org> - 0.92-1
+- Package Spec generated