%global _empty_manifest_terminate_build 0
Name: python-indic-nlp-library
Version: 0.92
Release: 1
Summary: The goal of the Indic NLP Library is to build Python based libraries for common text processing and Natural Language Processing in Indian languages.
License: MIT
URL: https://github.com/anoopkunchukuttan/indic_nlp_library
Source0: https://mirrors.aliyun.com/pypi/web/packages/80/f6/bcd6b5c49351d5261ee3873f78d66fd7aa11d410f5c12e4cc1a4ee85c8f9/indic_nlp_library-0.92.tar.gz
BuildArch: noarch

Requires: python3-sphinx-argparse
Requires: python3-sphinx-rtd-theme
Requires: python3-morfessor
Requires: python3-pandas
Requires: python3-numpy

%description
# Indic NLP Library

The goal of the Indic NLP Library is to build Python based libraries for common text processing and Natural Language Processing in Indian languages. Indian languages share a lot of similarity in terms of script, phonology, language syntax, etc. and this library is an attempt to provide a general solution to very commonly required toolsets for Indian language text.

The library provides the following functionalities:

- Text Normalization
- Script Information
- Word Tokenization and Detokenization
- Sentence Splitting
- Word Segmentation
- Syllabification
- Script Conversion
- Romanization
- Indicization
- Transliteration
- Translation

The data resources required by the Indic NLP Library are hosted in a different repository. These resources are required for some modules. You can download them from the [Indic NLP Resources](https://github.com/anoopkunchukuttan/indic_nlp_resources) project.

**If you are interested in Indian language NLP resources, you should check the [Indic NLP Catalog](https://github.com/indicnlpweb/indicnlp_catalog) for pointers.**

## Pre-requisites

- Python 3.x
  - (For Python 2.x version check the tag `PYTHON_2.7_FINAL_JAN_2019`.
    Not actively supporting Python 2.x anymore, but will try to maintain as much compatibility as possible)
- [Indic NLP Resources](https://github.com/anoopkunchukuttan/indic_nlp_resources)
- [Urduhack](https://github.com/urduhack/urduhack): Needed only if Urdu normalization is required. It has other dependencies like TensorFlow.
- Other dependencies are listed in setup.py

## Configuration

- Installation from pip:

  `pip install indic-nlp-library`

- If you want to use the project from the GitHub repo, add the project to the Python Path:

  - Clone this repository
  - Install dependencies: `pip install -r requirements.txt`
  - Run: `export PYTHONPATH=$PYTHONPATH:`

- In either case, export the path to the _Indic NLP Resources_ directory

  Run: `export INDIC_RESOURCES_PATH=`

## Usage

You can use the Python API to access all the features of the library. Many of the most common operations are also accessible via a unified commandline API.

### Getting Started

Check [this IPython Notebook](http://nbviewer.ipython.org/url/anoopkunchukuttan.github.io/indic_nlp_library/doc/indic_nlp_examples.ipynb) for examples of how to use the Python API.

- You can find the Python 2.x Notebook [here](http://nbviewer.ipython.org/url/anoopkunchukuttan.github.io/indic_nlp_library/doc/indic_nlp_examples_2_7.ipynb)

### Documentation

You can find detailed documentation [HERE](https://indic-nlp-library.readthedocs.io/en/latest)

This documents the Python API as well as the commandline reference.

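The Configuration step above relies on the `INDIC_RESOURCES_PATH` environment variable. As a minimal sketch (the helper name `resolve_resources_path` is hypothetical and not part of the library), a script could check that variable before calling into modules that need the resources:

```python
import os
from pathlib import Path


def resolve_resources_path(env=None):
    """Return the directory named by INDIC_RESOURCES_PATH, or raise a clear error.

    Hypothetical helper for illustration; the library does not provide it.
    """
    env = os.environ if env is None else env
    raw = env.get("INDIC_RESOURCES_PATH", "").strip()
    if not raw:
        raise RuntimeError("INDIC_RESOURCES_PATH is not set")
    path = Path(raw).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"INDIC_RESOURCES_PATH is not a directory: {path}")
    return path
```

Failing early with an explicit message is usually easier to debug than a missing-file error raised later from inside a module that loads the resources.
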
## Citing

If you use this library, please include the following citation:

```
@misc{kunchukuttan2020indicnlp,
author = "Anoop Kunchukuttan",
title = "{The IndicNLP Library}",
year = "2020",
howpublished={\url{https://github.com/anoopkunchukuttan/indic_nlp_library/blob/master/docs/indicnlp.pdf}}
}
```

You can find the document [HERE](docs/indicnlp.pdf)

## Website

`http://anoopkunchukuttan.github.io/indic_nlp_library`

## Author

Anoop Kunchukuttan ([anoop.kunchukuttan@gmail.com](anoop.kunchukuttan@gmail.com))

## Companies, Organizations, Projects using IndicNLP Library

- [AI4Bharat-IndicNLPSuite](https://indicnlp.ai4bharat.org)
- [The Classical Language Toolkit](http://cltk.org)
- [Microsoft NLP Recipes](https://github.com/microsoft/nlp-recipes)
- [Facebook M2M-100](https://github.com/pytorch/fairseq/tree/master/examples/m2m_100)

## Revision Log

0.81 : 26 May 2021
- Bug fix in version number extraction

0.80 : 24 May 2021
- Improved sentence splitting
- Bug fixes
- Support for Urdu Normalizer

0.71 : 03 Sep 2020
- Improved documentation
- Bug fixes

0.7 : 02 Apr 2020:
- Unified commandline
- Improved documentation
- Added setup.py

0.6 : 16 Dec 2019:
- New romanizer and indicizer
- Script Unifiers
- Improved script normalizers
- Added contrib directory for sample uses
- Changed to MIT license

0.5 : 03 Jun 2019:
- Improved word tokenizer to handle dates and numbers.
- Added sentence splitter that can handle common prefixes/honorifics and uses some heuristics.
- Added detokenizer
- Added acronym transliterator that can convert English acronyms to Brahmi-derived scripts

0.4 : 28 Jan 2019: Ported to Python 3, and lots of feature additions since last release; primarily around script information, script similarity and syllabification.

0.3 : 21 Oct 2014: Supports morph-analysis between Indian languages

0.2 : 13 Jun 2014: Supports transliteration between Indian languages and tokenization of Indian languages

0.1 : 12 Mar 2014: Initial version. Supports text normalization.

## LICENSE

Indic NLP Library is released under the MIT license

%package -n python3-indic-nlp-library
Summary: The goal of the Indic NLP Library is to build Python based libraries for common text processing and Natural Language Processing in Indian languages.
Provides: python-indic-nlp-library
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip

%description -n python3-indic-nlp-library
# Indic NLP Library

The goal of the Indic NLP Library is to build Python based libraries for common text processing and Natural Language Processing in Indian languages. Indian languages share a lot of similarity in terms of script, phonology, language syntax, etc. and this library is an attempt to provide a general solution to very commonly required toolsets for Indian language text.

The library provides the following functionalities:

- Text Normalization
- Script Information
- Word Tokenization and Detokenization
- Sentence Splitting
- Word Segmentation
- Syllabification
- Script Conversion
- Romanization
- Indicization
- Transliteration
- Translation

The data resources required by the Indic NLP Library are hosted in a different repository. These resources are required for some modules. You can download them from the [Indic NLP Resources](https://github.com/anoopkunchukuttan/indic_nlp_resources) project.

**If you are interested in Indian language NLP resources, you should check the [Indic NLP Catalog](https://github.com/indicnlpweb/indicnlp_catalog) for pointers.**

## Pre-requisites

- Python 3.x
  - (For Python 2.x version check the tag `PYTHON_2.7_FINAL_JAN_2019`. Not actively supporting Python 2.x anymore, but will try to maintain as much compatibility as possible)
- [Indic NLP Resources](https://github.com/anoopkunchukuttan/indic_nlp_resources)
- [Urduhack](https://github.com/urduhack/urduhack): Needed only if Urdu normalization is required. It has other dependencies like TensorFlow.
- Other dependencies are listed in setup.py

## Configuration

- Installation from pip:

  `pip install indic-nlp-library`

- If you want to use the project from the GitHub repo, add the project to the Python Path:

  - Clone this repository
  - Install dependencies: `pip install -r requirements.txt`
  - Run: `export PYTHONPATH=$PYTHONPATH:`

- In either case, export the path to the _Indic NLP Resources_ directory

  Run: `export INDIC_RESOURCES_PATH=`

## Usage

You can use the Python API to access all the features of the library. Many of the most common operations are also accessible via a unified commandline API.

### Getting Started

Check [this IPython Notebook](http://nbviewer.ipython.org/url/anoopkunchukuttan.github.io/indic_nlp_library/doc/indic_nlp_examples.ipynb) for examples of how to use the Python API.

- You can find the Python 2.x Notebook [here](http://nbviewer.ipython.org/url/anoopkunchukuttan.github.io/indic_nlp_library/doc/indic_nlp_examples_2_7.ipynb)

### Documentation

You can find detailed documentation [HERE](https://indic-nlp-library.readthedocs.io/en/latest)

This documents the Python API as well as the commandline reference.

## Citing

If you use this library, please include the following citation:

```
@misc{kunchukuttan2020indicnlp,
author = "Anoop Kunchukuttan",
title = "{The IndicNLP Library}",
year = "2020",
howpublished={\url{https://github.com/anoopkunchukuttan/indic_nlp_library/blob/master/docs/indicnlp.pdf}}
}
```

You can find the document [HERE](docs/indicnlp.pdf)

## Website

`http://anoopkunchukuttan.github.io/indic_nlp_library`

## Author

Anoop Kunchukuttan ([anoop.kunchukuttan@gmail.com](anoop.kunchukuttan@gmail.com))

## Companies, Organizations, Projects using IndicNLP Library

- [AI4Bharat-IndicNLPSuite](https://indicnlp.ai4bharat.org)
- [The Classical Language Toolkit](http://cltk.org)
- [Microsoft NLP Recipes](https://github.com/microsoft/nlp-recipes)
- [Facebook M2M-100](https://github.com/pytorch/fairseq/tree/master/examples/m2m_100)

## Revision Log

0.81 : 26 May 2021
- Bug fix in version number extraction

0.80 : 24 May 2021
- Improved sentence splitting
- Bug fixes
- Support for Urdu Normalizer

0.71 : 03 Sep 2020
- Improved documentation
- Bug fixes

0.7 : 02 Apr 2020:
- Unified commandline
- Improved documentation
- Added setup.py

0.6 : 16 Dec 2019:
- New romanizer and indicizer
- Script Unifiers
- Improved script normalizers
- Added contrib directory for sample uses
- Changed to MIT license

0.5 : 03 Jun 2019:
- Improved word tokenizer to handle dates and numbers.
- Added sentence splitter that can handle common prefixes/honorifics and uses some heuristics.
- Added detokenizer
- Added acronym transliterator that can convert English acronyms to Brahmi-derived scripts

0.4 : 28 Jan 2019: Ported to Python 3, and lots of feature additions since last release; primarily around script information, script similarity and syllabification.

0.3 : 21 Oct 2014: Supports morph-analysis between Indian languages

0.2 : 13 Jun 2014: Supports transliteration between Indian languages and tokenization of Indian languages

0.1 : 12 Mar 2014: Initial version. Supports text normalization.

## LICENSE

Indic NLP Library is released under the MIT license

%package help
Summary: Development documents and examples for indic-nlp-library
Provides: python3-indic-nlp-library-doc

%description help
# Indic NLP Library

The goal of the Indic NLP Library is to build Python based libraries for common text processing and Natural Language Processing in Indian languages. Indian languages share a lot of similarity in terms of script, phonology, language syntax, etc. and this library is an attempt to provide a general solution to very commonly required toolsets for Indian language text.

The library provides the following functionalities:

- Text Normalization
- Script Information
- Word Tokenization and Detokenization
- Sentence Splitting
- Word Segmentation
- Syllabification
- Script Conversion
- Romanization
- Indicization
- Transliteration
- Translation

The data resources required by the Indic NLP Library are hosted in a different repository. These resources are required for some modules. You can download them from the [Indic NLP Resources](https://github.com/anoopkunchukuttan/indic_nlp_resources) project.

**If you are interested in Indian language NLP resources, you should check the [Indic NLP Catalog](https://github.com/indicnlpweb/indicnlp_catalog) for pointers.**

## Pre-requisites

- Python 3.x
  - (For Python 2.x version check the tag `PYTHON_2.7_FINAL_JAN_2019`. Not actively supporting Python 2.x anymore, but will try to maintain as much compatibility as possible)
- [Indic NLP Resources](https://github.com/anoopkunchukuttan/indic_nlp_resources)
- [Urduhack](https://github.com/urduhack/urduhack): Needed only if Urdu normalization is required. It has other dependencies like TensorFlow.
- Other dependencies are listed in setup.py

## Configuration

- Installation from pip:

  `pip install indic-nlp-library`

- If you want to use the project from the GitHub repo, add the project to the Python Path:

  - Clone this repository
  - Install dependencies: `pip install -r requirements.txt`
  - Run: `export PYTHONPATH=$PYTHONPATH:`

- In either case, export the path to the _Indic NLP Resources_ directory

  Run: `export INDIC_RESOURCES_PATH=`

## Usage

You can use the Python API to access all the features of the library. Many of the most common operations are also accessible via a unified commandline API.

### Getting Started

Check [this IPython Notebook](http://nbviewer.ipython.org/url/anoopkunchukuttan.github.io/indic_nlp_library/doc/indic_nlp_examples.ipynb) for examples of how to use the Python API.

- You can find the Python 2.x Notebook [here](http://nbviewer.ipython.org/url/anoopkunchukuttan.github.io/indic_nlp_library/doc/indic_nlp_examples_2_7.ipynb)

### Documentation

You can find detailed documentation [HERE](https://indic-nlp-library.readthedocs.io/en/latest)

This documents the Python API as well as the commandline reference.

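The script conversion feature listed above builds on the similarity between Indic scripts. One visible aspect of that similarity is that the Unicode blocks for the major Brahmi-derived scripts are each 128 code points wide and place corresponding letters largely at the same offsets. The sketch below is a simplified illustration of that idea only. It is not the library's implementation, the names `SCRIPT_BASE` and `shift_script` are hypothetical, and it ignores code points that are unassigned in the target block or shared across scripts (such as the danda).

```python
# First code point of each script's Unicode block (each block spans 0x80 code points).
SCRIPT_BASE = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def shift_script(text, source, target):
    """Move every character of the source block to the same offset in the target block.

    Characters outside the source block (spaces, Latin letters, digits) are kept as-is.
    """
    src = SCRIPT_BASE[source]
    tgt = SCRIPT_BASE[target]
    out = []
    for ch in text:
        offset = ord(ch) - src
        if 0 <= offset < BLOCK_SIZE:
            out.append(chr(tgt + offset))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Devanagari KA, MA, LA mapped to the same offsets in the Telugu block.
    print(shift_script("\u0915\u092e\u0932", "devanagari", "telugu"))
```

For real use, prefer the transliteration modules documented in the API reference, which handle the script-specific exceptions this sketch skips.
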
## Citing

If you use this library, please include the following citation:

```
@misc{kunchukuttan2020indicnlp,
author = "Anoop Kunchukuttan",
title = "{The IndicNLP Library}",
year = "2020",
howpublished={\url{https://github.com/anoopkunchukuttan/indic_nlp_library/blob/master/docs/indicnlp.pdf}}
}
```

You can find the document [HERE](docs/indicnlp.pdf)

## Website

`http://anoopkunchukuttan.github.io/indic_nlp_library`

## Author

Anoop Kunchukuttan ([anoop.kunchukuttan@gmail.com](anoop.kunchukuttan@gmail.com))

## Companies, Organizations, Projects using IndicNLP Library

- [AI4Bharat-IndicNLPSuite](https://indicnlp.ai4bharat.org)
- [The Classical Language Toolkit](http://cltk.org)
- [Microsoft NLP Recipes](https://github.com/microsoft/nlp-recipes)
- [Facebook M2M-100](https://github.com/pytorch/fairseq/tree/master/examples/m2m_100)

## Revision Log

0.81 : 26 May 2021
- Bug fix in version number extraction

0.80 : 24 May 2021
- Improved sentence splitting
- Bug fixes
- Support for Urdu Normalizer

0.71 : 03 Sep 2020
- Improved documentation
- Bug fixes

0.7 : 02 Apr 2020:
- Unified commandline
- Improved documentation
- Added setup.py

0.6 : 16 Dec 2019:
- New romanizer and indicizer
- Script Unifiers
- Improved script normalizers
- Added contrib directory for sample uses
- Changed to MIT license

0.5 : 03 Jun 2019:
- Improved word tokenizer to handle dates and numbers.
- Added sentence splitter that can handle common prefixes/honorifics and uses some heuristics.
- Added detokenizer
- Added acronym transliterator that can convert English acronyms to Brahmi-derived scripts

0.4 : 28 Jan 2019: Ported to Python 3, and lots of feature additions since last release; primarily around script information, script similarity and syllabification.

0.3 : 21 Oct 2014: Supports morph-analysis between Indian languages

0.2 : 13 Jun 2014: Supports transliteration between Indian languages and tokenization of Indian languages

0.1 : 12 Mar 2014: Initial version. Supports text normalization.

## LICENSE

Indic NLP Library is released under the MIT license

%prep
%autosetup -n indic_nlp_library-0.92

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-indic-nlp-library -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Fri Jun 09 2023 Python_Bot - 0.92-1
- Package Spec generated