author CoprDistGit <infra@openeuler.org> 2023-06-20 07:47:53 +0000
committer CoprDistGit <infra@openeuler.org> 2023-06-20 07:47:53 +0000
commit 9e07dbb2af6c2f6fa6fbc3c05630123ddf67a7dc
tree e45982d82029c836bc13494086b2633549a16b39
parent 08315bae8086b45ffe8ffffd45dbd57197ea83c9
automatic import of python-whichlang (openeuler20.03)
-rw-r--r-- .gitignore            |   1
-rw-r--r-- python-whichlang.spec | 387
-rw-r--r-- sources               |   1
3 files changed, 389 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..f793e9b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/whichlang-0.0.4.tar.gz
diff --git a/python-whichlang.spec b/python-whichlang.spec
new file mode 100644
index 0000000..8993acd
--- /dev/null
+++ b/python-whichlang.spec
@@ -0,0 +1,387 @@
+%global _empty_manifest_terminate_build 0
+Name: python-whichlang
+Version: 0.0.4
+Release: 1
+Summary: Does language identification for Indian languages
+License: MIT
+URL: https://github.com/xtraspeed/whichlang
+Source0: https://mirrors.aliyun.com/pypi/web/packages/63/d5/dbd25ab5fdf4a0eaea0601158872129d53fd28e117ffd7a9b2b7f0782d84/whichlang-0.0.4.tar.gz
+BuildArch: noarch
+
+
+%description
+
+# whichlang
+
+
+
+whichlang is a Python library for identifying the language of a given text.
+
+
+
+## Installation
+
+
+
+Use the package manager [pip](https://pip.pypa.io/en/stable/) to install whichlang.
+
+
+
+```bash
+
+pip install whichlang
+
+```
+
+
+
+## Usage
+
+
+
+```python
+
+from whichlang import whichlang as wl
+
+
+
+with open('sample-test-files/sample-hindi.txt', 'r', encoding='utf-8') as f:
+
+    data = f.read()
+
+
+
+# returns a tuple of the top 3 probable languages, most probable first
+
+print(wl.which_lang(data))
+
+# Output: ('Hindi', 'Marathi', 'Punjabi')  (Hindi is most probable)
+
+```
+
+
+
+```
+
+# For training a language model
+
+# assamese.txt is the training data
+
+# Assamese is the language model created
+
+python train_lang_models.py -f train-data\as\assamese.txt -l Assamese
+
+```
+
+## Contributing
+
+Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
+
+
+
+## License
+
+[MIT](https://choosealicense.com/licenses/mit/)
+
+
+
+## Available Languages
+
+Hindi, Telugu, Tamil, Kannada, Malayalam, Punjabi, Marathi, Gujarati, Oriya, Assamese.
+
+
+
+## Acknowledgements
+
+1. We would like to thank the [Leipzig Corpora collection](https://corpora.uni-leipzig.de/en) where we collected data for training models.
+
+ Dirk Goldhahn, Thomas Eckart and Uwe Quasthoff (2012): Building Large Monolingual Dictionaries at the Leipzig Corpora Collection: From 100 to 200 Languages. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), 2012
+
+2. whichlang is based on N-gram-based text categorization: Cavnar, William B., and John M. Trenkle. "N-gram-based text categorization." Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, pp. 161-175. 1994.
+
+ The same approach was used in the library [langdetect](https://github.com/fedelopez77/langdetect). We found this approach quite effective and wanted to explore it for Indian languages. In whichlang, we train, optimize, and make models readily available for Indian languages, since these languages have been less explored.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+%package -n python3-whichlang
+Summary: Does language identification for Indian languages
+Provides: python-whichlang
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-whichlang
+
+# whichlang
+
+
+
+whichlang is a Python library for identifying the language of a given text.
+
+
+
+## Installation
+
+
+
+Use the package manager [pip](https://pip.pypa.io/en/stable/) to install whichlang.
+
+
+
+```bash
+
+pip install whichlang
+
+```
+
+
+
+## Usage
+
+
+
+```python
+
+from whichlang import whichlang as wl
+
+
+
+with open('sample-test-files/sample-hindi.txt', 'r', encoding='utf-8') as f:
+
+    data = f.read()
+
+
+
+# returns a tuple of the top 3 probable languages, most probable first
+
+print(wl.which_lang(data))
+
+# Output: ('Hindi', 'Marathi', 'Punjabi')  (Hindi is most probable)
+
+```
+
+
+
+```
+
+# For training a language model
+
+# assamese.txt is the training data
+
+# Assamese is the language model created
+
+python train_lang_models.py -f train-data\as\assamese.txt -l Assamese
+
+```
+
+## Contributing
+
+Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
+
+
+
+## License
+
+[MIT](https://choosealicense.com/licenses/mit/)
+
+
+
+## Available Languages
+
+Hindi, Telugu, Tamil, Kannada, Malayalam, Punjabi, Marathi, Gujarati, Oriya, Assamese.
+
+
+
+## Acknowledgements
+
+1. We would like to thank the [Leipzig Corpora collection](https://corpora.uni-leipzig.de/en) where we collected data for training models.
+
+ Dirk Goldhahn, Thomas Eckart and Uwe Quasthoff (2012): Building Large Monolingual Dictionaries at the Leipzig Corpora Collection: From 100 to 200 Languages. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), 2012
+
+2. whichlang is based on N-gram-based text categorization: Cavnar, William B., and John M. Trenkle. "N-gram-based text categorization." Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, pp. 161-175. 1994.
+
+ The same approach was used in the library [langdetect](https://github.com/fedelopez77/langdetect). We found this approach quite effective and wanted to explore it for Indian languages. In whichlang, we train, optimize, and make models readily available for Indian languages, since these languages have been less explored.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+%package help
+Summary: Development documents and examples for whichlang
+Provides: python3-whichlang-doc
+%description help
+
+# whichlang
+
+
+
+whichlang is a Python library for identifying the language of a given text.
+
+
+
+## Installation
+
+
+
+Use the package manager [pip](https://pip.pypa.io/en/stable/) to install whichlang.
+
+
+
+```bash
+
+pip install whichlang
+
+```
+
+
+
+## Usage
+
+
+
+```python
+
+from whichlang import whichlang as wl
+
+
+
+with open('sample-test-files/sample-hindi.txt', 'r', encoding='utf-8') as f:
+
+    data = f.read()
+
+
+
+# returns a tuple of the top 3 probable languages, most probable first
+
+print(wl.which_lang(data))
+
+# Output: ('Hindi', 'Marathi', 'Punjabi')  (Hindi is most probable)
+
+```
+
+
+
+```
+
+# For training a language model
+
+# assamese.txt is the training data
+
+# Assamese is the language model created
+
+python train_lang_models.py -f train-data\as\assamese.txt -l Assamese
+
+```
+
+## Contributing
+
+Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
+
+
+
+## License
+
+[MIT](https://choosealicense.com/licenses/mit/)
+
+
+
+## Available Languages
+
+Hindi, Telugu, Tamil, Kannada, Malayalam, Punjabi, Marathi, Gujarati, Oriya, Assamese.
+
+
+
+## Acknowledgements
+
+1. We would like to thank the [Leipzig Corpora collection](https://corpora.uni-leipzig.de/en) where we collected data for training models.
+
+ Dirk Goldhahn, Thomas Eckart and Uwe Quasthoff (2012): Building Large Monolingual Dictionaries at the Leipzig Corpora Collection: From 100 to 200 Languages. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), 2012
+
+2. whichlang is based on N-gram-based text categorization: Cavnar, William B., and John M. Trenkle. "N-gram-based text categorization." Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, pp. 161-175. 1994.
+
+ The same approach was used in the library [langdetect](https://github.com/fedelopez77/langdetect). We found this approach quite effective and wanted to explore it for Indian languages. In whichlang, we train, optimize, and make models readily available for Indian languages, since these languages have been less explored.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+%prep
+%autosetup -n whichlang-0.0.4
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-whichlang -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Jun 20 2023 Python_Bot <Python_Bot@openeuler.org> - 0.0.4-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..9792b5f
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+0c55ed7d9cb3862752e44d2e121afca4 whichlang-0.0.4.tar.gz