%global _empty_manifest_terminate_build 0
Name:		python-whichlang
Version:	0.0.4
Release:	1
Summary:	Does language identification for Indian languages
License:	MIT License
URL:		https://github.com/xtraspeed/whichlang
Source0:	https://mirrors.aliyun.com/pypi/web/packages/63/d5/dbd25ab5fdf4a0eaea0601158872129d53fd28e117ffd7a9b2b7f0782d84/whichlang-0.0.4.tar.gz
BuildArch:	noarch


%description

# whichlang


whichlang is a Python library for identifying the language of a given text.


## Installation


Use the package manager [pip](https://pip.pypa.io/en/stable/) to install whichlang.


```bash

pip install whichlang

```


## Usage


```python

from whichlang import whichlang as wl

# Read the sample text (assumed to be UTF-8 encoded) and close the file afterwards.
with open('sample-test-files/sample-hindi.txt', 'r', encoding='utf-8') as f:
    data = f.read()

# which_lang returns a tuple of the top 3 probable languages, most probable first.
print(wl.which_lang(data))
# Output: ('Hindi', 'Marathi', 'Punjabi'), so Hindi is most probable.

```


```

# Train a language model.
# assamese.txt is the training data.
# Assamese is the name of the language model that gets created.

python train_lang_models.py -f train-data\as\assamese.txt -l Assamese

```

## Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.


## License

[MIT](https://choosealicense.com/licenses/mit/)


## Available Languages

Hindi, Telugu, Tamil, Kannada, Malayalam, Punjabi, Marathi, Gujarati, Oriya, Assamese.


## Acknowledgements

1. We would like to thank the [Leipzig Corpora Collection](https://corpora.uni-leipzig.de/en), from which we collected the data used to train the models.

    Dirk Goldhahn, Thomas Eckart and Uwe Quasthoff (2012): Building Large Monolingual Dictionaries at the Leipzig Corpora Collection: From 100 to 200 Languages. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), 2012

2. whichlang is based on N-gram-based text categorization: Cavnar, William B., and John M. Trenkle. "N-gram-based text categorization." Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, pp. 161-175. 1994.

    The same approach is used in the [langdetect](https://github.com/fedelopez77/langdetect) library. We found this approach quite effective and wanted to explore it for Indian languages. In whichlang, we train, optimize, and make models readily available for Indian languages, since these languages have been less explored.
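
For readers unfamiliar with the Cavnar and Trenkle method cited above, the following is a minimal sketch of its rank-order ("out-of-place") comparison of character n-gram profiles. It is an illustration only and is not taken from the whichlang source: the function names `ngram_profile`, `out_of_place`, and `rank_languages`, the parameters `max_n` and `top_k`, and the toy training strings are all assumptions made for this example.

```python
from collections import Counter


def ngram_profile(text, max_n=3, top_k=300):
    """Return the top_k most frequent character n-grams, most frequent first."""
    counts = Counter()
    for token in text.split():
        padded = "_" + token + "_"  # underscores mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically so results are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from lang_profile get a fixed penalty."""
    rank = {gram: r for r, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(r - rank[gram]) if gram in rank else max_penalty
        for r, gram in enumerate(doc_profile)
    )


def rank_languages(text, lang_profiles, top=3):
    """Return up to `top` language names, smallest out-of-place distance first."""
    doc = ngram_profile(text)
    ordered = sorted(lang_profiles, key=lambda name: out_of_place(doc, lang_profiles[name]))
    return tuple(ordered[:top])


# Toy profiles built from made-up strings, purely for demonstration.
profiles = {
    "A": ngram_profile("aaa aab aba"),
    "B": ngram_profile("xyz zyx yxz"),
}
print(rank_languages("aab aaa", profiles))
```

In this sketch a language model is just an ordered list of n-grams, so "training" amounts to calling `ngram_profile` on a corpus for each language. How whichlang stores and optimizes its own models may differ.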


%package -n python3-whichlang
Summary:	Does language identification for Indian languages
Provides:	python-whichlang
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip
%description -n python3-whichlang

# whichlang


whichlang is a Python library for identifying the language of a given text.


## Installation


Use the package manager [pip](https://pip.pypa.io/en/stable/) to install whichlang.


```bash

pip install whichlang

```


## Usage


```python

from whichlang import whichlang as wl

# Read the sample text (assumed to be UTF-8 encoded) and close the file afterwards.
with open('sample-test-files/sample-hindi.txt', 'r', encoding='utf-8') as f:
    data = f.read()

# which_lang returns a tuple of the top 3 probable languages, most probable first.
print(wl.which_lang(data))
# Output: ('Hindi', 'Marathi', 'Punjabi'), so Hindi is most probable.

```


```

# Train a language model.
# assamese.txt is the training data.
# Assamese is the name of the language model that gets created.

python train_lang_models.py -f train-data\as\assamese.txt -l Assamese

```

## Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.


## License

[MIT](https://choosealicense.com/licenses/mit/)


## Available Languages

Hindi, Telugu, Tamil, Kannada, Malayalam, Punjabi, Marathi, Gujarati, Oriya, Assamese.


## Acknowledgements

1. We would like to thank the [Leipzig Corpora Collection](https://corpora.uni-leipzig.de/en), from which we collected the data used to train the models.

    Dirk Goldhahn, Thomas Eckart and Uwe Quasthoff (2012): Building Large Monolingual Dictionaries at the Leipzig Corpora Collection: From 100 to 200 Languages. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), 2012

2. whichlang is based on N-gram-based text categorization: Cavnar, William B., and John M. Trenkle. "N-gram-based text categorization." Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, pp. 161-175. 1994.

    The same approach is used in the [langdetect](https://github.com/fedelopez77/langdetect) library. We found this approach quite effective and wanted to explore it for Indian languages. In whichlang, we train, optimize, and make models readily available for Indian languages, since these languages have been less explored.


%package help
Summary:	Development documents and examples for whichlang
Provides:	python3-whichlang-doc
%description help

# whichlang


whichlang is a Python library for identifying the language of a given text.


## Installation


Use the package manager [pip](https://pip.pypa.io/en/stable/) to install whichlang.


```bash

pip install whichlang

```


## Usage


```python

from whichlang import whichlang as wl

# Read the sample text (assumed to be UTF-8 encoded) and close the file afterwards.
with open('sample-test-files/sample-hindi.txt', 'r', encoding='utf-8') as f:
    data = f.read()

# which_lang returns a tuple of the top 3 probable languages, most probable first.
print(wl.which_lang(data))
# Output: ('Hindi', 'Marathi', 'Punjabi'), so Hindi is most probable.

```


```

# Train a language model.
# assamese.txt is the training data.
# Assamese is the name of the language model that gets created.

python train_lang_models.py -f train-data\as\assamese.txt -l Assamese

```

## Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.


## License

[MIT](https://choosealicense.com/licenses/mit/)


## Available Languages

Hindi, Telugu, Tamil, Kannada, Malayalam, Punjabi, Marathi, Gujarati, Oriya, Assamese.


## Acknowledgements

1. We would like to thank the [Leipzig Corpora Collection](https://corpora.uni-leipzig.de/en), from which we collected the data used to train the models.

    Dirk Goldhahn, Thomas Eckart and Uwe Quasthoff (2012): Building Large Monolingual Dictionaries at the Leipzig Corpora Collection: From 100 to 200 Languages. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), 2012

2. whichlang is based on N-gram-based text categorization: Cavnar, William B., and John M. Trenkle. "N-gram-based text categorization." Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, pp. 161-175. 1994.

    The same approach is used in the [langdetect](https://github.com/fedelopez77/langdetect) library. We found this approach quite effective and wanted to explore it for Indian languages. In whichlang, we train, optimize, and make models readily available for Indian languages, since these languages have been less explored.


%prep
%autosetup -n whichlang-0.0.4

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-whichlang -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Tue Jun 20 2023 Python_Bot <Python_Bot@openeuler.org> - 0.0.4-1
- Package Spec generated