Diffstat (limited to 'python-nlpo3.spec')
-rw-r--r-- | python-nlpo3.spec | 326
1 files changed, 326 insertions, 0 deletions
diff --git a/python-nlpo3.spec b/python-nlpo3.spec
new file mode 100644
index 0000000..bb6dcae
--- /dev/null
+++ b/python-nlpo3.spec
@@ -0,0 +1,326 @@
+%global _empty_manifest_terminate_build 0
+Name: python-nlpo3
+Version: 1.2.6
+Release: 1
+Summary: Python binding for nlpO3 Thai language processing library in Rust
+License: Apache-2.0
+URL: https://github.com/PyThaiNLP/nlpo3/
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/d6/3b/1514977f53c0e8e86b1fce6c9523cd7e248e729864505c7475bc13595700/nlpo3-1.2.6.tar.gz
+
+
+%description
+<a href="https://pypi.python.org/pypi/nlpo3"><img alt="pypi" src="https://img.shields.io/pypi/v/nlpo3.svg"/></a>
+<a href="https://www.python.org/downloads/release/python-360/"><img alt="Python 3.6" src="https://img.shields.io/badge/python-3.6-blue.svg"/></a>
+<a href="https://opensource.org/licenses/Apache-2.0"><img alt="License" src="https://img.shields.io/badge/License-Apache%202.0-blue.svg"/></a>
+<a href="https://pepy.tech/project/nlpo3"><img alt="Downloads" src="https://pepy.tech/badge/nlpo3/month"/></a>
+
+# nlpO3 Python binding
+
+Python binding for nlpO3, a Thai natural language processing library in Rust.
+
+## Features
+
+- Thai word tokenizer
+  - `segment()` - uses a maximal-matching, dictionary-based tokenization algorithm and honors Thai Character Cluster boundaries
+  - [2.5x faster](notebooks/nlpo3_segment_benchmarks.ipynb) than a similar pure-Python implementation (PyThaiNLP's newmm)
+  - `load_dict()` - loads a dictionary from a plain text file (one word per line)
+
+
+## Dictionary file
+
+- In the interest of library size, nlpO3 does not assume which dictionary the developer would like to use.
+  It does not come with a dictionary, but the dictionary-based word tokenizer requires one.
+- For a tokenization dictionary, try
+  - [words_th.txt](https://github.com/PyThaiNLP/pythainlp/blob/dev/pythainlp/corpus/words_th.txt) from [PyThaiNLP](https://github.com/PyThaiNLP/pythainlp/) - around 62,000 words (CC0)
+  - [word break dictionary](https://github.com/tlwg/libthai/tree/master/data) from [libthai](https://github.com/tlwg/libthai/) - dictionaries in different categories, with a make script (LGPL-2.1)
+
+
+## Install
+
+```bash
+pip install nlpo3
+```
+
+## Usage
+
+Load the file `path/to/dict.file` into memory and assign the name `dict_name` to it.
+Then tokenize a text with the `dict_name` dictionary:
+```python
+from nlpo3 import load_dict, segment
+
+load_dict("path/to/dict.file", "dict_name")
+segment("สวัสดีครับ", "dict_name")
+```
+
+It will return a list of strings:
+```python
+['สวัสดี', 'ครับ']
+```
+(the result depends on the words included in the dictionary)
+
+Use multithreaded mode, again with the `dict_name` dictionary:
+```python
+segment("สวัสดีครับ", dict_name="dict_name", parallel=True)
+```
+
+Use safe mode to avoid long processing times in some edge cases,
+such as text with many ambiguous word boundaries:
+```python
+segment("สวัสดีครับ", dict_name="dict_name", safe=True)
+```
+
+## Build
+
+### Requirements
+
+- [Rust 2018 Edition](https://www.rust-lang.org/tools/install)
+- Python 3.6 or newer
+- Python Development Headers
+  - Ubuntu: `sudo apt-get install python3-dev`
+  - macOS: No action needed
+- [PyO3](https://github.com/PyO3/pyo3) - already included in Cargo.toml
+- [setuptools-rust](https://github.com/PyO3/setuptools-rust)
+
+### Steps
+
+```bash
+python -m pip install --upgrade build
+python -m build
+```
+
+This should generate a wheel file in the `dist/` directory, which can be installed with pip.
+
+## Issues
+
+Please report issues at https://github.com/PyThaiNLP/nlpo3/issues
+
+
+%package -n python3-nlpo3
+Summary: Python binding for nlpO3 Thai language processing library in Rust
+Provides: python-nlpo3
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+BuildRequires: python3-cffi
+BuildRequires: gcc
+BuildRequires: gdb
+%description -n python3-nlpo3
+<a href="https://pypi.python.org/pypi/nlpo3"><img alt="pypi" src="https://img.shields.io/pypi/v/nlpo3.svg"/></a>
+<a href="https://www.python.org/downloads/release/python-360/"><img alt="Python 3.6" src="https://img.shields.io/badge/python-3.6-blue.svg"/></a>
+<a href="https://opensource.org/licenses/Apache-2.0"><img alt="License" src="https://img.shields.io/badge/License-Apache%202.0-blue.svg"/></a>
+<a href="https://pepy.tech/project/nlpo3"><img alt="Downloads" src="https://pepy.tech/badge/nlpo3/month"/></a>
+
+# nlpO3 Python binding
+
+Python binding for nlpO3, a Thai natural language processing library in Rust.
+
+## Features
+
+- Thai word tokenizer
+  - `segment()` - uses a maximal-matching, dictionary-based tokenization algorithm and honors Thai Character Cluster boundaries
+  - [2.5x faster](notebooks/nlpo3_segment_benchmarks.ipynb) than a similar pure-Python implementation (PyThaiNLP's newmm)
+  - `load_dict()` - loads a dictionary from a plain text file (one word per line)
+
+
+## Dictionary file
+
+- In the interest of library size, nlpO3 does not assume which dictionary the developer would like to use.
+  It does not come with a dictionary, but the dictionary-based word tokenizer requires one.
+- For a tokenization dictionary, try
+  - [words_th.txt](https://github.com/PyThaiNLP/pythainlp/blob/dev/pythainlp/corpus/words_th.txt) from [PyThaiNLP](https://github.com/PyThaiNLP/pythainlp/) - around 62,000 words (CC0)
+  - [word break dictionary](https://github.com/tlwg/libthai/tree/master/data) from [libthai](https://github.com/tlwg/libthai/) - dictionaries in different categories, with a make script (LGPL-2.1)
+
+
+## Install
+
+```bash
+pip install nlpo3
+```
+
+## Usage
+
+Load the file `path/to/dict.file` into memory and assign the name `dict_name` to it.
+Then tokenize a text with the `dict_name` dictionary:
+```python
+from nlpo3 import load_dict, segment
+
+load_dict("path/to/dict.file", "dict_name")
+segment("สวัสดีครับ", "dict_name")
+```
+
+It will return a list of strings:
+```python
+['สวัสดี', 'ครับ']
+```
+(the result depends on the words included in the dictionary)
+
+Use multithreaded mode, again with the `dict_name` dictionary:
+```python
+segment("สวัสดีครับ", dict_name="dict_name", parallel=True)
+```
+
+Use safe mode to avoid long processing times in some edge cases,
+such as text with many ambiguous word boundaries:
+```python
+segment("สวัสดีครับ", dict_name="dict_name", safe=True)
+```
+
+## Build
+
+### Requirements
+
+- [Rust 2018 Edition](https://www.rust-lang.org/tools/install)
+- Python 3.6 or newer
+- Python Development Headers
+  - Ubuntu: `sudo apt-get install python3-dev`
+  - macOS: No action needed
+- [PyO3](https://github.com/PyO3/pyo3) - already included in Cargo.toml
+- [setuptools-rust](https://github.com/PyO3/setuptools-rust)
+
+### Steps
+
+```bash
+python -m pip install --upgrade build
+python -m build
+```
+
+This should generate a wheel file in the `dist/` directory, which can be installed with pip.
+
+## Issues
+
+Please report issues at https://github.com/PyThaiNLP/nlpo3/issues
+
+
+%package help
+Summary: Development documents and examples for nlpo3
+Provides: python3-nlpo3-doc
+%description help
+<a href="https://pypi.python.org/pypi/nlpo3"><img alt="pypi" src="https://img.shields.io/pypi/v/nlpo3.svg"/></a>
+<a href="https://www.python.org/downloads/release/python-360/"><img alt="Python 3.6" src="https://img.shields.io/badge/python-3.6-blue.svg"/></a>
+<a href="https://opensource.org/licenses/Apache-2.0"><img alt="License" src="https://img.shields.io/badge/License-Apache%202.0-blue.svg"/></a>
+<a href="https://pepy.tech/project/nlpo3"><img alt="Downloads" src="https://pepy.tech/badge/nlpo3/month"/></a>
+
+# nlpO3 Python binding
+
+Python binding for nlpO3, a Thai natural language processing library in Rust.
+
+## Features
+
+- Thai word tokenizer
+  - `segment()` - uses a maximal-matching, dictionary-based tokenization algorithm and honors Thai Character Cluster boundaries
+  - [2.5x faster](notebooks/nlpo3_segment_benchmarks.ipynb) than a similar pure-Python implementation (PyThaiNLP's newmm)
+  - `load_dict()` - loads a dictionary from a plain text file (one word per line)
+
+
+## Dictionary file
+
+- In the interest of library size, nlpO3 does not assume which dictionary the developer would like to use.
+  It does not come with a dictionary, but the dictionary-based word tokenizer requires one.
+- For a tokenization dictionary, try
+  - [words_th.txt](https://github.com/PyThaiNLP/pythainlp/blob/dev/pythainlp/corpus/words_th.txt) from [PyThaiNLP](https://github.com/PyThaiNLP/pythainlp/) - around 62,000 words (CC0)
+  - [word break dictionary](https://github.com/tlwg/libthai/tree/master/data) from [libthai](https://github.com/tlwg/libthai/) - dictionaries in different categories, with a make script (LGPL-2.1)
+
+
+## Install
+
+```bash
+pip install nlpo3
+```
+
+## Usage
+
+Load the file `path/to/dict.file` into memory and assign the name `dict_name` to it.
+Then tokenize a text with the `dict_name` dictionary:
+```python
+from nlpo3 import load_dict, segment
+
+load_dict("path/to/dict.file", "dict_name")
+segment("สวัสดีครับ", "dict_name")
+```
+
+It will return a list of strings:
+```python
+['สวัสดี', 'ครับ']
+```
+(the result depends on the words included in the dictionary)
+
+Use multithreaded mode, again with the `dict_name` dictionary:
+```python
+segment("สวัสดีครับ", dict_name="dict_name", parallel=True)
+```
+
+Use safe mode to avoid long processing times in some edge cases,
+such as text with many ambiguous word boundaries:
+```python
+segment("สวัสดีครับ", dict_name="dict_name", safe=True)
+```
+
+## Build
+
+### Requirements
+
+- [Rust 2018 Edition](https://www.rust-lang.org/tools/install)
+- Python 3.6 or newer
+- Python Development Headers
+  - Ubuntu: `sudo apt-get install python3-dev`
+  - macOS: No action needed
+- [PyO3](https://github.com/PyO3/pyo3) - already included in Cargo.toml
+- [setuptools-rust](https://github.com/PyO3/setuptools-rust)
+
+### Steps
+
+```bash
+python -m pip install --upgrade build
+python -m build
+```
+
+This should generate a wheel file in the `dist/` directory, which can be installed with pip.
+
+## Issues
+
+Please report issues at https://github.com/PyThaiNLP/nlpo3/issues
+
+
+%prep
+%autosetup -n nlpo3-1.2.6
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-nlpo3 -f filelist.lst
+%dir %{python3_sitearch}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed Apr 12 2023 Python_Bot <Python_Bot@openeuler.org> - 1.2.6-1
+- Package Spec generated
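The packaged README says `segment()` uses a maximal-matching, dictionary-based algorithm over a one-word-per-line dictionary. The pure-Python sketch below illustrates that idea under simplifying assumptions. It is not nlpO3's implementation: it ignores Thai Character Cluster boundaries, safe mode, and parallel mode. The helper names `load_words` and `max_match_segment` are hypothetical. The dynamic program picks the split with the fewest characters outside dictionary words, breaking ties by the fewest tokens, and emits each unknown character as its own token.

```python
from typing import Iterable, List, Set, Tuple


def load_words(lines: Iterable[str]) -> Set[str]:
    """Hypothetical helper: build a word set from one-word-per-line input, skipping blanks."""
    return {line.strip() for line in lines if line.strip()}


def max_match_segment(text: str, words: Set[str]) -> List[str]:
    """Hypothetical helper: minimise unknown characters first, token count second."""
    n = len(text)
    # Longest candidate to try; at least 1 so single-character fallbacks are always checked.
    max_len = max([1] + [len(w) for w in words])
    inf = float("inf")
    # best[i] is the (unknown_chars, tokens) cost of the best split of text[:i].
    best: List[Tuple[float, float]] = [(inf, inf)] * (n + 1)
    best[0] = (0, 0)
    back = [0] * (n + 1)  # back[i]: start index of the last token ending at i
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            piece = text[j:i]
            if piece in words:
                cost = (best[j][0], best[j][1] + 1)
            elif i - j == 1:
                # No dictionary word fits here: use a one-character unknown token.
                cost = (best[j][0] + 1, best[j][1] + 1)
            else:
                continue
            if cost < best[i]:
                best[i] = cost
                back[i] = j
    # Follow the back-pointers from the end of the text to recover the tokens.
    tokens: List[str] = []
    i = n
    while i > 0:
        tokens.append(text[back[i]:i])
        i = back[i]
    tokens.reverse()
    return tokens


words = load_words(["สวัสดี\n", "ครับ\n"])
print(max_match_segment("สวัสดีครับ", words))
```

Greedy longest matching would take `abc` from `abcd` and leave `d` unmatched. Given the dictionary `{"ab", "abc", "cd"}`, this dynamic program returns `["ab", "cd"]`.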