%global _empty_manifest_terminate_build 0
Name: python-nlpo3
Version: 1.2.6
Release: 1
Summary: Python binding for nlpO3 Thai language processing library in Rust
License: Apache-2.0
URL: https://github.com/PyThaiNLP/nlpo3/
Source0: https://mirrors.nju.edu.cn/pypi/web/packages/d6/3b/1514977f53c0e8e86b1fce6c9523cd7e248e729864505c7475bc13595700/nlpo3-1.2.6.tar.gz

%description
# nlpO3 Python binding

Python binding for nlpO3, a Thai natural language processing library in Rust.

## Features

- Thai word tokenizer
  - `segment()` - uses a maximal-matching dictionary-based tokenization algorithm and honors Thai Character Cluster boundaries
    - [2.5x faster](notebooks/nlpo3_segment_benchmarks.ipynb) than a similar pure Python implementation (PyThaiNLP's newmm)
  - `load_dict()` - loads a dictionary from a plain text file (one word per line)

## Dictionary file

- In the interest of library size, nlpO3 does not assume which dictionary the developer would like to use.
  It does not come with a dictionary.
  A dictionary is needed for the dictionary-based word tokenizer.
- For a tokenization dictionary, try
  - [words_th.txt](https://github.com/PyThaiNLP/pythainlp/blob/dev/pythainlp/corpus/words_th.txt) from [PyThaiNLP](https://github.com/PyThaiNLP/pythainlp/) - around 62,000 words (CC0)
  - [word break dictionary](https://github.com/tlwg/libthai/tree/master/data) from [libthai](https://github.com/tlwg/libthai/) - consists of dictionaries in different categories, with a make script (LGPL-2.1)

## Install

```bash
pip install nlpo3
```

## Usage

Load file `path/to/dict.file` to memory and assign a name `dict_name` to it.
Then tokenize a text with the `dict_name` dictionary:

```python
from nlpo3 import load_dict, segment

# Load the dictionary file and register it under the name "dict_name".
load_dict("path/to/dict.file", "dict_name")
# Tokenize the text with the dictionary registered above.
segment("สวัสดีครับ", "dict_name")
```

It will return a list of strings:

```python
['สวัสดี', 'ครับ']
```

(The result depends on the words included in the dictionary.)

Use multithread mode, also with the `dict_name` dictionary:

```python
segment("สวัสดีครับ", dict_name="dict_name", parallel=True)
```

Use safe mode to avoid long waiting times in some edge cases for text with many ambiguous word boundaries:

```python
segment("สวัสดีครับ", dict_name="dict_name", safe=True)
```

## Build

### Requirements

- [Rust 2018 Edition](https://www.rust-lang.org/tools/install)
- Python 3.6 or newer
- Python Development Headers
  - Ubuntu: `sudo apt-get install python3-dev`
  - macOS: No action needed
- [PyO3](https://github.com/PyO3/pyo3) - already included in `Cargo.toml`
- [setuptools-rust](https://github.com/PyO3/setuptools-rust)

### Steps

```bash
python -m pip install --upgrade build
python -m build
```

This should generate a wheel file in the `dist/` directory, which can be installed with pip.

## Issues

Please report issues at https://github.com/PyThaiNLP/nlpo3/issues

%package -n python3-nlpo3
Summary: Python binding for nlpO3 Thai language processing library in Rust
Provides: python-nlpo3
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip
BuildRequires: python3-cffi
BuildRequires: gcc
BuildRequires: gdb

%description -n python3-nlpo3
# nlpO3 Python binding

Python binding for nlpO3, a Thai natural language processing library in Rust.

## Features

- Thai word tokenizer
  - `segment()` - uses a maximal-matching dictionary-based tokenization algorithm and honors Thai Character Cluster boundaries
    - [2.5x faster](notebooks/nlpo3_segment_benchmarks.ipynb) than a similar pure Python implementation (PyThaiNLP's newmm)
  - `load_dict()` - loads a dictionary from a plain text file (one word per line)

## Dictionary file

- In the interest of library size, nlpO3 does not assume which dictionary the developer would like to use.
  It does not come with a dictionary.
  A dictionary is needed for the dictionary-based word tokenizer.
- For a tokenization dictionary, try
  - [words_th.txt](https://github.com/PyThaiNLP/pythainlp/blob/dev/pythainlp/corpus/words_th.txt) from [PyThaiNLP](https://github.com/PyThaiNLP/pythainlp/) - around 62,000 words (CC0)
  - [word break dictionary](https://github.com/tlwg/libthai/tree/master/data) from [libthai](https://github.com/tlwg/libthai/) - consists of dictionaries in different categories, with a make script (LGPL-2.1)

## Install

```bash
pip install nlpo3
```

## Usage

Load file `path/to/dict.file` to memory and assign a name `dict_name` to it.
Then tokenize a text with the `dict_name` dictionary:

```python
from nlpo3 import load_dict, segment

# Load the dictionary file and register it under the name "dict_name".
load_dict("path/to/dict.file", "dict_name")
# Tokenize the text with the dictionary registered above.
segment("สวัสดีครับ", "dict_name")
```

It will return a list of strings:

```python
['สวัสดี', 'ครับ']
```

(The result depends on the words included in the dictionary.)

Use multithread mode, also with the `dict_name` dictionary:

```python
segment("สวัสดีครับ", dict_name="dict_name", parallel=True)
```

Use safe mode to avoid long waiting times in some edge cases for text with many ambiguous word boundaries:

```python
segment("สวัสดีครับ", dict_name="dict_name", safe=True)
```

## Build

### Requirements

- [Rust 2018 Edition](https://www.rust-lang.org/tools/install)
- Python 3.6 or newer
- Python Development Headers
  - Ubuntu: `sudo apt-get install python3-dev`
  - macOS: No action needed
- [PyO3](https://github.com/PyO3/pyo3) - already included in `Cargo.toml`
- [setuptools-rust](https://github.com/PyO3/setuptools-rust)

### Steps

```bash
python -m pip install --upgrade build
python -m build
```

This should generate a wheel file in the `dist/` directory, which can be installed with pip.

## Issues

Please report issues at https://github.com/PyThaiNLP/nlpo3/issues

%package help
Summary: Development documents and examples for nlpo3
Provides: python3-nlpo3-doc

%description help
# nlpO3 Python binding

Python binding for nlpO3, a Thai natural language processing library in Rust.

## Features

- Thai word tokenizer
  - `segment()` - uses a maximal-matching dictionary-based tokenization algorithm and honors Thai Character Cluster boundaries
    - [2.5x faster](notebooks/nlpo3_segment_benchmarks.ipynb) than a similar pure Python implementation (PyThaiNLP's newmm)
  - `load_dict()` - loads a dictionary from a plain text file (one word per line)

## Dictionary file

- In the interest of library size, nlpO3 does not assume which dictionary the developer would like to use.
  It does not come with a dictionary.
  A dictionary is needed for the dictionary-based word tokenizer.
- For a tokenization dictionary, try
  - [words_th.txt](https://github.com/PyThaiNLP/pythainlp/blob/dev/pythainlp/corpus/words_th.txt) from [PyThaiNLP](https://github.com/PyThaiNLP/pythainlp/) - around 62,000 words (CC0)
  - [word break dictionary](https://github.com/tlwg/libthai/tree/master/data) from [libthai](https://github.com/tlwg/libthai/) - consists of dictionaries in different categories, with a make script (LGPL-2.1)

## Install

```bash
pip install nlpo3
```

## Usage

Load file `path/to/dict.file` to memory and assign a name `dict_name` to it.
Then tokenize a text with the `dict_name` dictionary:

```python
from nlpo3 import load_dict, segment

# Load the dictionary file and register it under the name "dict_name".
load_dict("path/to/dict.file", "dict_name")
# Tokenize the text with the dictionary registered above.
segment("สวัสดีครับ", "dict_name")
```

It will return a list of strings:

```python
['สวัสดี', 'ครับ']
```

(The result depends on the words included in the dictionary.)

Use multithread mode, also with the `dict_name` dictionary:

```python
segment("สวัสดีครับ", dict_name="dict_name", parallel=True)
```

Use safe mode to avoid long waiting times in some edge cases for text with many ambiguous word boundaries:

```python
segment("สวัสดีครับ", dict_name="dict_name", safe=True)
```

## Build

### Requirements

- [Rust 2018 Edition](https://www.rust-lang.org/tools/install)
- Python 3.6 or newer
- Python Development Headers
  - Ubuntu: `sudo apt-get install python3-dev`
  - macOS: No action needed
- [PyO3](https://github.com/PyO3/pyo3) - already included in `Cargo.toml`
- [setuptools-rust](https://github.com/PyO3/setuptools-rust)

### Steps

```bash
python -m pip install --upgrade build
python -m build
```

This should generate a wheel file in the `dist/` directory, which can be installed with pip.

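
To confirm that the build step above produced a wheel, a small check along these lines may help. This is a minimal sketch rather than part of nlpO3: the helper name `find_wheels` is hypothetical, and it only assumes that `python -m build` wrote its output to `dist/`.

```python
from pathlib import Path


def find_wheels(dist_dir="dist"):
    """Return the wheel files found in dist_dir, sorted by path.

    Hypothetical helper for illustration; not part of nlpO3.
    """
    dist = Path(dist_dir)
    if not dist.is_dir():
        # No build output directory yet.
        return []
    return sorted(p for p in dist.glob("*.whl") if p.is_file())


if __name__ == "__main__":
    wheels = find_wheels()
    if wheels:
        for wheel in wheels:
            # Each path can be passed to `python -m pip install <path>`.
            print(wheel)
    else:
        print("No wheel found in dist/; run `python -m build` first.")
```

Each path it prints can then be handed to `python -m pip install`.
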
## Issues

Please report issues at https://github.com/PyThaiNLP/nlpo3/issues

%prep
%autosetup -n nlpo3-1.2.6

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-nlpo3 -f filelist.lst
%dir %{python3_sitearch}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Wed Apr 12 2023 Python_Bot - 1.2.6-1
- Package Spec generated