%global _empty_manifest_terminate_build 0
Name: python-nlpo3
Version: 1.2.6
Release: 1
Summary: Python binding for nlpO3 Thai language processing library in Rust
License: Apache-2.0
URL: https://github.com/PyThaiNLP/nlpo3/
Source0: https://mirrors.nju.edu.cn/pypi/web/packages/d6/3b/1514977f53c0e8e86b1fce6c9523cd7e248e729864505c7475bc13595700/nlpo3-1.2.6.tar.gz
%description
# nlpO3 Python binding
Python binding for nlpO3, a Thai natural language processing library in Rust.
## Features
- Thai word tokenizer
  - `segment()` - tokenizes with a maximal-matching, dictionary-based algorithm that honors Thai Character Cluster boundaries
    - [2.5x faster](notebooks/nlpo3_segment_benchmarks.ipynb) than a comparable pure-Python implementation (PyThaiNLP's newmm)
  - `load_dict()` - loads a dictionary from a plain text file (one word per line)
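The maximal-matching idea can be sketched in plain Python. This is an illustrative approximation only, not nlpO3's actual Rust implementation, and it ignores Thai Character Cluster handling:

```python
def maximal_match(text, words):
    """Segment text into the fewest dictionary words (maximal matching).

    Dynamic-programming sketch: best[i] holds the minimum word count
    needed to cover text[:i]. Returns None when the text cannot be
    fully covered by dictionary words.
    """
    n = len(text)
    max_len = max((len(w) for w in words), default=0)
    best = [None] * (n + 1)   # best[i] = min word count for text[:i]
    back = [0] * (n + 1)      # back[i] = start index of the last word
    best[0] = 0
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            if best[j] is not None and text[j:i] in words:
                if best[i] is None or best[j] + 1 < best[i]:
                    best[i] = best[j] + 1
                    back[i] = j
    if best[n] is None:
        return None
    tokens = []
    i = n
    while i > 0:
        tokens.append(text[back[i]:i])
        i = back[i]
    return tokens[::-1]

print(maximal_match("สวัสดีครับ", {"สวัส", "สวัสดี", "ครับ"}))
# → ['สวัสดี', 'ครับ']
```

Note that maximal matching minimizes the total number of words globally, which is why the sketch uses dynamic programming rather than greedy longest-match.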
## Dictionary file
- To keep the library small, nlpO3 does not assume which dictionary the developer would like to use,
so it does not come with one. A dictionary is required for the dictionary-based word tokenizer.
- For a tokenization dictionary, try:
  - [words_th.txt](https://github.com/PyThaiNLP/pythainlp/blob/dev/pythainlp/corpus/words_th.txt) from [PyThaiNLP](https://github.com/PyThaiNLP/pythainlp/) - around 62,000 words (CC0)
  - [word break dictionary](https://github.com/tlwg/libthai/tree/master/data) from [libthai](https://github.com/tlwg/libthai/) - dictionaries in different categories, with a make script (LGPL-2.1)
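As a sketch of the expected format (the file name `my_dict.txt` and the word list here are made up for illustration), a dictionary file is just UTF-8 plain text with one word per line:

```python
# Write a minimal dictionary file: UTF-8 plain text, one word per line.
words = ["สวัสดี", "ครับ", "ขอบคุณ"]
with open("my_dict.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(words) + "\n")

# Read it back to confirm the format.
with open("my_dict.txt", encoding="utf-8") as f:
    loaded = [line.strip() for line in f if line.strip()]
print(loaded)  # → ['สวัสดี', 'ครับ', 'ขอบคุณ']
```

A file in this format can then be passed to `load_dict()` as shown in the Usage section below.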
## Install
```bash
pip install nlpo3
```
## Usage
Load the file `path/to/dict.file` into memory and assign the name `dict_name` to it.
Then tokenize a text with the `dict_name` dictionary:
```python
from nlpo3 import load_dict, segment
load_dict("path/to/dict.file", "dict_name")
segment("สวัสดีครับ", "dict_name")
```
It will return a list of strings:
```python
['สวัสดี', 'ครับ']
```
(The result depends on the words included in the dictionary.)
To use multithreaded mode with the same `dict_name` dictionary:
```python
segment("สวัสดีครับ", dict_name="dict_name", parallel=True)
```
Use safe mode to avoid long processing times in some edge cases
involving text with many ambiguous word boundaries:
```python
segment("สวัสดีครับ", dict_name="dict_name", safe=True)
```
## Build
### Requirements
- [Rust 2018 Edition](https://www.rust-lang.org/tools/install)
- Python 3.6 or newer
- Python Development Headers
- Ubuntu: `sudo apt-get install python3-dev`
- macOS: No action needed
- [PyO3](https://github.com/PyO3/pyo3) - already included in Cargo.toml
- [setuptools-rust](https://github.com/PyO3/setuptools-rust)
### Steps
```bash
python -m pip install --upgrade build
python -m build
```
This should generate a wheel file in the `dist/` directory, which can be installed with pip.
## Issues
Please report issues at https://github.com/PyThaiNLP/nlpo3/issues
%package -n python3-nlpo3
Summary: Python binding for nlpO3 Thai language processing library in Rust
Provides: python-nlpo3
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip
BuildRequires: python3-cffi
BuildRequires: gcc
BuildRequires: gdb
%description -n python3-nlpo3
# nlpO3 Python binding
Python binding for nlpO3, a Thai natural language processing library in Rust.
## Features
- Thai word tokenizer
  - `segment()` - tokenizes with a maximal-matching, dictionary-based algorithm that honors Thai Character Cluster boundaries
    - [2.5x faster](notebooks/nlpo3_segment_benchmarks.ipynb) than a comparable pure-Python implementation (PyThaiNLP's newmm)
  - `load_dict()` - loads a dictionary from a plain text file (one word per line)
## Dictionary file
- To keep the library small, nlpO3 does not assume which dictionary the developer would like to use,
so it does not come with one. A dictionary is required for the dictionary-based word tokenizer.
- For a tokenization dictionary, try:
  - [words_th.txt](https://github.com/PyThaiNLP/pythainlp/blob/dev/pythainlp/corpus/words_th.txt) from [PyThaiNLP](https://github.com/PyThaiNLP/pythainlp/) - around 62,000 words (CC0)
  - [word break dictionary](https://github.com/tlwg/libthai/tree/master/data) from [libthai](https://github.com/tlwg/libthai/) - dictionaries in different categories, with a make script (LGPL-2.1)
## Install
```bash
pip install nlpo3
```
## Usage
Load the file `path/to/dict.file` into memory and assign the name `dict_name` to it.
Then tokenize a text with the `dict_name` dictionary:
```python
from nlpo3 import load_dict, segment
load_dict("path/to/dict.file", "dict_name")
segment("สวัสดีครับ", "dict_name")
```
It will return a list of strings:
```python
['สวัสดี', 'ครับ']
```
(The result depends on the words included in the dictionary.)
To use multithreaded mode with the same `dict_name` dictionary:
```python
segment("สวัสดีครับ", dict_name="dict_name", parallel=True)
```
Use safe mode to avoid long processing times in some edge cases
involving text with many ambiguous word boundaries:
```python
segment("สวัสดีครับ", dict_name="dict_name", safe=True)
```
## Build
### Requirements
- [Rust 2018 Edition](https://www.rust-lang.org/tools/install)
- Python 3.6 or newer
- Python Development Headers
- Ubuntu: `sudo apt-get install python3-dev`
- macOS: No action needed
- [PyO3](https://github.com/PyO3/pyo3) - already included in Cargo.toml
- [setuptools-rust](https://github.com/PyO3/setuptools-rust)
### Steps
```bash
python -m pip install --upgrade build
python -m build
```
This should generate a wheel file in the `dist/` directory, which can be installed with pip.
## Issues
Please report issues at https://github.com/PyThaiNLP/nlpo3/issues
%package help
Summary: Development documents and examples for nlpo3
Provides: python3-nlpo3-doc
%description help
# nlpO3 Python binding
Python binding for nlpO3, a Thai natural language processing library in Rust.
## Features
- Thai word tokenizer
  - `segment()` - tokenizes with a maximal-matching, dictionary-based algorithm that honors Thai Character Cluster boundaries
    - [2.5x faster](notebooks/nlpo3_segment_benchmarks.ipynb) than a comparable pure-Python implementation (PyThaiNLP's newmm)
  - `load_dict()` - loads a dictionary from a plain text file (one word per line)
## Dictionary file
- To keep the library small, nlpO3 does not assume which dictionary the developer would like to use,
so it does not come with one. A dictionary is required for the dictionary-based word tokenizer.
- For a tokenization dictionary, try:
  - [words_th.txt](https://github.com/PyThaiNLP/pythainlp/blob/dev/pythainlp/corpus/words_th.txt) from [PyThaiNLP](https://github.com/PyThaiNLP/pythainlp/) - around 62,000 words (CC0)
  - [word break dictionary](https://github.com/tlwg/libthai/tree/master/data) from [libthai](https://github.com/tlwg/libthai/) - dictionaries in different categories, with a make script (LGPL-2.1)
## Install
```bash
pip install nlpo3
```
## Usage
Load the file `path/to/dict.file` into memory and assign the name `dict_name` to it.
Then tokenize a text with the `dict_name` dictionary:
```python
from nlpo3 import load_dict, segment
load_dict("path/to/dict.file", "dict_name")
segment("สวัสดีครับ", "dict_name")
```
It will return a list of strings:
```python
['สวัสดี', 'ครับ']
```
(The result depends on the words included in the dictionary.)
To use multithreaded mode with the same `dict_name` dictionary:
```python
segment("สวัสดีครับ", dict_name="dict_name", parallel=True)
```
Use safe mode to avoid long processing times in some edge cases
involving text with many ambiguous word boundaries:
```python
segment("สวัสดีครับ", dict_name="dict_name", safe=True)
```
## Build
### Requirements
- [Rust 2018 Edition](https://www.rust-lang.org/tools/install)
- Python 3.6 or newer
- Python Development Headers
- Ubuntu: `sudo apt-get install python3-dev`
- macOS: No action needed
- [PyO3](https://github.com/PyO3/pyo3) - already included in Cargo.toml
- [setuptools-rust](https://github.com/PyO3/setuptools-rust)
### Steps
```bash
python -m pip install --upgrade build
python -m build
```
This should generate a wheel file in the `dist/` directory, which can be installed with pip.
## Issues
Please report issues at https://github.com/PyThaiNLP/nlpo3/issues
%prep
%autosetup -n nlpo3-1.2.6
%build
%py3_build
%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .
%files -n python3-nlpo3 -f filelist.lst
%dir %{python3_sitearch}/*
%files help -f doclist.lst
%{_docdir}/*
%changelog
* Wed Apr 12 2023 Python_Bot - 1.2.6-1
- Package Spec generated