-rw-r--r-- .gitignore          1
-rw-r--r-- python-nlpo3.spec 326
-rw-r--r-- sources             1
3 files changed, 328 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..7d45073 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/nlpo3-1.2.6.tar.gz
diff --git a/python-nlpo3.spec b/python-nlpo3.spec
new file mode 100644
index 0000000..bb6dcae
--- /dev/null
+++ b/python-nlpo3.spec
@@ -0,0 +1,326 @@
+%global _empty_manifest_terminate_build 0
+Name: python-nlpo3
+Version: 1.2.6
+Release: 1
+Summary: Python binding for nlpO3 Thai language processing library in Rust
+License: Apache-2.0
+URL: https://github.com/PyThaiNLP/nlpo3/
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/d6/3b/1514977f53c0e8e86b1fce6c9523cd7e248e729864505c7475bc13595700/nlpo3-1.2.6.tar.gz
+
+
+%description
+<a href="https://pypi.python.org/pypi/nlpo3"><img alt="pypi" src="https://img.shields.io/pypi/v/nlpo3.svg"/></a>
+<a href="https://www.python.org/downloads/release/python-360/"><img alt="Python 3.6" src="https://img.shields.io/badge/python-3.6-blue.svg"/></a>
+<a href="https://opensource.org/licenses/Apache-2.0"><img alt="License" src="https://img.shields.io/badge/License-Apache%202.0-blue.svg"/></a>
+<a href="https://pepy.tech/project/nlpo3"><img alt="Downloads" src="https://pepy.tech/badge/nlpo3/month"/></a>
+
+# nlpO3 Python binding
+
+Python binding for nlpO3, a Thai natural language processing library in Rust.
+
+## Features
+
+- Thai word tokenizer
+ - `segment()` - uses a maximal-matching, dictionary-based tokenization algorithm and honors Thai Character Cluster boundaries
+ - [2.5x faster](notebooks/nlpo3_segment_benchmarks.ipynb) than a similar pure-Python implementation (PyThaiNLP's newmm)
+ - `load_dict()` - loads a dictionary from a plain text file (one word per line)
+
+
+## Dictionary file
+
+- In the interest of library size, nlpO3 does not assume which dictionary the developer would like to use,
+ so it does not come with one. A dictionary is required for the dictionary-based word tokenizer.
+- For a tokenization dictionary, try
+ - [words_th.txt](https://github.com/PyThaiNLP/pythainlp/blob/dev/pythainlp/corpus/words_th.txt) from [PyThaiNLP](https://github.com/PyThaiNLP/pythainlp/) - around 62,000 words (CC0)
+ - [word break dictionary](https://github.com/tlwg/libthai/tree/master/data) from [libthai](https://github.com/tlwg/libthai/) - consists of dictionaries in different categories, with a make script (LGPL-2.1)
+
+
+## Install
+
+```bash
+pip install nlpo3
+```
+
+## Usage
+
+Load the file `path/to/dict.file` into memory and assign the name `dict_name` to it.
+Then tokenize a text with the `dict_name` dictionary:
+```python
+from nlpo3 import load_dict, segment
+
+load_dict("path/to/dict.file", "dict_name")
+segment("สวัสดีครับ", "dict_name")
+```
+
+It will return a list of strings:
+```python
+['สวัสดี', 'ครับ']
+```
+(The result depends on the words included in the dictionary.)
+
+Use multithreaded mode, again with the `dict_name` dictionary:
+```python
+segment("สวัสดีครับ", dict_name="dict_name", parallel=True)
+```
+
+Use safe mode to avoid long wait times in edge cases
+where the text has many ambiguous word boundaries:
+```python
+segment("สวัสดีครับ", dict_name="dict_name", safe=True)
+```
+
+## Build
+
+### Requirements
+
+- [Rust 2018 Edition](https://www.rust-lang.org/tools/install)
+- Python 3.6 or newer
+- Python Development Headers
+ - Ubuntu: `sudo apt-get install python3-dev`
+ - macOS: No action needed
+- [PyO3](https://github.com/PyO3/pyo3) - already included in Cargo.toml
+- [setuptools-rust](https://github.com/PyO3/setuptools-rust)
+
+### Steps
+
+```bash
+python -m pip install --upgrade build
+python -m build
+```
+
+This should generate a wheel file in the `dist/` directory, which can be installed with pip.
+
+## Issues
+
+Please report issues at https://github.com/PyThaiNLP/nlpo3/issues
+
+
+%package -n python3-nlpo3
+Summary: Python binding for nlpO3 Thai language processing library in Rust
+Provides: python-nlpo3
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+BuildRequires: python3-cffi
+BuildRequires: gcc
+BuildRequires: gdb
+%description -n python3-nlpo3
+<a href="https://pypi.python.org/pypi/nlpo3"><img alt="pypi" src="https://img.shields.io/pypi/v/nlpo3.svg"/></a>
+<a href="https://www.python.org/downloads/release/python-360/"><img alt="Python 3.6" src="https://img.shields.io/badge/python-3.6-blue.svg"/></a>
+<a href="https://opensource.org/licenses/Apache-2.0"><img alt="License" src="https://img.shields.io/badge/License-Apache%202.0-blue.svg"/></a>
+<a href="https://pepy.tech/project/nlpo3"><img alt="Downloads" src="https://pepy.tech/badge/nlpo3/month"/></a>
+
+# nlpO3 Python binding
+
+Python binding for nlpO3, a Thai natural language processing library in Rust.
+
+## Features
+
+- Thai word tokenizer
+ - `segment()` - uses a maximal-matching, dictionary-based tokenization algorithm and honors Thai Character Cluster boundaries
+ - [2.5x faster](notebooks/nlpo3_segment_benchmarks.ipynb) than a similar pure-Python implementation (PyThaiNLP's newmm)
+ - `load_dict()` - loads a dictionary from a plain text file (one word per line)
+
+
+## Dictionary file
+
+- In the interest of library size, nlpO3 does not assume which dictionary the developer would like to use,
+ so it does not come with one. A dictionary is required for the dictionary-based word tokenizer.
+- For a tokenization dictionary, try
+ - [words_th.txt](https://github.com/PyThaiNLP/pythainlp/blob/dev/pythainlp/corpus/words_th.txt) from [PyThaiNLP](https://github.com/PyThaiNLP/pythainlp/) - around 62,000 words (CC0)
+ - [word break dictionary](https://github.com/tlwg/libthai/tree/master/data) from [libthai](https://github.com/tlwg/libthai/) - consists of dictionaries in different categories, with a make script (LGPL-2.1)
+
+
+## Install
+
+```bash
+pip install nlpo3
+```
+
+## Usage
+
+Load the file `path/to/dict.file` into memory and assign the name `dict_name` to it.
+Then tokenize a text with the `dict_name` dictionary:
+```python
+from nlpo3 import load_dict, segment
+
+load_dict("path/to/dict.file", "dict_name")
+segment("สวัสดีครับ", "dict_name")
+```
+
+It will return a list of strings:
+```python
+['สวัสดี', 'ครับ']
+```
+(The result depends on the words included in the dictionary.)
+
+Use multithreaded mode, again with the `dict_name` dictionary:
+```python
+segment("สวัสดีครับ", dict_name="dict_name", parallel=True)
+```
+
+Use safe mode to avoid long wait times in edge cases
+where the text has many ambiguous word boundaries:
+```python
+segment("สวัสดีครับ", dict_name="dict_name", safe=True)
+```
+
+## Build
+
+### Requirements
+
+- [Rust 2018 Edition](https://www.rust-lang.org/tools/install)
+- Python 3.6 or newer
+- Python Development Headers
+ - Ubuntu: `sudo apt-get install python3-dev`
+ - macOS: No action needed
+- [PyO3](https://github.com/PyO3/pyo3) - already included in Cargo.toml
+- [setuptools-rust](https://github.com/PyO3/setuptools-rust)
+
+### Steps
+
+```bash
+python -m pip install --upgrade build
+python -m build
+```
+
+This should generate a wheel file in the `dist/` directory, which can be installed with pip.
+
+## Issues
+
+Please report issues at https://github.com/PyThaiNLP/nlpo3/issues
+
+
+%package help
+Summary: Development documents and examples for nlpo3
+Provides: python3-nlpo3-doc
+%description help
+<a href="https://pypi.python.org/pypi/nlpo3"><img alt="pypi" src="https://img.shields.io/pypi/v/nlpo3.svg"/></a>
+<a href="https://www.python.org/downloads/release/python-360/"><img alt="Python 3.6" src="https://img.shields.io/badge/python-3.6-blue.svg"/></a>
+<a href="https://opensource.org/licenses/Apache-2.0"><img alt="License" src="https://img.shields.io/badge/License-Apache%202.0-blue.svg"/></a>
+<a href="https://pepy.tech/project/nlpo3"><img alt="Downloads" src="https://pepy.tech/badge/nlpo3/month"/></a>
+
+# nlpO3 Python binding
+
+Python binding for nlpO3, a Thai natural language processing library in Rust.
+
+## Features
+
+- Thai word tokenizer
+ - `segment()` - uses a maximal-matching, dictionary-based tokenization algorithm and honors Thai Character Cluster boundaries
+ - [2.5x faster](notebooks/nlpo3_segment_benchmarks.ipynb) than a similar pure-Python implementation (PyThaiNLP's newmm)
+ - `load_dict()` - loads a dictionary from a plain text file (one word per line)
+
+
+## Dictionary file
+
+- In the interest of library size, nlpO3 does not assume which dictionary the developer would like to use,
+ so it does not come with one. A dictionary is required for the dictionary-based word tokenizer.
+- For a tokenization dictionary, try
+ - [words_th.txt](https://github.com/PyThaiNLP/pythainlp/blob/dev/pythainlp/corpus/words_th.txt) from [PyThaiNLP](https://github.com/PyThaiNLP/pythainlp/) - around 62,000 words (CC0)
+ - [word break dictionary](https://github.com/tlwg/libthai/tree/master/data) from [libthai](https://github.com/tlwg/libthai/) - consists of dictionaries in different categories, with a make script (LGPL-2.1)
+
+
+## Install
+
+```bash
+pip install nlpo3
+```
+
+## Usage
+
+Load the file `path/to/dict.file` into memory and assign the name `dict_name` to it.
+Then tokenize a text with the `dict_name` dictionary:
+```python
+from nlpo3 import load_dict, segment
+
+load_dict("path/to/dict.file", "dict_name")
+segment("สวัสดีครับ", "dict_name")
+```
+
+It will return a list of strings:
+```python
+['สวัสดี', 'ครับ']
+```
+(The result depends on the words included in the dictionary.)
+
+Use multithreaded mode, again with the `dict_name` dictionary:
+```python
+segment("สวัสดีครับ", dict_name="dict_name", parallel=True)
+```
+
+Use safe mode to avoid long wait times in edge cases
+where the text has many ambiguous word boundaries:
+```python
+segment("สวัสดีครับ", dict_name="dict_name", safe=True)
+```
+
+## Build
+
+### Requirements
+
+- [Rust 2018 Edition](https://www.rust-lang.org/tools/install)
+- Python 3.6 or newer
+- Python Development Headers
+ - Ubuntu: `sudo apt-get install python3-dev`
+ - macOS: No action needed
+- [PyO3](https://github.com/PyO3/pyo3) - already included in Cargo.toml
+- [setuptools-rust](https://github.com/PyO3/setuptools-rust)
+
+### Steps
+
+```bash
+python -m pip install --upgrade build
+python -m build
+```
+
+This should generate a wheel file in the `dist/` directory, which can be installed with pip.
+
+## Issues
+
+Please report issues at https://github.com/PyThaiNLP/nlpo3/issues
+
+
+%prep
+%autosetup -n nlpo3-1.2.6
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-nlpo3 -f filelist.lst
+%dir %{python3_sitearch}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed Apr 12 2023 Python_Bot <Python_Bot@openeuler.org> - 1.2.6-1
+- Package Spec generated
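The package description says that `segment()` tokenizes with a maximal-matching, dictionary-based algorithm. It also says that `load_dict()` reads a plain text file with one word per line. The pure-Python toy below illustrates both ideas without nlpo3 installed.

It is a simplified sketch, not nlpO3's implementation. It ignores Thai Character Cluster boundaries. The helpers `write_dict`, `read_dict` and `maximal_matching` are hypothetical names used only for illustration. With the real package, the file written here would instead be passed to `load_dict()`, and the text to `segment()`.

```python
import tempfile
from pathlib import Path


def write_dict(words, path):
    """Write words to `path` in the one-word-per-line format, skipping blanks and duplicates."""
    unique = sorted({w.strip() for w in words if w.strip()})
    Path(path).write_text("\n".join(unique) + "\n", encoding="utf-8")


def read_dict(path):
    """Read a one-word-per-line dictionary file back into a set of words."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def maximal_matching(text, words):
    """Toy maximal matching (hypothetical helper): prefer the segmentation with
    the fewest out-of-dictionary characters, then the fewest tokens.

    Unlike nlpO3, this ignores Thai Character Cluster boundaries."""
    n = len(text)
    max_len = max((len(w) for w in words), default=1)
    # best[i] holds (unknown_chars, token_count) for the prefix text[:i];
    # tuples compare lexicographically, so fewer unknown characters wins first.
    best = [None] * (n + 1)
    best[0] = (0, 0)
    back = [0] * (n + 1)  # back[j] = start index of the last token ending at j

    def relax(i, j, cost):
        # Keep the cheaper way of reaching position j.
        if best[j] is None or cost < best[j]:
            best[j] = cost
            back[j] = i

    for i in range(n):
        unknown, tokens = best[i]
        # Extend with every dictionary word that starts at position i.
        for j in range(i + 1, min(n, i + max_len) + 1):
            if text[i:j] in words:
                relax(i, j, (unknown, tokens + 1))
        # Fallback: emit one out-of-dictionary character as its own token.
        relax(i, i + 1, (unknown + 1, tokens + 1))

    # Walk the back-pointers from the end to recover the tokens in order.
    result, j = [], n
    while j > 0:
        result.append(text[back[j]:j])
        j = back[j]
    return result[::-1]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dict_path = Path(tmp) / "words.txt"
        write_dict(["สวัสดี", "ครับ"], dict_path)
        print(maximal_matching("สวัสดีครับ", read_dict(dict_path)))  # ['สวัสดี', 'ครับ']
```

The cost tuple ranks dictionary coverage ahead of token count. Text that the dictionary cannot cover still segments, one character at a time, rather than failing.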
diff --git a/sources b/sources
new file mode 100644
index 0000000..c218bb8
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+f6d74d2db4e73f47ea68acbe2776575c nlpo3-1.2.6.tar.gz
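The `sources` file pairs a 32-hex-digit checksum with the tarball name. Assuming that digest is MD5 (its length matches, but this is an assumption), a small check like the one below can confirm that a downloaded `nlpo3-1.2.6.tar.gz` matches before building. The helpers `parse_sources`, `md5_of` and `verify_sources` are hypothetical, not part of any packaging tool.

```python
import hashlib
from pathlib import Path


def parse_sources(text):
    """Map each file name in a `sources` file to its expected digest.

    Assumes one '<hex digest> <file name>' pair per line."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            digest, name = parts
            entries[name] = digest.lower()
    return entries


def md5_of(path, chunk_size=1 << 16):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sources(text, directory="."):
    """Return {file name: True/False}; a missing file counts as a mismatch."""
    results = {}
    for name, expected in parse_sources(text).items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    # Run from the package directory that holds `sources` and the tarball.
    sources = Path("sources")
    if sources.is_file():
        print(verify_sources(sources.read_text()))
```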