-rw-r--r-- | .gitignore | 1
-rw-r--r-- | python-udkanbun.spec | 405
-rw-r--r-- | sources | 1
3 files changed, 407 insertions, 0 deletions
@@ -0,0 +1 @@
+/udkanbun-3.3.7.tar.gz
diff --git a/python-udkanbun.spec b/python-udkanbun.spec
new file mode 100644
index 0000000..dadc452
--- /dev/null
+++ b/python-udkanbun.spec
@@ -0,0 +1,405 @@
+%global _empty_manifest_terminate_build 0
+Name: python-udkanbun
+Version: 3.3.7
+Release: 1
+Summary: Tokenizer POS-tagger and Dependency-parser for Classical Chinese
+License: MIT
+URL: https://github.com/KoichiYasuoka/UD-Kanbun
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/b8/fa/1725e688be150918a02652c9551543bd248c312840771913dee1a3ed6249/udkanbun-3.3.7.tar.gz
+BuildArch: noarch
+
+
+%description
+[](https://pypi.org/project/udkanbun/)
+
+# UD-Kanbun
+
+Tokenizer, POS-Tagger, and Dependency-Parser for Classical Chinese Texts (漢文/文言文), working on [Universal Dependencies](https://universaldependencies.org/format.html).
+
+## Basic usage
+
+```py
+>>> import udkanbun
+>>> lzh=udkanbun.load()
+>>> s=lzh("不入虎穴不得虎子")
+>>> print(s)
+# text = 不入虎穴不得虎子
+1 不 不 ADV v,副詞,否定,無界 Polarity=Neg 2 advmod _ Gloss=not|SpaceAfter=No
+2 入 入 VERB v,動詞,行為,移動 _ 0 root _ Gloss=enter|SpaceAfter=No
+3 虎 虎 NOUN n,名詞,主体,動物 _ 4 nmod _ Gloss=tiger|SpaceAfter=No
+4 穴 穴 NOUN n,名詞,固定物,地形 Case=Loc 2 obj _ Gloss=cave|SpaceAfter=No
+5 不 不 ADV v,副詞,否定,無界 Polarity=Neg 6 advmod _ Gloss=not|SpaceAfter=No
+6 得 得 VERB v,動詞,行為,得失 _ 2 parataxis _ Gloss=get|SpaceAfter=No
+7 虎 虎 NOUN n,名詞,主体,動物 _ 8 nmod _ Gloss=tiger|SpaceAfter=No
+8 子 子 NOUN n,名詞,人,関係 _ 6 obj _ Gloss=child|SpaceAfter=No
+
+>>> t=s[1]
+>>> print(t.id,t.form,t.lemma,t.upos,t.xpos,t.feats,t.head.id,t.deprel,t.deps,t.misc)
+1 不 不 ADV v,副詞,否定,無界 Polarity=Neg 2 advmod _ Gloss=not|SpaceAfter=No
+
+>>> print(s.kaeriten())
+不㆑入㆓虎穴㆒不㆑得㆓虎子㆒
+
+>>> print(s.to_tree())
+不 <════╗ advmod
+入 ═══╗═╝═╗ root
+虎 <╗ ║ ║ nmod
+穴 ═╝<╝ ║ obj
+不 <════╗ ║ advmod
+得 ═══╗═╝<╝ parataxis
+虎 <╗ ║ nmod
+子 ═╝<╝ obj
+
+>>> f=open("trial.svg","w")
+>>> f.write(s.to_svg())
+>>> f.close()
+```
+
+`udkanbun.load()` has three options `udkanbun.load(MeCab=True,Danku=False)`. By default, the UD-Kanbun pipeline uses [MeCab](https://taku910.github.io/mecab/) for tokenizer and POS-tagger, then uses [UDPipe](http://ufal.mff.cuni.cz/udpipe) for dependency-parser. With the option `MeCab=False` the pipeline uses UDPipe for all through the processing. With the option `Danku=True` the pipeline tries to segment sentences automatically.
+
+`udkanbun.UDKanbunEntry.to_tree()` has an option `to_tree(BoxDrawingWidth=2)` for old terminals, whose Box Drawing characters are "fullwidth". `to_tree(kaeriten=True,Japanese=True)` is convenient for Japanese users.
+
+You can simply use `udkanbun` on the command line:
+```sh
+echo 不入虎穴不得虎子 | udkanbun
+```
+
+## Usage via spaCy
+
+If you have already installed [spaCy](https://pypi.org/project/spacy/) 2.1.0 or later, you can use UD-Kanbun via spaCy Language pipeline.
+
+```py
+>>> import udkanbun.spacy
+>>> lzh=udkanbun.spacy.load()
+>>> d=lzh("不入虎穴不得虎子")
+>>> print(type(d))
+<class 'spacy.tokens.doc.Doc'>
+>>> print(udkanbun.spacy.to_conllu(d))
+# text = 不入虎穴不得虎子
+1 不 不 ADV v,副詞,否定,無界 _ 2 advmod _ Gloss=not|SpaceAfter=No
+2 入 入 VERB v,動詞,行為,移動 _ 0 root _ Gloss=enter|SpaceAfter=No
+3 虎 虎 NOUN n,名詞,主体,動物 _ 4 nmod _ Gloss=tiger|SpaceAfter=No
+4 穴 穴 NOUN n,名詞,固定物,地形 _ 2 obj _ Gloss=cave|SpaceAfter=No
+5 不 不 ADV v,副詞,否定,無界 _ 6 advmod _ Gloss=not|SpaceAfter=No
+6 得 得 VERB v,動詞,行為,得失 _ 2 parataxis _ Gloss=get|SpaceAfter=No
+7 虎 虎 NOUN n,名詞,主体,動物 _ 8 nmod _ Gloss=tiger|SpaceAfter=No
+8 子 子 NOUN n,名詞,人,関係 _ 6 obj _ Gloss=child|SpaceAfter=No
+
+>>> t=d[0]
+>>> print(t.i+1,t.orth_,t.lemma_,t.pos_,t.tag_,t.head.i+1,t.dep_,t.whitespace_,t.norm_)
+1 不 不 ADV v,副詞,否定,無界 2 advmod not
+```
+
+## Installation for Linux
+
+Tar-ball is available for Linux, and is installed by default when you use `pip`:
+```sh
+pip install udkanbun
+```
+
+## Installation for Cygwin
+
+Make sure to get `gcc-g++` `python37-pip` `python37-devel` packages, and then:
+```sh
+pip3.7 install udkanbun
+```
+Use `python3.7` command in [Cygwin](https://www.cygwin.com/install.html) instead of `python`.
+
+## Installation for Jupyter Notebook (Google Colaboratory)
+
+```py
+!pip install udkanbun
+```
+
+Try [notebook](https://colab.research.google.com/github/KoichiYasuoka/UD-Kanbun/blob/master/udkanbun.ipynb) for Google Colaboratory.
+
+## Author
+
+Koichi Yasuoka (安岡孝一)
+
+## References
+
+* Koichi Yasuoka: [Universal Dependencies Treebank of the Four Books in Classical Chinese](http://hdl.handle.net/2433/245217), DADH2019: 10th International Conference of Digital Archives and Digital Humanities (December 2019), pp.20-28.
+* 安岡孝一: [四書を学んだMeCab+UDPipeはセンター試験の漢文を読めるのか](http://hdl.handle.net/2433/237383), 東洋学へのコンピュータ利用, 第30回研究セミナー (2019年3月8日), pp.3-110.
+* 安岡孝一: [漢文の依存文法解析と返り点の関係について](http://hdl.handle.net/2433/235609), 日本漢字学会第1回研究大会予稿集 (2018年12月1日), pp.33-48.
+
+%package -n python3-udkanbun
+Summary: Tokenizer POS-tagger and Dependency-parser for Classical Chinese
+Provides: python-udkanbun
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-udkanbun
+[](https://pypi.org/project/udkanbun/)
+
+# UD-Kanbun
+
+Tokenizer, POS-Tagger, and Dependency-Parser for Classical Chinese Texts (漢文/文言文), working on [Universal Dependencies](https://universaldependencies.org/format.html).
+
+## Basic usage
+
+```py
+>>> import udkanbun
+>>> lzh=udkanbun.load()
+>>> s=lzh("不入虎穴不得虎子")
+>>> print(s)
+# text = 不入虎穴不得虎子
+1 不 不 ADV v,副詞,否定,無界 Polarity=Neg 2 advmod _ Gloss=not|SpaceAfter=No
+2 入 入 VERB v,動詞,行為,移動 _ 0 root _ Gloss=enter|SpaceAfter=No
+3 虎 虎 NOUN n,名詞,主体,動物 _ 4 nmod _ Gloss=tiger|SpaceAfter=No
+4 穴 穴 NOUN n,名詞,固定物,地形 Case=Loc 2 obj _ Gloss=cave|SpaceAfter=No
+5 不 不 ADV v,副詞,否定,無界 Polarity=Neg 6 advmod _ Gloss=not|SpaceAfter=No
+6 得 得 VERB v,動詞,行為,得失 _ 2 parataxis _ Gloss=get|SpaceAfter=No
+7 虎 虎 NOUN n,名詞,主体,動物 _ 8 nmod _ Gloss=tiger|SpaceAfter=No
+8 子 子 NOUN n,名詞,人,関係 _ 6 obj _ Gloss=child|SpaceAfter=No
+
+>>> t=s[1]
+>>> print(t.id,t.form,t.lemma,t.upos,t.xpos,t.feats,t.head.id,t.deprel,t.deps,t.misc)
+1 不 不 ADV v,副詞,否定,無界 Polarity=Neg 2 advmod _ Gloss=not|SpaceAfter=No
+
+>>> print(s.kaeriten())
+不㆑入㆓虎穴㆒不㆑得㆓虎子㆒
+
+>>> print(s.to_tree())
+不 <════╗ advmod
+入 ═══╗═╝═╗ root
+虎 <╗ ║ ║ nmod
+穴 ═╝<╝ ║ obj
+不 <════╗ ║ advmod
+得 ═══╗═╝<╝ parataxis
+虎 <╗ ║ nmod
+子 ═╝<╝ obj
+
+>>> f=open("trial.svg","w")
+>>> f.write(s.to_svg())
+>>> f.close()
+```
+
+`udkanbun.load()` has three options `udkanbun.load(MeCab=True,Danku=False)`. By default, the UD-Kanbun pipeline uses [MeCab](https://taku910.github.io/mecab/) for tokenizer and POS-tagger, then uses [UDPipe](http://ufal.mff.cuni.cz/udpipe) for dependency-parser. With the option `MeCab=False` the pipeline uses UDPipe for all through the processing. With the option `Danku=True` the pipeline tries to segment sentences automatically.
+
+`udkanbun.UDKanbunEntry.to_tree()` has an option `to_tree(BoxDrawingWidth=2)` for old terminals, whose Box Drawing characters are "fullwidth". `to_tree(kaeriten=True,Japanese=True)` is convenient for Japanese users.
+
+You can simply use `udkanbun` on the command line:
+```sh
+echo 不入虎穴不得虎子 | udkanbun
+```
+
+## Usage via spaCy
+
+If you have already installed [spaCy](https://pypi.org/project/spacy/) 2.1.0 or later, you can use UD-Kanbun via spaCy Language pipeline.
+
+```py
+>>> import udkanbun.spacy
+>>> lzh=udkanbun.spacy.load()
+>>> d=lzh("不入虎穴不得虎子")
+>>> print(type(d))
+<class 'spacy.tokens.doc.Doc'>
+>>> print(udkanbun.spacy.to_conllu(d))
+# text = 不入虎穴不得虎子
+1 不 不 ADV v,副詞,否定,無界 _ 2 advmod _ Gloss=not|SpaceAfter=No
+2 入 入 VERB v,動詞,行為,移動 _ 0 root _ Gloss=enter|SpaceAfter=No
+3 虎 虎 NOUN n,名詞,主体,動物 _ 4 nmod _ Gloss=tiger|SpaceAfter=No
+4 穴 穴 NOUN n,名詞,固定物,地形 _ 2 obj _ Gloss=cave|SpaceAfter=No
+5 不 不 ADV v,副詞,否定,無界 _ 6 advmod _ Gloss=not|SpaceAfter=No
+6 得 得 VERB v,動詞,行為,得失 _ 2 parataxis _ Gloss=get|SpaceAfter=No
+7 虎 虎 NOUN n,名詞,主体,動物 _ 8 nmod _ Gloss=tiger|SpaceAfter=No
+8 子 子 NOUN n,名詞,人,関係 _ 6 obj _ Gloss=child|SpaceAfter=No
+
+>>> t=d[0]
+>>> print(t.i+1,t.orth_,t.lemma_,t.pos_,t.tag_,t.head.i+1,t.dep_,t.whitespace_,t.norm_)
+1 不 不 ADV v,副詞,否定,無界 2 advmod not
+```
+
+## Installation for Linux
+
+Tar-ball is available for Linux, and is installed by default when you use `pip`:
+```sh
+pip install udkanbun
+```
+
+## Installation for Cygwin
+
+Make sure to get `gcc-g++` `python37-pip` `python37-devel` packages, and then:
+```sh
+pip3.7 install udkanbun
+```
+Use `python3.7` command in [Cygwin](https://www.cygwin.com/install.html) instead of `python`.
+
+## Installation for Jupyter Notebook (Google Colaboratory)
+
+```py
+!pip install udkanbun
+```
+
+Try [notebook](https://colab.research.google.com/github/KoichiYasuoka/UD-Kanbun/blob/master/udkanbun.ipynb) for Google Colaboratory.
+
+## Author
+
+Koichi Yasuoka (安岡孝一)
+
+## References
+
+* Koichi Yasuoka: [Universal Dependencies Treebank of the Four Books in Classical Chinese](http://hdl.handle.net/2433/245217), DADH2019: 10th International Conference of Digital Archives and Digital Humanities (December 2019), pp.20-28.
+* 安岡孝一: [四書を学んだMeCab+UDPipeはセンター試験の漢文を読めるのか](http://hdl.handle.net/2433/237383), 東洋学へのコンピュータ利用, 第30回研究セミナー (2019年3月8日), pp.3-110.
+* 安岡孝一: [漢文の依存文法解析と返り点の関係について](http://hdl.handle.net/2433/235609), 日本漢字学会第1回研究大会予稿集 (2018年12月1日), pp.33-48.
+
+%package help
+Summary: Development documents and examples for udkanbun
+Provides: python3-udkanbun-doc
+%description help
+[](https://pypi.org/project/udkanbun/)
+
+# UD-Kanbun
+
+Tokenizer, POS-Tagger, and Dependency-Parser for Classical Chinese Texts (漢文/文言文), working on [Universal Dependencies](https://universaldependencies.org/format.html).
+
+## Basic usage
+
+```py
+>>> import udkanbun
+>>> lzh=udkanbun.load()
+>>> s=lzh("不入虎穴不得虎子")
+>>> print(s)
+# text = 不入虎穴不得虎子
+1 不 不 ADV v,副詞,否定,無界 Polarity=Neg 2 advmod _ Gloss=not|SpaceAfter=No
+2 入 入 VERB v,動詞,行為,移動 _ 0 root _ Gloss=enter|SpaceAfter=No
+3 虎 虎 NOUN n,名詞,主体,動物 _ 4 nmod _ Gloss=tiger|SpaceAfter=No
+4 穴 穴 NOUN n,名詞,固定物,地形 Case=Loc 2 obj _ Gloss=cave|SpaceAfter=No
+5 不 不 ADV v,副詞,否定,無界 Polarity=Neg 6 advmod _ Gloss=not|SpaceAfter=No
+6 得 得 VERB v,動詞,行為,得失 _ 2 parataxis _ Gloss=get|SpaceAfter=No
+7 虎 虎 NOUN n,名詞,主体,動物 _ 8 nmod _ Gloss=tiger|SpaceAfter=No
+8 子 子 NOUN n,名詞,人,関係 _ 6 obj _ Gloss=child|SpaceAfter=No
+
+>>> t=s[1]
+>>> print(t.id,t.form,t.lemma,t.upos,t.xpos,t.feats,t.head.id,t.deprel,t.deps,t.misc)
+1 不 不 ADV v,副詞,否定,無界 Polarity=Neg 2 advmod _ Gloss=not|SpaceAfter=No
+
+>>> print(s.kaeriten())
+不㆑入㆓虎穴㆒不㆑得㆓虎子㆒
+
+>>> print(s.to_tree())
+不 <════╗ advmod
+入 ═══╗═╝═╗ root
+虎 <╗ ║ ║ nmod
+穴 ═╝<╝ ║ obj
+不 <════╗ ║ advmod
+得 ═══╗═╝<╝ parataxis
+虎 <╗ ║ nmod
+子 ═╝<╝ obj
+
+>>> f=open("trial.svg","w")
+>>> f.write(s.to_svg())
+>>> f.close()
+```
+
+`udkanbun.load()` has three options `udkanbun.load(MeCab=True,Danku=False)`. By default, the UD-Kanbun pipeline uses [MeCab](https://taku910.github.io/mecab/) for tokenizer and POS-tagger, then uses [UDPipe](http://ufal.mff.cuni.cz/udpipe) for dependency-parser. With the option `MeCab=False` the pipeline uses UDPipe for all through the processing. With the option `Danku=True` the pipeline tries to segment sentences automatically.
+
+`udkanbun.UDKanbunEntry.to_tree()` has an option `to_tree(BoxDrawingWidth=2)` for old terminals, whose Box Drawing characters are "fullwidth". `to_tree(kaeriten=True,Japanese=True)` is convenient for Japanese users.
+
+You can simply use `udkanbun` on the command line:
+```sh
+echo 不入虎穴不得虎子 | udkanbun
+```
+
+## Usage via spaCy
+
+If you have already installed [spaCy](https://pypi.org/project/spacy/) 2.1.0 or later, you can use UD-Kanbun via spaCy Language pipeline.
+
+```py
+>>> import udkanbun.spacy
+>>> lzh=udkanbun.spacy.load()
+>>> d=lzh("不入虎穴不得虎子")
+>>> print(type(d))
+<class 'spacy.tokens.doc.Doc'>
+>>> print(udkanbun.spacy.to_conllu(d))
+# text = 不入虎穴不得虎子
+1 不 不 ADV v,副詞,否定,無界 _ 2 advmod _ Gloss=not|SpaceAfter=No
+2 入 入 VERB v,動詞,行為,移動 _ 0 root _ Gloss=enter|SpaceAfter=No
+3 虎 虎 NOUN n,名詞,主体,動物 _ 4 nmod _ Gloss=tiger|SpaceAfter=No
+4 穴 穴 NOUN n,名詞,固定物,地形 _ 2 obj _ Gloss=cave|SpaceAfter=No
+5 不 不 ADV v,副詞,否定,無界 _ 6 advmod _ Gloss=not|SpaceAfter=No
+6 得 得 VERB v,動詞,行為,得失 _ 2 parataxis _ Gloss=get|SpaceAfter=No
+7 虎 虎 NOUN n,名詞,主体,動物 _ 8 nmod _ Gloss=tiger|SpaceAfter=No
+8 子 子 NOUN n,名詞,人,関係 _ 6 obj _ Gloss=child|SpaceAfter=No
+
+>>> t=d[0]
+>>> print(t.i+1,t.orth_,t.lemma_,t.pos_,t.tag_,t.head.i+1,t.dep_,t.whitespace_,t.norm_)
+1 不 不 ADV v,副詞,否定,無界 2 advmod not
+```
+
+## Installation for Linux
+
+Tar-ball is available for Linux, and is installed by default when you use `pip`:
+```sh
+pip install udkanbun
+```
+
+## Installation for Cygwin
+
+Make sure to get `gcc-g++` `python37-pip` `python37-devel` packages, and then:
+```sh
+pip3.7 install udkanbun
+```
+Use `python3.7` command in [Cygwin](https://www.cygwin.com/install.html) instead of `python`.
+
+## Installation for Jupyter Notebook (Google Colaboratory)
+
+```py
+!pip install udkanbun
+```
+
+Try [notebook](https://colab.research.google.com/github/KoichiYasuoka/UD-Kanbun/blob/master/udkanbun.ipynb) for Google Colaboratory.
+
+## Author
+
+Koichi Yasuoka (安岡孝一)
+
+## References
+
+* Koichi Yasuoka: [Universal Dependencies Treebank of the Four Books in Classical Chinese](http://hdl.handle.net/2433/245217), DADH2019: 10th International Conference of Digital Archives and Digital Humanities (December 2019), pp.20-28.
+* 安岡孝一: [四書を学んだMeCab+UDPipeはセンター試験の漢文を読めるのか](http://hdl.handle.net/2433/237383), 東洋学へのコンピュータ利用, 第30回研究セミナー (2019年3月8日), pp.3-110.
+* 安岡孝一: [漢文の依存文法解析と返り点の関係について](http://hdl.handle.net/2433/235609), 日本漢字学会第1回研究大会予稿集 (2018年12月1日), pp.33-48.
+
+%prep
+%autosetup -n udkanbun-3.3.7
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-udkanbun -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 3.3.7-1
+- Package Spec generated
@@ -0,0 +1 @@
+7ce98285f2c9e1194c5ba1e80d0dcb62 udkanbun-3.3.7.tar.gz
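The final hunk adds a `sources` entry that pairs a 32-hex-digit digest with the tarball name. A minimal sketch of how such an entry could be checked against a downloaded file follows, assuming the digest is MD5 and the line uses the two-column `digest filename` layout shown in the diff. The helper names `parse_sources_line`, `md5_of`, and `verify` are hypothetical and are not part of any packaging tool.

```python
import hashlib
from pathlib import Path


def parse_sources_line(line: str) -> tuple[str, str]:
    """Split a 'digest filename' entry into (digest, filename)."""
    digest, filename = line.split(maxsplit=1)
    return digest.lower(), filename.strip()


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(line: str, directory: Path = Path(".")) -> bool:
    """Check that the file named in the entry matches its recorded digest."""
    digest, filename = parse_sources_line(line)
    return md5_of(directory / filename) == digest
```

Called as `verify("7ce98285f2c9e1194c5ba1e80d0dcb62 udkanbun-3.3.7.tar.gz")` in the directory holding the tarball, this would report whether the download matches the recorded digest. MD5 here only detects accidental corruption; it is not a defence against deliberate tampering.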