author | CoprDistGit <infra@openeuler.org> | 2023-05-31 03:31:59 +0000 |
---|---|---|
committer | CoprDistGit <infra@openeuler.org> | 2023-05-31 03:31:59 +0000 |
commit | e1dd1ad6b8b3027995f41f104303e272a0ad9acd (patch) | |
tree | 303b16f0f5fbc556d46dc1786f6990a28fbe6b4c /python-nlpyport.spec | |
parent | 5374ea84f6be3b11bd40905f93e0074e5f5e35f2 (diff) |
automatic import of python-nlpyport
Diffstat (limited to 'python-nlpyport.spec')
-rw-r--r-- | python-nlpyport.spec | 408 |
1 files changed, 408 insertions, 0 deletions
diff --git a/python-nlpyport.spec b/python-nlpyport.spec
new file mode 100644
index 0000000..fb1ee91
--- /dev/null
+++ b/python-nlpyport.spec
@@ -0,0 +1,408 @@
+%global _empty_manifest_terminate_build 0
+Name: python-NLPyPort
+Version: 2.2.5
+Release: 1
+Summary: Python NLP for Portuguese
+License: cc0-1.0
+URL: https://github.com/jdportugal/NLPyPort
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/b7/cb/27c653a479f649313c3d6da4dd88bbf6e92ed9f71f6cf64aa8cc177426c4/NLPyPort-2.2.5.tar.gz
+BuildArch: noarch
+
+
+%description
+# NLPyPort
+
+
+The NLPy_Port is a pipeline assembled from the NLTK pipeline, adding and changing its elements for better processing the portuguese that were previouslly created for the NLPPort pipeline.
+It suports at the moment the taks of Tokenization, PoS Tagging , Lemmatization and Named Entity Recognition
+
+
+# Instalation
+Installing NLPyPort should be as simple as installing the requirements or installing the module via pip (pip install NLPyPort). However, some other configurations may be necessary.
+
+If your NLTK version is above 3.4.5, install the version 3.4.5 by running:
+```bash
+>>> pip install nltk==3.4.5
+```
+
+If you installed NLTK and do not have downloaded the "Floresta" corpus, run the following commands:
+```bash
+>>> import nltk
+>>> nltk.download('floresta')
+```
+
+
+
+# Usage
+
+In order to simplify the usage of the NLPyPort pipeline, some structural changes were made. The “exemplo.py” file shows exemples os several use cases.
+
+## How to use the pipeline
+
+Depending on the planed usage, the pipeline may be called in three different ways:
+
+### 1 - Default
+```python
+text = new_full_pipe( your_input_file )
+```
+
+
+### 2 - Optional arguments
+```python
+text = new_full_pipe( your_input_file , options = options )
+```
+
+
+### 3 - Optional arguments and pre-load pipeline
+```python
+config_list = load_congif_to_list() # Pre-load the pipeline
+text=new_full_pipe( your_input_file , options = options , config_list = config_list)
+```
+
+
+## Available options
+
+"tokenizer" : True -> Perform Tokenization
+
+"pos_tagger" : True -> Perform Pos Tagging
+
+"lemmatizer" : True -> Perform Lemmatization
+
+"entity_recognition" : True -> Perform NER
+
+"np_chunking" : True -> Perform NP Chunking
+
+"pre_load" : False -> Preload the pipeline, needs the additional argument “config_list”
+
+"string_or_array" : True -> Set input as being an array or a string
+
+
+## Returned text
+
+In case of success, the pipeline will return an object of the “Text” class. The properties of this are as follow:
+ text.tokens
+ text.pos_tags
+ text.lemas
+ text.entities
+ text.np_tags
+
+Additionally, there is a method to return the pipeline in the CoNNL Format:
+ text.print_conll()
+
+To separate lines , at the end of each line the additional token EOS is added.
+
+
+# Credits
+
+
+Tokenizer and Lemmatizer resource files - Rodrigues, Ricardo, Hugo Gonçalo Oliveira, and Paulo Gomes. "NLPPort: A Pipeline for Portuguese NLP (Short Paper)." 7th Symposium on Languages, Applications and Technologies (SLATE 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.
+
+Lemmatizer design - Rodrigues, Ricardo, Hugo Gonçalo Oliveira, and Paulo Gomes. "LemPORT: a high-accuracy cross-platform lemmatizer for portuguese." 3rd Symposium on Languages, Applications and Technologies. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2014.
+
+PoS trainer (adapted from) - https://github.com/fmaruki/Nltk-Tagger-Portuguese
+
+Named Entity Recognition
+ CRF suite - Naoaki Okazaki http://www.chokkan.org/software/crfsuite/
+ sklearn-crfsuite wrapper - https://github.com/TeamHG-Memex/sklearn-crfsuite
+
+Corpus
+Corpus for PoS tagging training
+ MacMorpho - http://nilc.icmc.usp.br/macmorpho/
+ Floresta Sintá(c)tica - https://www.linguateca.pt/Floresta/corpus.html
+
+
+
+# Citations
+
+To cite and give credits to the pipeline please use the following BibText reference:
+
+@inproceedings{ferreira_etal:slate2019,
+ Author = {João Ferreira and Hugo {Gonçalo~Oliveira} and Ricardo Rodrigues},
+ Booktitle = {Symposium on Languages, Applications and Technologies (SLATE 2019)},
+ Month = {June},
+ Note = {In press},
+ Title = {Improving {NLTK} for Processing {P}ortuguese},
+ Year = {2019}}
+
+%package -n python3-NLPyPort
+Summary: Python NLP for Portuguese
+Provides: python-NLPyPort
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-NLPyPort
+# NLPyPort
+
+
+The NLPy_Port is a pipeline assembled from the NLTK pipeline, adding and changing its elements for better processing the portuguese that were previouslly created for the NLPPort pipeline.
+It suports at the moment the taks of Tokenization, PoS Tagging , Lemmatization and Named Entity Recognition
+
+
+# Instalation
+Installing NLPyPort should be as simple as installing the requirements or installing the module via pip (pip install NLPyPort). However, some other configurations may be necessary.
+
+If your NLTK version is above 3.4.5, install the version 3.4.5 by running:
+```bash
+>>> pip install nltk==3.4.5
+```
+
+If you installed NLTK and do not have downloaded the "Floresta" corpus, run the following commands:
+```bash
+>>> import nltk
+>>> nltk.download('floresta')
+```
+
+
+
+# Usage
+
+In order to simplify the usage of the NLPyPort pipeline, some structural changes were made. The “exemplo.py” file shows exemples os several use cases.
+
+## How to use the pipeline
+
+Depending on the planed usage, the pipeline may be called in three different ways:
+
+### 1 - Default
+```python
+text = new_full_pipe( your_input_file )
+```
+
+
+### 2 - Optional arguments
+```python
+text = new_full_pipe( your_input_file , options = options )
+```
+
+
+### 3 - Optional arguments and pre-load pipeline
+```python
+config_list = load_congif_to_list() # Pre-load the pipeline
+text=new_full_pipe( your_input_file , options = options , config_list = config_list)
+```
+
+
+## Available options
+
+"tokenizer" : True -> Perform Tokenization
+
+"pos_tagger" : True -> Perform Pos Tagging
+
+"lemmatizer" : True -> Perform Lemmatization
+
+"entity_recognition" : True -> Perform NER
+
+"np_chunking" : True -> Perform NP Chunking
+
+"pre_load" : False -> Preload the pipeline, needs the additional argument “config_list”
+
+"string_or_array" : True -> Set input as being an array or a string
+
+
+## Returned text
+
+In case of success, the pipeline will return an object of the “Text” class. The properties of this are as follow:
+ text.tokens
+ text.pos_tags
+ text.lemas
+ text.entities
+ text.np_tags
+
+Additionally, there is a method to return the pipeline in the CoNNL Format:
+ text.print_conll()
+
+To separate lines , at the end of each line the additional token EOS is added.
+
+
+# Credits
+
+
+Tokenizer and Lemmatizer resource files - Rodrigues, Ricardo, Hugo Gonçalo Oliveira, and Paulo Gomes. "NLPPort: A Pipeline for Portuguese NLP (Short Paper)." 7th Symposium on Languages, Applications and Technologies (SLATE 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.
+
+Lemmatizer design - Rodrigues, Ricardo, Hugo Gonçalo Oliveira, and Paulo Gomes. "LemPORT: a high-accuracy cross-platform lemmatizer for portuguese." 3rd Symposium on Languages, Applications and Technologies. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2014.
+
+PoS trainer (adapted from) - https://github.com/fmaruki/Nltk-Tagger-Portuguese
+
+Named Entity Recognition
+ CRF suite - Naoaki Okazaki http://www.chokkan.org/software/crfsuite/
+ sklearn-crfsuite wrapper - https://github.com/TeamHG-Memex/sklearn-crfsuite
+
+Corpus
+Corpus for PoS tagging training
+ MacMorpho - http://nilc.icmc.usp.br/macmorpho/
+ Floresta Sintá(c)tica - https://www.linguateca.pt/Floresta/corpus.html
+
+
+
+# Citations
+
+To cite and give credits to the pipeline please use the following BibText reference:
+
+@inproceedings{ferreira_etal:slate2019,
+ Author = {João Ferreira and Hugo {Gonçalo~Oliveira} and Ricardo Rodrigues},
+ Booktitle = {Symposium on Languages, Applications and Technologies (SLATE 2019)},
+ Month = {June},
+ Note = {In press},
+ Title = {Improving {NLTK} for Processing {P}ortuguese},
+ Year = {2019}}
+
+%package help
+Summary: Development documents and examples for NLPyPort
+Provides: python3-NLPyPort-doc
+%description help
+# NLPyPort
+
+
+The NLPy_Port is a pipeline assembled from the NLTK pipeline, adding and changing its elements for better processing the portuguese that were previouslly created for the NLPPort pipeline.
+It suports at the moment the taks of Tokenization, PoS Tagging , Lemmatization and Named Entity Recognition
+
+
+# Instalation
+Installing NLPyPort should be as simple as installing the requirements or installing the module via pip (pip install NLPyPort). However, some other configurations may be necessary.
+
+If your NLTK version is above 3.4.5, install the version 3.4.5 by running:
+```bash
+>>> pip install nltk==3.4.5
+```
+
+If you installed NLTK and do not have downloaded the "Floresta" corpus, run the following commands:
+```bash
+>>> import nltk
+>>> nltk.download('floresta')
+```
+
+
+
+# Usage
+
+In order to simplify the usage of the NLPyPort pipeline, some structural changes were made. The “exemplo.py” file shows exemples os several use cases.
+
+## How to use the pipeline
+
+Depending on the planed usage, the pipeline may be called in three different ways:
+
+### 1 - Default
+```python
+text = new_full_pipe( your_input_file )
+```
+
+
+### 2 - Optional arguments
+```python
+text = new_full_pipe( your_input_file , options = options )
+```
+
+
+### 3 - Optional arguments and pre-load pipeline
+```python
+config_list = load_congif_to_list() # Pre-load the pipeline
+text=new_full_pipe( your_input_file , options = options , config_list = config_list)
+```
+
+
+## Available options
+
+"tokenizer" : True -> Perform Tokenization
+
+"pos_tagger" : True -> Perform Pos Tagging
+
+"lemmatizer" : True -> Perform Lemmatization
+
+"entity_recognition" : True -> Perform NER
+
+"np_chunking" : True -> Perform NP Chunking
+
+"pre_load" : False -> Preload the pipeline, needs the additional argument “config_list”
+
+"string_or_array" : True -> Set input as being an array or a string
+
+
+## Returned text
+
+In case of success, the pipeline will return an object of the “Text” class. The properties of this are as follow:
+ text.tokens
+ text.pos_tags
+ text.lemas
+ text.entities
+ text.np_tags
+
+Additionally, there is a method to return the pipeline in the CoNNL Format:
+ text.print_conll()
+
+To separate lines , at the end of each line the additional token EOS is added.
+
+
+# Credits
+
+
+Tokenizer and Lemmatizer resource files - Rodrigues, Ricardo, Hugo Gonçalo Oliveira, and Paulo Gomes. "NLPPort: A Pipeline for Portuguese NLP (Short Paper)." 7th Symposium on Languages, Applications and Technologies (SLATE 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.
+
+Lemmatizer design - Rodrigues, Ricardo, Hugo Gonçalo Oliveira, and Paulo Gomes. "LemPORT: a high-accuracy cross-platform lemmatizer for portuguese." 3rd Symposium on Languages, Applications and Technologies. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2014.
+
+PoS trainer (adapted from) - https://github.com/fmaruki/Nltk-Tagger-Portuguese
+
+Named Entity Recognition
+ CRF suite - Naoaki Okazaki http://www.chokkan.org/software/crfsuite/
+ sklearn-crfsuite wrapper - https://github.com/TeamHG-Memex/sklearn-crfsuite
+
+Corpus
+Corpus for PoS tagging training
+ MacMorpho - http://nilc.icmc.usp.br/macmorpho/
+ Floresta Sintá(c)tica - https://www.linguateca.pt/Floresta/corpus.html
+
+
+
+# Citations
+
+To cite and give credits to the pipeline please use the following BibText reference:
+
+@inproceedings{ferreira_etal:slate2019,
+ Author = {João Ferreira and Hugo {Gonçalo~Oliveira} and Ricardo Rodrigues},
+ Booktitle = {Symposium on Languages, Applications and Technologies (SLATE 2019)},
+ Month = {June},
+ Note = {In press},
+ Title = {Improving {NLTK} for Processing {P}ortuguese},
+ Year = {2019}}
+
+%prep
+%autosetup -n NLPyPort-2.2.5
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-NLPyPort -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed May 31 2023 Python_Bot <Python_Bot@openeuler.org> - 2.2.5-1
+- Package Spec generated
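
Note on the packaged README: the "Available options" list in the `%description` text names seven keys that are passed to `new_full_pipe` through its `options` argument. The sketch below shows one way such a dictionary might be assembled. It is a minimal illustration, not part of the package: `README_OPTIONS`, `make_options` and `run_pipeline` are hypothetical names, treating the listed values as defaults is an assumption, and the pipeline callable is passed in as a parameter because the README does not show the import path of `new_full_pipe`.

```python
# Option names and values exactly as listed under "Available options" in the
# packaged README. Treating them as defaults is an assumption of this sketch.
README_OPTIONS = {
    "tokenizer": True,
    "pos_tagger": True,
    "lemmatizer": True,
    "entity_recognition": True,
    "np_chunking": True,
    "pre_load": False,
    "string_or_array": True,
}


def make_options(**overrides):
    """Return a copy of README_OPTIONS with the given keys overridden.

    Hypothetical helper: rejects keys the README does not list, so a
    misspelled option name fails early instead of being silently ignored.
    """
    unknown = sorted(set(overrides) - set(README_OPTIONS))
    if unknown:
        raise KeyError(f"unknown pipeline option(s): {unknown}")
    return {**README_OPTIONS, **overrides}


def run_pipeline(pipe, input_file, **overrides):
    """Call `pipe` the way usage form 2 in the README calls new_full_pipe.

    `pipe` is expected to be the new_full_pipe callable; it is a parameter
    here because its import path is not given in the README.
    """
    return pipe(input_file, options=make_options(**overrides))


if __name__ == "__main__":
    # Stand-in for new_full_pipe so the sketch runs without NLPyPort installed.
    def fake_pipe(input_file, options=None):
        return input_file, options

    _, used = run_pipeline(fake_pipe, "input.txt", np_chunking=False)
    print(used["np_chunking"])
```

With NLPyPort installed, the real `new_full_pipe` would take the place of `fake_pipe`, and the returned object would expose the properties the README lists (`text.tokens`, `text.pos_tags`, and so on).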