authorCoprDistGit <infra@openeuler.org>2023-05-31 03:31:59 +0000
committerCoprDistGit <infra@openeuler.org>2023-05-31 03:31:59 +0000
commite1dd1ad6b8b3027995f41f104303e272a0ad9acd (patch)
tree303b16f0f5fbc556d46dc1786f6990a28fbe6b4c /python-nlpyport.spec
parent5374ea84f6be3b11bd40905f93e0074e5f5e35f2 (diff)
automatic import of python-nlpyport
Diffstat (limited to 'python-nlpyport.spec')
-rw-r--r--  python-nlpyport.spec  408
1 file changed, 408 insertions, 0 deletions
diff --git a/python-nlpyport.spec b/python-nlpyport.spec
new file mode 100644
index 0000000..fb1ee91
--- /dev/null
+++ b/python-nlpyport.spec
@@ -0,0 +1,408 @@
+%global _empty_manifest_terminate_build 0
+Name: python-NLPyPort
+Version: 2.2.5
+Release: 1
+Summary: Python NLP for Portuguese
+License: CC0-1.0
+URL: https://github.com/jdportugal/NLPyPort
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/b7/cb/27c653a479f649313c3d6da4dd88bbf6e92ed9f71f6cf64aa8cc177426c4/NLPyPort-2.2.5.tar.gz
+BuildArch: noarch
+
+
+%description
+# NLPyPort
+
+
+NLPyPort is a pipeline built on top of NLTK, adding and replacing elements, previously created for the NLPPort pipeline, to better process Portuguese.
+It currently supports the tasks of Tokenization, PoS Tagging, Lemmatization and Named Entity Recognition.
+
+
+# Installation
+Installing NLPyPort should be as simple as installing the requirements or installing the module via pip (`pip install NLPyPort`). However, some additional configuration may be necessary.
+
+If your NLTK version is above 3.4.5, install version 3.4.5 by running:
+```bash
+pip install nltk==3.4.5
+```
+
+If you have installed NLTK but have not yet downloaded the "Floresta" corpus, run the following commands in a Python shell:
+```python
+>>> import nltk
+>>> nltk.download('floresta')
+```
+
+
+
+# Usage
+
+In order to simplify the usage of the NLPyPort pipeline, some structural changes were made. The “exemplo.py” file shows examples of several use cases.
+
+## How to use the pipeline
+
+Depending on the planned usage, the pipeline may be called in three different ways:
+
+### 1 - Default
+```python
+text = new_full_pipe( your_input_file )
+```
+
+
+### 2 - Optional arguments
+```python
+text = new_full_pipe( your_input_file , options = options )
+```
+
+
+### 3 - Optional arguments and pre-load pipeline
+```python
+config_list = load_congif_to_list() # Pre-load the pipeline
+text = new_full_pipe( your_input_file , options = options , config_list = config_list )
+```
+
+
+## Available options
+
+- "tokenizer" : True -> Perform Tokenization
+- "pos_tagger" : True -> Perform PoS Tagging
+- "lemmatizer" : True -> Perform Lemmatization
+- "entity_recognition" : True -> Perform NER
+- "np_chunking" : True -> Perform NP Chunking
+- "pre_load" : False -> Pre-load the pipeline; requires the additional argument "config_list"
+- "string_or_array" : True -> Treat the input as a string or an array (rather than a file)
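The options above can be collected in a plain Python dict and passed as the `options` argument. A minimal sketch, assuming the `new_full_pipe` interface shown above (the input file name and the chosen True/False values are hypothetical):

```python
# Sketch: an options dict for new_full_pipe. Keys mirror the option
# list above; "input.txt" and the flag values are illustrative only.
options = {
    "tokenizer": True,
    "pos_tagger": True,
    "lemmatizer": True,
    "entity_recognition": False,
    "np_chunking": False,
    "pre_load": False,
    "string_or_array": False,
}

# text = new_full_pipe("input.txt", options=options)  # needs NLPyPort installed

# Stages that would run with this configuration:
enabled = [name for name, flag in options.items() if flag]
print(enabled)
```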
+
+
+## Returned text
+
+In case of success, the pipeline returns an object of the "Text" class, whose properties are as follows:
+ text.tokens
+ text.pos_tags
+ text.lemas
+ text.entities
+ text.np_tags
+
+Additionally, there is a method that prints the pipeline output in the CoNLL format:
+ text.print_conll()
+
+To separate lines, an additional EOS token is added at the end of each line.
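For illustration only, the layout described above can be sketched as follows (made-up tokens; the real output comes from `text.print_conll()`, and the exact column set may differ):

```python
# Sketch of a CoNLL-style layout: one token per row with its lemma
# and PoS tag, and an EOS token closing the sentence. The column
# order is an assumption, not NLPyPort's documented format.
def to_conll(tokens, lemmas, pos_tags):
    rows = ["\t".join(cols) for cols in zip(tokens, lemmas, pos_tags)]
    rows.append("EOS")  # sentence/line separator described above
    return "\n".join(rows)

print(to_conll(["O", "gato", "dorme"], ["o", "gato", "dormir"], ["DET", "N", "V"]))
```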
+
+
+# Credits
+
+
+Tokenizer and Lemmatizer resource files - Rodrigues, Ricardo, Hugo Gonçalo Oliveira, and Paulo Gomes. "NLPPort: A Pipeline for Portuguese NLP (Short Paper)." 7th Symposium on Languages, Applications and Technologies (SLATE 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.
+
+Lemmatizer design - Rodrigues, Ricardo, Hugo Gonçalo Oliveira, and Paulo Gomes. "LemPORT: a high-accuracy cross-platform lemmatizer for portuguese." 3rd Symposium on Languages, Applications and Technologies. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2014.
+
+PoS trainer (adapted from) - https://github.com/fmaruki/Nltk-Tagger-Portuguese
+
+Named Entity Recognition
+ CRF suite - Naoaki Okazaki http://www.chokkan.org/software/crfsuite/
+ sklearn-crfsuite wrapper - https://github.com/TeamHG-Memex/sklearn-crfsuite
+
+Corpora for PoS tagging training:
+ MacMorpho - http://nilc.icmc.usp.br/macmorpho/
+ Floresta Sintá(c)tica - https://www.linguateca.pt/Floresta/corpus.html
+
+
+
+# Citations
+
+To cite the pipeline and give credit, please use the following BibTeX reference:
+
+@inproceedings{ferreira_etal:slate2019,
+ Author = {João Ferreira and Hugo {Gonçalo~Oliveira} and Ricardo Rodrigues},
+ Booktitle = {Symposium on Languages, Applications and Technologies (SLATE 2019)},
+ Month = {June},
+ Note = {In press},
+ Title = {Improving {NLTK} for Processing {P}ortuguese},
+ Year = {2019}}
+
+%package -n python3-NLPyPort
+Summary: Python NLP for Portuguese
+Provides: python-NLPyPort
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-NLPyPort
+# NLPyPort
+
+
+NLPyPort is a pipeline built on top of NLTK, adding and replacing elements, previously created for the NLPPort pipeline, to better process Portuguese.
+It currently supports the tasks of Tokenization, PoS Tagging, Lemmatization and Named Entity Recognition.
+
+
+# Installation
+Installing NLPyPort should be as simple as installing the requirements or installing the module via pip (`pip install NLPyPort`). However, some additional configuration may be necessary.
+
+If your NLTK version is above 3.4.5, install version 3.4.5 by running:
+```bash
+pip install nltk==3.4.5
+```
+
+If you have installed NLTK but have not yet downloaded the "Floresta" corpus, run the following commands in a Python shell:
+```python
+>>> import nltk
+>>> nltk.download('floresta')
+```
+
+
+
+# Usage
+
+In order to simplify the usage of the NLPyPort pipeline, some structural changes were made. The “exemplo.py” file shows examples of several use cases.
+
+## How to use the pipeline
+
+Depending on the planned usage, the pipeline may be called in three different ways:
+
+### 1 - Default
+```python
+text = new_full_pipe( your_input_file )
+```
+
+
+### 2 - Optional arguments
+```python
+text = new_full_pipe( your_input_file , options = options )
+```
+
+
+### 3 - Optional arguments and pre-load pipeline
+```python
+config_list = load_congif_to_list() # Pre-load the pipeline
+text = new_full_pipe( your_input_file , options = options , config_list = config_list )
+```
+
+
+## Available options
+
+- "tokenizer" : True -> Perform Tokenization
+- "pos_tagger" : True -> Perform PoS Tagging
+- "lemmatizer" : True -> Perform Lemmatization
+- "entity_recognition" : True -> Perform NER
+- "np_chunking" : True -> Perform NP Chunking
+- "pre_load" : False -> Pre-load the pipeline; requires the additional argument "config_list"
+- "string_or_array" : True -> Treat the input as a string or an array (rather than a file)
+
+
+## Returned text
+
+In case of success, the pipeline returns an object of the "Text" class, whose properties are as follows:
+ text.tokens
+ text.pos_tags
+ text.lemas
+ text.entities
+ text.np_tags
+
+Additionally, there is a method that prints the pipeline output in the CoNLL format:
+ text.print_conll()
+
+To separate lines, an additional EOS token is added at the end of each line.
+
+
+# Credits
+
+
+Tokenizer and Lemmatizer resource files - Rodrigues, Ricardo, Hugo Gonçalo Oliveira, and Paulo Gomes. "NLPPort: A Pipeline for Portuguese NLP (Short Paper)." 7th Symposium on Languages, Applications and Technologies (SLATE 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.
+
+Lemmatizer design - Rodrigues, Ricardo, Hugo Gonçalo Oliveira, and Paulo Gomes. "LemPORT: a high-accuracy cross-platform lemmatizer for portuguese." 3rd Symposium on Languages, Applications and Technologies. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2014.
+
+PoS trainer (adapted from) - https://github.com/fmaruki/Nltk-Tagger-Portuguese
+
+Named Entity Recognition
+ CRF suite - Naoaki Okazaki http://www.chokkan.org/software/crfsuite/
+ sklearn-crfsuite wrapper - https://github.com/TeamHG-Memex/sklearn-crfsuite
+
+Corpora for PoS tagging training:
+ MacMorpho - http://nilc.icmc.usp.br/macmorpho/
+ Floresta Sintá(c)tica - https://www.linguateca.pt/Floresta/corpus.html
+
+
+
+# Citations
+
+To cite the pipeline and give credit, please use the following BibTeX reference:
+
+@inproceedings{ferreira_etal:slate2019,
+ Author = {João Ferreira and Hugo {Gonçalo~Oliveira} and Ricardo Rodrigues},
+ Booktitle = {Symposium on Languages, Applications and Technologies (SLATE 2019)},
+ Month = {June},
+ Note = {In press},
+ Title = {Improving {NLTK} for Processing {P}ortuguese},
+ Year = {2019}}
+
+%package help
+Summary: Development documents and examples for NLPyPort
+Provides: python3-NLPyPort-doc
+%description help
+# NLPyPort
+
+
+NLPyPort is a pipeline built on top of NLTK, adding and replacing elements, previously created for the NLPPort pipeline, to better process Portuguese.
+It currently supports the tasks of Tokenization, PoS Tagging, Lemmatization and Named Entity Recognition.
+
+
+# Installation
+Installing NLPyPort should be as simple as installing the requirements or installing the module via pip (`pip install NLPyPort`). However, some additional configuration may be necessary.
+
+If your NLTK version is above 3.4.5, install version 3.4.5 by running:
+```bash
+pip install nltk==3.4.5
+```
+
+If you have installed NLTK but have not yet downloaded the "Floresta" corpus, run the following commands in a Python shell:
+```python
+>>> import nltk
+>>> nltk.download('floresta')
+```
+
+
+
+# Usage
+
+In order to simplify the usage of the NLPyPort pipeline, some structural changes were made. The “exemplo.py” file shows examples of several use cases.
+
+## How to use the pipeline
+
+Depending on the planned usage, the pipeline may be called in three different ways:
+
+### 1 - Default
+```python
+text = new_full_pipe( your_input_file )
+```
+
+
+### 2 - Optional arguments
+```python
+text = new_full_pipe( your_input_file , options = options )
+```
+
+
+### 3 - Optional arguments and pre-load pipeline
+```python
+config_list = load_congif_to_list() # Pre-load the pipeline
+text = new_full_pipe( your_input_file , options = options , config_list = config_list )
+```
+
+
+## Available options
+
+- "tokenizer" : True -> Perform Tokenization
+- "pos_tagger" : True -> Perform PoS Tagging
+- "lemmatizer" : True -> Perform Lemmatization
+- "entity_recognition" : True -> Perform NER
+- "np_chunking" : True -> Perform NP Chunking
+- "pre_load" : False -> Pre-load the pipeline; requires the additional argument "config_list"
+- "string_or_array" : True -> Treat the input as a string or an array (rather than a file)
+
+
+## Returned text
+
+In case of success, the pipeline returns an object of the "Text" class, whose properties are as follows:
+ text.tokens
+ text.pos_tags
+ text.lemas
+ text.entities
+ text.np_tags
+
+Additionally, there is a method that prints the pipeline output in the CoNLL format:
+ text.print_conll()
+
+To separate lines, an additional EOS token is added at the end of each line.
+
+
+# Credits
+
+
+Tokenizer and Lemmatizer resource files - Rodrigues, Ricardo, Hugo Gonçalo Oliveira, and Paulo Gomes. "NLPPort: A Pipeline for Portuguese NLP (Short Paper)." 7th Symposium on Languages, Applications and Technologies (SLATE 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.
+
+Lemmatizer design - Rodrigues, Ricardo, Hugo Gonçalo Oliveira, and Paulo Gomes. "LemPORT: a high-accuracy cross-platform lemmatizer for portuguese." 3rd Symposium on Languages, Applications and Technologies. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2014.
+
+PoS trainer (adapted from) - https://github.com/fmaruki/Nltk-Tagger-Portuguese
+
+Named Entity Recognition
+ CRF suite - Naoaki Okazaki http://www.chokkan.org/software/crfsuite/
+ sklearn-crfsuite wrapper - https://github.com/TeamHG-Memex/sklearn-crfsuite
+
+Corpora for PoS tagging training:
+ MacMorpho - http://nilc.icmc.usp.br/macmorpho/
+ Floresta Sintá(c)tica - https://www.linguateca.pt/Floresta/corpus.html
+
+
+
+# Citations
+
+To cite the pipeline and give credit, please use the following BibTeX reference:
+
+@inproceedings{ferreira_etal:slate2019,
+ Author = {João Ferreira and Hugo {Gonçalo~Oliveira} and Ricardo Rodrigues},
+ Booktitle = {Symposium on Languages, Applications and Technologies (SLATE 2019)},
+ Month = {June},
+ Note = {In press},
+ Title = {Improving {NLTK} for Processing {P}ortuguese},
+ Year = {2019}}
+
+%prep
+%autosetup -n NLPyPort-2.2.5
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-NLPyPort -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Wed May 31 2023 Python_Bot <Python_Bot@openeuler.org> - 2.2.5-1
+- Package Spec generated