author    CoprDistGit <infra@openeuler.org>  2023-05-05 06:14:57 +0000
committer CoprDistGit <infra@openeuler.org>  2023-05-05 06:14:57 +0000
commit    508a62a8e4af6c9d4ab2b5da6707fcf306dc9e5f (patch)
tree      9677b2d38f96dfdada8418386de4f26bc65074d5
parent    8399352891331461018cd1147d6be22aa8233797 (diff)

automatic import of python-nerda (openeuler20.03)

-rw-r--r--  .gitignore          |   1
-rw-r--r--  python-nerda.spec   | 739
-rw-r--r--  sources             |   1
3 files changed, 741 insertions(+), 0 deletions(-)
diff --git a/.gitignore b/.gitignore
index e69de29..9c70d8b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/NERDA-1.0.0.tar.gz
diff --git a/python-nerda.spec b/python-nerda.spec
new file mode 100644
index 0000000..1f5a084
--- /dev/null
+++ b/python-nerda.spec
@@ -0,0 +1,739 @@
+%global _empty_manifest_terminate_build 0
+Name: python-NERDA
+Version: 1.0.0
+Release: 1
+Summary: A Framework for Finetuning Transformers for Named-Entity Recognition
+License: MIT License
+URL: https://github.com/ebanalyse/NERDA
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/4e/80/3f7ae5e94a16f0dace64996b7eab7ee437b303872654d6705a13654bd132/NERDA-1.0.0.tar.gz
+BuildArch: noarch
+
+Requires: python3-torch
+Requires: python3-transformers
+Requires: python3-sklearn
+Requires: python3-nltk
+Requires: python3-pandas
+Requires: python3-progressbar
+Requires: python3-pyconll
+
+%description
+# NERDA <img src="https://raw.githubusercontent.com/ebanalyse/NERDA/main/logo.png" align="right" height=250/>
+
+![Build status](https://github.com/ebanalyse/NERDA/workflows/build/badge.svg)
+[![codecov](https://codecov.io/gh/ebanalyse/NERDA/branch/main/graph/badge.svg?token=OB6LGFQZYX)](https://codecov.io/gh/ebanalyse/NERDA)
+![PyPI](https://img.shields.io/pypi/v/NERDA.svg)
+![PyPI - Downloads](https://img.shields.io/pypi/dm/NERDA?color=green)
+![License](https://img.shields.io/badge/license-MIT-blue.svg)
+
+Not only is `NERDA` a mesmerizing muppet-like character; `NERDA` is also
+a Python package that offers a slick, easy-to-use interface for fine-tuning
+pretrained transformers for Named Entity Recognition (NER) tasks.
+
+You can also utilize `NERDA` to access a selection of *precooked* `NERDA` models
+that you can use right off the shelf for NER tasks.
+
+`NERDA` is built on `huggingface` `transformers` and the popular `pytorch`
+framework.
+
+## Installation guide
+`NERDA` can be installed from [PyPI](https://pypi.org/project/NERDA/) with
+
+```
+pip install NERDA
+```
+
+If you want the development version, install directly from [GitHub](https://github.com/ebanalyse/NERDA).
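+
+For instance, one way to do that with `pip` (assuming you want the default branch):
+
+```
+pip install git+https://github.com/ebanalyse/NERDA.git
+```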
+
+## Named-Entity Recognition tasks
+Named-entity recognition (NER) (also known as (named) entity identification,
+entity chunking, and entity extraction) is a subtask of information extraction
+that seeks to locate and classify named entities mentioned in unstructured
+text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.<sup>[1]</sup>
+
+[1]: https://en.wikipedia.org/wiki/Named-entity_recognition
+
+### Example Task:
+
+**Task**
+
+Identify person names and organizations in text:
+
+*Jim bought 300 shares of Acme Corp.*
+
+**Solution**
+
+| **Named Entity** | **Type** |
+|--------------------|-----------------------|
+| 'Jim' | Person |
+| 'Acme Corp.' | Organization |
+
+Read more about NER on [Wikipedia](https://en.wikipedia.org/wiki/Named-entity_recognition).
+
+## Train Your Own `NERDA` Model
+
+Say, we want to fine-tune a pretrained [Multilingual BERT](https://huggingface.co/bert-base-multilingual-uncased) transformer for NER in English.
+
+Load package.
+
+```python
+from NERDA.models import NERDA
+```
+
+Instantiate a `NERDA` model (*with default settings*) for the
+[`CoNLL-2003`](https://www.clips.uantwerpen.be/conll2003/ner/)
+English NER data set.
+
+```python
+from NERDA.datasets import get_conll_data
+model = NERDA(dataset_training = get_conll_data('train'),
+ dataset_validation = get_conll_data('valid'),
+ transformer = 'bert-base-multilingual-uncased')
+```
+
+By default the network architecture is analogous to that of the models in [Hvingelby et al. 2020](http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.565.pdf).
+
+The model can then be trained/fine-tuned by invoking the `train` method, e.g.
+
+```python
+model.train()
+```
+
+**Note**: this will take some time depending on the specifications of your machine
+(if you want to skip training, you can go ahead and use one of the models
+that we have already precooked for you instead).
+
+After the model has been trained, it can be used for predicting
+named entities in new texts.
+
+```python
+# text to identify named entities in.
+text = 'Old MacDonald had a farm'
+model.predict_text(text)
+# ([['Old', 'MacDonald', 'had', 'a', 'farm']], [['B-PER', 'I-PER', 'O', 'O', 'O']])
+```
+This means that the model identified 'Old MacDonald' as a *PER*son.
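+
+For instance, to pair each token with its predicted tag, using only the tuple
+returned above (a minimal sketch):
+
+```python
+tokens, tags = model.predict_text(text)
+# Loop over the first (and only) sentence in the result.
+for token, tag in zip(tokens[0], tags[0]):
+    print(token, tag)
+```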
+
+Please note that the `NERDA` model configuration above was instantiated
+with all default settings. You can, however, customize your `NERDA` model
+in a number of ways (see the sketch after this list):
+
+- Use your own data set (fine-tune a transformer for any given language)
+- Choose whatever transformer you like
+- Set all of the hyperparameters for the model
+- Even apply your own network architecture
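+
+A minimal sketch of such a customized configuration (the constructor arguments
+`tag_scheme`, `max_len` and `hyperparameters` are assumptions based on the
+detailed documentation, so check it for the exact signature):
+
+```python
+from NERDA.models import NERDA
+from NERDA.datasets import get_conll_data
+
+# Hypothetical customization -- argument names are assumptions, not verified.
+model = NERDA(dataset_training = get_conll_data('train'),
+              dataset_validation = get_conll_data('valid'),
+              transformer = 'bert-base-multilingual-uncased',
+              tag_scheme = ['B-PER', 'I-PER', 'B-ORG', 'I-ORG',
+                            'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC'],
+              max_len = 128,
+              hyperparameters = {'epochs': 4,
+                                 'warmup_steps': 500,
+                                 'train_batch_size': 13,
+                                 'learning_rate': 0.0001})
+```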
+
+Read more about advanced usage of `NERDA` in the [detailed documentation](https://ebanalyse.github.io/NERDA/workflow).
+
+## Use a Precooked `NERDA` model
+
+We have precooked a number of `NERDA` models for Danish and English that you
+can download and use right off the shelf.
+
+Here is an example.
+
+Instantiate a multilingual BERT model that has been fine-tuned for NER in Danish,
+`DA_BERT_ML`.
+
+```python
+from NERDA.precooked import DA_BERT_ML
+model = DA_BERT_ML()
+```
+
+Download the network from the web and load it:
+
+```python
+model.download_network()
+model.load_network()
+```
+
+You can now predict named entities in new (Danish) texts:
+
+```python
+# (Danish) text to identify named entities in:
+# 'Jens Hansen har en bondegård' = 'Old MacDonald had a farm'
+text = 'Jens Hansen har en bondegård'
+model.predict_text(text)
+# ([['Jens', 'Hansen', 'har', 'en', 'bondegård']], [['B-PER', 'I-PER', 'O', 'O', 'O']])
+```
+
+### List of Precooked Models
+
+The table below shows the precooked `NERDA` models publicly available for download.
+
+| **Model** | **Language** | **Transformer** | **Dataset** | **F1-score** |
+|-----------------|--------------|-------------------|---------|-----|
+| `DA_BERT_ML` | Danish | [Multilingual BERT](https://huggingface.co/bert-base-multilingual-uncased) | [DaNE](https://github.com/alexandrainst/danlp/blob/master/docs/docs/datasets.md#dane) | 82.8 |
+| `DA_ELECTRA_DA` | Danish | [Danish ELECTRA](https://huggingface.co/Maltehb/-l-ctra-danish-electra-small-uncased) | [DaNE](https://github.com/alexandrainst/danlp/blob/master/docs/docs/datasets.md#dane) | 79.8 |
+| `EN_BERT_ML` | English | [Multilingual BERT](https://huggingface.co/bert-base-multilingual-uncased)| [CoNLL-2003](https://www.clips.uantwerpen.be/conll2003/ner/) | 90.4 |
+| `EN_ELECTRA_EN` | English | [English ELECTRA](https://huggingface.co/google/electra-small-discriminator) | [CoNLL-2003](https://www.clips.uantwerpen.be/conll2003/ner/) | 89.1 |
+
+**F1-score** is the micro-averaged F1-score across entity tags, evaluated
+on the respective test sets (which were used for neither training nor
+validation of the models).
+
+Note that we have not spent a lot of time on actually fine-tuning the models,
+so there could be room for improvement. If you are able to improve the models,
+we will be happy to hear from you and include your `NERDA` model.
+
+### Model Performance
+
+The table below summarizes the performance (F1-scores) of the precooked `NERDA` models.
+
+| **Level** | `DA_BERT_ML` | `DA_ELECTRA_DA` | `EN_BERT_ML` | `EN_ELECTRA_EN` |
+|---------------|--------------|-----------------|--------------|-----------------|
+| B-PER | 93.8 | 92.0 | 96.0 | 95.1 |
+| I-PER | 97.8 | 97.1 | 98.5 | 97.9 |
+| B-ORG | 69.5 | 66.9 | 88.4 | 86.2 |
+| I-ORG | 69.9 | 70.7 | 85.7 | 83.1 |
+| B-LOC | 82.5 | 79.0 | 92.3 | 91.1 |
+| I-LOC | 31.6 | 44.4 | 83.9 | 80.5 |
+| B-MISC | 73.4 | 68.6 | 81.8 | 80.1 |
+| I-MISC | 86.1 | 63.6 | 63.4 | 68.4 |
+| **AVG_MICRO** | 82.8 | 79.8 | 90.4 | 89.1 |
+| **AVG_MACRO** | 75.6 | 72.8 | 86.3 | 85.3 |
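+
+The micro average pools all tag-level decisions before computing F1, while the
+macro average is the unweighted mean of the per-tag F1-scores. To illustrate the
+difference (a toy sketch with made-up labels, not the evaluation code behind the
+tables above):
+
+```python
+from sklearn.metrics import f1_score
+
+# Hypothetical gold and predicted tags, flattened across tokens.
+y_true = ['B-PER', 'I-PER', 'O', 'B-ORG', 'O', 'O']
+y_pred = ['B-PER', 'I-PER', 'O', 'B-LOC', 'O', 'O']
+entity_tags = ['B-PER', 'I-PER', 'B-ORG', 'B-LOC']  # exclude 'O'
+
+print(f1_score(y_true, y_pred, labels=entity_tags, average='micro'))
+print(f1_score(y_true, y_pred, labels=entity_tags, average='macro'))
+```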
+
+## 'NERDA'?
+'`NERDA`' originally stands for *'Named Entity Recognition for DAnish'*. However, this
+is somewhat misleading, since the functionality is no longer limited to Danish.
+On the contrary, it generalizes to other languages: `NERDA` supports
+fine-tuning of transformers for NER tasks in any language.
+
+## Background
+`NERDA` is developed as part of [Ekstra Bladet](https://ekstrabladet.dk/)’s activities on Platform Intelligence in News (PIN). PIN is an industrial research project carried out in collaboration between the [Technical University of Denmark](https://www.dtu.dk/), the [University of Copenhagen](https://www.ku.dk/) and [Copenhagen Business School](https://www.cbs.dk/), with funding from [Innovation Fund Denmark](https://innovationsfonden.dk/). The project runs from 2020 to 2023 and develops recommender systems and natural language processing systems geared for news publishing, some of which, like `NERDA`, are open sourced.
+
+## Shout-outs
+- Thanks to the [Alexandra Institute](https://alexandra.dk/), whose [`danlp`](https://github.com/alexandrainst/danlp) package encouraged us to develop this package.
+- Thanks to [Malte Højmark-Bertelsen](https://www.linkedin.com/in/malte-h%C3%B8jmark-bertelsen-9a618017b/) and [Kasper Junge](https://www.linkedin.com/in/kasper-juunge/?originalSubdomain=dk) for giving feedback on `NERDA`.
+
+## Read more
+The detailed documentation for `NERDA`, including code references and
+extended workflow examples, can be accessed [here](https://ebanalyse.github.io/NERDA/).
+
+## Cite this work
+
+```
+@inproceedings{nerda,
+ title = {NERDA},
+ author = {Kjeldgaard, Lars and Nielsen, Lukas},
+ year = {2020},
+ publisher = {{GitHub}},
+ url = {https://github.com/ebanalyse/NERDA}
+}
+```
+
+## Contact
+We hope that you will find `NERDA` useful.
+
+Please direct any questions and feedback to
+[us](mailto:lars.kjeldgaard@eb.dk)!
+
+If you want to contribute (which we encourage you to do), open a
+[PR](https://github.com/ebanalyse/NERDA/pulls).
+
+If you encounter a bug or want to suggest an enhancement, please
+[open an issue](https://github.com/ebanalyse/NERDA/issues).
+
+
+
+
+
+%package -n python3-NERDA
+Summary: A Framework for Finetuning Transformers for Named-Entity Recognition
+Provides: python-NERDA
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-NERDA
+# NERDA <img src="https://raw.githubusercontent.com/ebanalyse/NERDA/main/logo.png" align="right" height=250/>
+
+![Build status](https://github.com/ebanalyse/NERDA/workflows/build/badge.svg)
+[![codecov](https://codecov.io/gh/ebanalyse/NERDA/branch/main/graph/badge.svg?token=OB6LGFQZYX)](https://codecov.io/gh/ebanalyse/NERDA)
+![PyPI](https://img.shields.io/pypi/v/NERDA.svg)
+![PyPI - Downloads](https://img.shields.io/pypi/dm/NERDA?color=green)
+![License](https://img.shields.io/badge/license-MIT-blue.svg)
+
+Not only is `NERDA` a mesmerizing muppet-like character; `NERDA` is also
+a Python package that offers a slick, easy-to-use interface for fine-tuning
+pretrained transformers for Named Entity Recognition (NER) tasks.
+
+You can also utilize `NERDA` to access a selection of *precooked* `NERDA` models
+that you can use right off the shelf for NER tasks.
+
+`NERDA` is built on `huggingface` `transformers` and the popular `pytorch`
+framework.
+
+## Installation guide
+`NERDA` can be installed from [PyPI](https://pypi.org/project/NERDA/) with
+
+```
+pip install NERDA
+```
+
+If you want the development version, install directly from [GitHub](https://github.com/ebanalyse/NERDA).
+
+## Named-Entity Recognition tasks
+Named-entity recognition (NER) (also known as (named) entity identification,
+entity chunking, and entity extraction) is a subtask of information extraction
+that seeks to locate and classify named entities mentioned in unstructured
+text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.<sup>[1]</sup>
+
+[1]: https://en.wikipedia.org/wiki/Named-entity_recognition
+
+### Example Task:
+
+**Task**
+
+Identify person names and organizations in text:
+
+*Jim bought 300 shares of Acme Corp.*
+
+**Solution**
+
+| **Named Entity** | **Type** |
+|--------------------|-----------------------|
+| 'Jim' | Person |
+| 'Acme Corp.' | Organization |
+
+Read more about NER on [Wikipedia](https://en.wikipedia.org/wiki/Named-entity_recognition).
+
+## Train Your Own `NERDA` Model
+
+Say, we want to fine-tune a pretrained [Multilingual BERT](https://huggingface.co/bert-base-multilingual-uncased) transformer for NER in English.
+
+Load package.
+
+```python
+from NERDA.models import NERDA
+```
+
+Instantiate a `NERDA` model (*with default settings*) for the
+[`CoNLL-2003`](https://www.clips.uantwerpen.be/conll2003/ner/)
+English NER data set.
+
+```python
+from NERDA.datasets import get_conll_data
+model = NERDA(dataset_training = get_conll_data('train'),
+ dataset_validation = get_conll_data('valid'),
+ transformer = 'bert-base-multilingual-uncased')
+```
+
+By default the network architecture is analogous to that of the models in [Hvingelby et al. 2020](http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.565.pdf).
+
+The model can then be trained/fine-tuned by invoking the `train` method, e.g.
+
+```python
+model.train()
+```
+
+**Note**: this will take some time depending on the specifications of your machine
+(if you want to skip training, you can go ahead and use one of the models
+that we have already precooked for you instead).
+
+After the model has been trained, it can be used for predicting
+named entities in new texts.
+
+```python
+# text to identify named entities in.
+text = 'Old MacDonald had a farm'
+model.predict_text(text)
+# ([['Old', 'MacDonald', 'had', 'a', 'farm']], [['B-PER', 'I-PER', 'O', 'O', 'O']])
+```
+This means that the model identified 'Old MacDonald' as a *PER*son.
+
+Please note that the `NERDA` model configuration above was instantiated
+with all default settings. You can, however, customize your `NERDA` model
+in a number of ways:
+
+- Use your own data set (fine-tune a transformer for any given language)
+- Choose whatever transformer you like
+- Set all of the hyperparameters for the model
+- Even apply your own network architecture
+
+Read more about advanced usage of `NERDA` in the [detailed documentation](https://ebanalyse.github.io/NERDA/workflow).
+
+## Use a Precooked `NERDA` model
+
+We have precooked a number of `NERDA` models for Danish and English that you
+can download and use right off the shelf.
+
+Here is an example.
+
+Instantiate a multilingual BERT model that has been fine-tuned for NER in Danish,
+`DA_BERT_ML`.
+
+```python
+from NERDA.precooked import DA_BERT_ML
+model = DA_BERT_ML()
+```
+
+Download the network from the web and load it:
+
+```python
+model.download_network()
+model.load_network()
+```
+
+You can now predict named entities in new (Danish) texts:
+
+```python
+# (Danish) text to identify named entities in:
+# 'Jens Hansen har en bondegård' = 'Old MacDonald had a farm'
+text = 'Jens Hansen har en bondegård'
+model.predict_text(text)
+# ([['Jens', 'Hansen', 'har', 'en', 'bondegård']], [['B-PER', 'I-PER', 'O', 'O', 'O']])
+```
+
+### List of Precooked Models
+
+The table below shows the precooked `NERDA` models publicly available for download.
+
+| **Model** | **Language** | **Transformer** | **Dataset** | **F1-score** |
+|-----------------|--------------|-------------------|---------|-----|
+| `DA_BERT_ML` | Danish | [Multilingual BERT](https://huggingface.co/bert-base-multilingual-uncased) | [DaNE](https://github.com/alexandrainst/danlp/blob/master/docs/docs/datasets.md#dane) | 82.8 |
+| `DA_ELECTRA_DA` | Danish | [Danish ELECTRA](https://huggingface.co/Maltehb/-l-ctra-danish-electra-small-uncased) | [DaNE](https://github.com/alexandrainst/danlp/blob/master/docs/docs/datasets.md#dane) | 79.8 |
+| `EN_BERT_ML` | English | [Multilingual BERT](https://huggingface.co/bert-base-multilingual-uncased)| [CoNLL-2003](https://www.clips.uantwerpen.be/conll2003/ner/) | 90.4 |
+| `EN_ELECTRA_EN` | English | [English ELECTRA](https://huggingface.co/google/electra-small-discriminator) | [CoNLL-2003](https://www.clips.uantwerpen.be/conll2003/ner/) | 89.1 |
+
+**F1-score** is the micro-averaged F1-score across entity tags, evaluated
+on the respective test sets (which were used for neither training nor
+validation of the models).
+
+Note that we have not spent a lot of time on actually fine-tuning the models,
+so there could be room for improvement. If you are able to improve the models,
+we will be happy to hear from you and include your `NERDA` model.
+
+### Model Performance
+
+The table below summarizes the performance (F1-scores) of the precooked `NERDA` models.
+
+| **Level** | `DA_BERT_ML` | `DA_ELECTRA_DA` | `EN_BERT_ML` | `EN_ELECTRA_EN` |
+|---------------|--------------|-----------------|--------------|-----------------|
+| B-PER | 93.8 | 92.0 | 96.0 | 95.1 |
+| I-PER | 97.8 | 97.1 | 98.5 | 97.9 |
+| B-ORG | 69.5 | 66.9 | 88.4 | 86.2 |
+| I-ORG | 69.9 | 70.7 | 85.7 | 83.1 |
+| B-LOC | 82.5 | 79.0 | 92.3 | 91.1 |
+| I-LOC | 31.6 | 44.4 | 83.9 | 80.5 |
+| B-MISC | 73.4 | 68.6 | 81.8 | 80.1 |
+| I-MISC | 86.1 | 63.6 | 63.4 | 68.4 |
+| **AVG_MICRO** | 82.8 | 79.8 | 90.4 | 89.1 |
+| **AVG_MACRO** | 75.6 | 72.8 | 86.3 | 85.3 |
+
+## 'NERDA'?
+'`NERDA`' originally stands for *'Named Entity Recognition for DAnish'*. However, this
+is somewhat misleading, since the functionality is no longer limited to Danish.
+On the contrary, it generalizes to other languages: `NERDA` supports
+fine-tuning of transformers for NER tasks in any language.
+
+## Background
+`NERDA` is developed as part of [Ekstra Bladet](https://ekstrabladet.dk/)’s activities on Platform Intelligence in News (PIN). PIN is an industrial research project carried out in collaboration between the [Technical University of Denmark](https://www.dtu.dk/), the [University of Copenhagen](https://www.ku.dk/) and [Copenhagen Business School](https://www.cbs.dk/), with funding from [Innovation Fund Denmark](https://innovationsfonden.dk/). The project runs from 2020 to 2023 and develops recommender systems and natural language processing systems geared for news publishing, some of which, like `NERDA`, are open sourced.
+
+## Shout-outs
+- Thanks to the [Alexandra Institute](https://alexandra.dk/), whose [`danlp`](https://github.com/alexandrainst/danlp) package encouraged us to develop this package.
+- Thanks to [Malte Højmark-Bertelsen](https://www.linkedin.com/in/malte-h%C3%B8jmark-bertelsen-9a618017b/) and [Kasper Junge](https://www.linkedin.com/in/kasper-juunge/?originalSubdomain=dk) for giving feedback on `NERDA`.
+
+## Read more
+The detailed documentation for `NERDA`, including code references and
+extended workflow examples, can be accessed [here](https://ebanalyse.github.io/NERDA/).
+
+## Cite this work
+
+```
+@inproceedings{nerda,
+ title = {NERDA},
+ author = {Kjeldgaard, Lars and Nielsen, Lukas},
+ year = {2020},
+ publisher = {{GitHub}},
+ url = {https://github.com/ebanalyse/NERDA}
+}
+```
+
+## Contact
+We hope that you will find `NERDA` useful.
+
+Please direct any questions and feedback to
+[us](mailto:lars.kjeldgaard@eb.dk)!
+
+If you want to contribute (which we encourage you to do), open a
+[PR](https://github.com/ebanalyse/NERDA/pulls).
+
+If you encounter a bug or want to suggest an enhancement, please
+[open an issue](https://github.com/ebanalyse/NERDA/issues).
+
+
+
+
+
+%package help
+Summary: Development documents and examples for NERDA
+Provides: python3-NERDA-doc
+%description help
+# NERDA <img src="https://raw.githubusercontent.com/ebanalyse/NERDA/main/logo.png" align="right" height=250/>
+
+![Build status](https://github.com/ebanalyse/NERDA/workflows/build/badge.svg)
+[![codecov](https://codecov.io/gh/ebanalyse/NERDA/branch/main/graph/badge.svg?token=OB6LGFQZYX)](https://codecov.io/gh/ebanalyse/NERDA)
+![PyPI](https://img.shields.io/pypi/v/NERDA.svg)
+![PyPI - Downloads](https://img.shields.io/pypi/dm/NERDA?color=green)
+![License](https://img.shields.io/badge/license-MIT-blue.svg)
+
+Not only is `NERDA` a mesmerizing muppet-like character; `NERDA` is also
+a Python package that offers a slick, easy-to-use interface for fine-tuning
+pretrained transformers for Named Entity Recognition (NER) tasks.
+
+You can also utilize `NERDA` to access a selection of *precooked* `NERDA` models
+that you can use right off the shelf for NER tasks.
+
+`NERDA` is built on `huggingface` `transformers` and the popular `pytorch`
+framework.
+
+## Installation guide
+`NERDA` can be installed from [PyPI](https://pypi.org/project/NERDA/) with
+
+```
+pip install NERDA
+```
+
+If you want the development version, install directly from [GitHub](https://github.com/ebanalyse/NERDA).
+
+## Named-Entity Recognition tasks
+Named-entity recognition (NER) (also known as (named) entity identification,
+entity chunking, and entity extraction) is a subtask of information extraction
+that seeks to locate and classify named entities mentioned in unstructured
+text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.<sup>[1]</sup>
+
+[1]: https://en.wikipedia.org/wiki/Named-entity_recognition
+
+### Example Task:
+
+**Task**
+
+Identify person names and organizations in text:
+
+*Jim bought 300 shares of Acme Corp.*
+
+**Solution**
+
+| **Named Entity** | **Type** |
+|--------------------|-----------------------|
+| 'Jim' | Person |
+| 'Acme Corp.' | Organization |
+
+Read more about NER on [Wikipedia](https://en.wikipedia.org/wiki/Named-entity_recognition).
+
+## Train Your Own `NERDA` Model
+
+Say, we want to fine-tune a pretrained [Multilingual BERT](https://huggingface.co/bert-base-multilingual-uncased) transformer for NER in English.
+
+Load package.
+
+```python
+from NERDA.models import NERDA
+```
+
+Instantiate a `NERDA` model (*with default settings*) for the
+[`CoNLL-2003`](https://www.clips.uantwerpen.be/conll2003/ner/)
+English NER data set.
+
+```python
+from NERDA.datasets import get_conll_data
+model = NERDA(dataset_training = get_conll_data('train'),
+ dataset_validation = get_conll_data('valid'),
+ transformer = 'bert-base-multilingual-uncased')
+```
+
+By default the network architecture is analogous to that of the models in [Hvingelby et al. 2020](http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.565.pdf).
+
+The model can then be trained/fine-tuned by invoking the `train` method, e.g.
+
+```python
+model.train()
+```
+
+**Note**: this will take some time depending on the specifications of your machine
+(if you want to skip training, you can go ahead and use one of the models
+that we have already precooked for you instead).
+
+After the model has been trained, it can be used for predicting
+named entities in new texts.
+
+```python
+# text to identify named entities in.
+text = 'Old MacDonald had a farm'
+model.predict_text(text)
+# ([['Old', 'MacDonald', 'had', 'a', 'farm']], [['B-PER', 'I-PER', 'O', 'O', 'O']])
+```
+This means that the model identified 'Old MacDonald' as a *PER*son.
+
+Please note that the `NERDA` model configuration above was instantiated
+with all default settings. You can, however, customize your `NERDA` model
+in a number of ways:
+
+- Use your own data set (fine-tune a transformer for any given language)
+- Choose whatever transformer you like
+- Set all of the hyperparameters for the model
+- Even apply your own network architecture
+
+Read more about advanced usage of `NERDA` in the [detailed documentation](https://ebanalyse.github.io/NERDA/workflow).
+
+## Use a Precooked `NERDA` model
+
+We have precooked a number of `NERDA` models for Danish and English that you
+can download and use right off the shelf.
+
+Here is an example.
+
+Instantiate a multilingual BERT model that has been fine-tuned for NER in Danish,
+`DA_BERT_ML`.
+
+```python
+from NERDA.precooked import DA_BERT_ML
+model = DA_BERT_ML()
+```
+
+Download the network from the web and load it:
+
+```python
+model.download_network()
+model.load_network()
+```
+
+You can now predict named entities in new (Danish) texts:
+
+```python
+# (Danish) text to identify named entities in:
+# 'Jens Hansen har en bondegård' = 'Old MacDonald had a farm'
+text = 'Jens Hansen har en bondegård'
+model.predict_text(text)
+# ([['Jens', 'Hansen', 'har', 'en', 'bondegård']], [['B-PER', 'I-PER', 'O', 'O', 'O']])
+```
+
+### List of Precooked Models
+
+The table below shows the precooked `NERDA` models publicly available for download.
+
+| **Model** | **Language** | **Transformer** | **Dataset** | **F1-score** |
+|-----------------|--------------|-------------------|---------|-----|
+| `DA_BERT_ML` | Danish | [Multilingual BERT](https://huggingface.co/bert-base-multilingual-uncased) | [DaNE](https://github.com/alexandrainst/danlp/blob/master/docs/docs/datasets.md#dane) | 82.8 |
+| `DA_ELECTRA_DA` | Danish | [Danish ELECTRA](https://huggingface.co/Maltehb/-l-ctra-danish-electra-small-uncased) | [DaNE](https://github.com/alexandrainst/danlp/blob/master/docs/docs/datasets.md#dane) | 79.8 |
+| `EN_BERT_ML` | English | [Multilingual BERT](https://huggingface.co/bert-base-multilingual-uncased)| [CoNLL-2003](https://www.clips.uantwerpen.be/conll2003/ner/) | 90.4 |
+| `EN_ELECTRA_EN` | English | [English ELECTRA](https://huggingface.co/google/electra-small-discriminator) | [CoNLL-2003](https://www.clips.uantwerpen.be/conll2003/ner/) | 89.1 |
+
+**F1-score** is the micro-averaged F1-score across entity tags, evaluated
+on the respective test sets (which were used for neither training nor
+validation of the models).
+
+Note that we have not spent a lot of time on actually fine-tuning the models,
+so there could be room for improvement. If you are able to improve the models,
+we will be happy to hear from you and include your `NERDA` model.
+
+### Model Performance
+
+The table below summarizes the performance (F1-scores) of the precooked `NERDA` models.
+
+| **Level** | `DA_BERT_ML` | `DA_ELECTRA_DA` | `EN_BERT_ML` | `EN_ELECTRA_EN` |
+|---------------|--------------|-----------------|--------------|-----------------|
+| B-PER | 93.8 | 92.0 | 96.0 | 95.1 |
+| I-PER | 97.8 | 97.1 | 98.5 | 97.9 |
+| B-ORG | 69.5 | 66.9 | 88.4 | 86.2 |
+| I-ORG | 69.9 | 70.7 | 85.7 | 83.1 |
+| B-LOC | 82.5 | 79.0 | 92.3 | 91.1 |
+| I-LOC | 31.6 | 44.4 | 83.9 | 80.5 |
+| B-MISC | 73.4 | 68.6 | 81.8 | 80.1 |
+| I-MISC | 86.1 | 63.6 | 63.4 | 68.4 |
+| **AVG_MICRO** | 82.8 | 79.8 | 90.4 | 89.1 |
+| **AVG_MACRO** | 75.6 | 72.8 | 86.3 | 85.3 |
+
+## 'NERDA'?
+'`NERDA`' originally stands for *'Named Entity Recognition for DAnish'*. However, this
+is somewhat misleading, since the functionality is no longer limited to Danish.
+On the contrary, it generalizes to other languages: `NERDA` supports
+fine-tuning of transformers for NER tasks in any language.
+
+## Background
+`NERDA` is developed as part of [Ekstra Bladet](https://ekstrabladet.dk/)’s activities on Platform Intelligence in News (PIN). PIN is an industrial research project carried out in collaboration between the [Technical University of Denmark](https://www.dtu.dk/), the [University of Copenhagen](https://www.ku.dk/) and [Copenhagen Business School](https://www.cbs.dk/), with funding from [Innovation Fund Denmark](https://innovationsfonden.dk/). The project runs from 2020 to 2023 and develops recommender systems and natural language processing systems geared for news publishing, some of which, like `NERDA`, are open sourced.
+
+## Shout-outs
+- Thanks to the [Alexandra Institute](https://alexandra.dk/), whose [`danlp`](https://github.com/alexandrainst/danlp) package encouraged us to develop this package.
+- Thanks to [Malte Højmark-Bertelsen](https://www.linkedin.com/in/malte-h%C3%B8jmark-bertelsen-9a618017b/) and [Kasper Junge](https://www.linkedin.com/in/kasper-juunge/?originalSubdomain=dk) for giving feedback on `NERDA`.
+
+## Read more
+The detailed documentation for `NERDA`, including code references and
+extended workflow examples, can be accessed [here](https://ebanalyse.github.io/NERDA/).
+
+## Cite this work
+
+```
+@inproceedings{nerda,
+ title = {NERDA},
+ author = {Kjeldgaard, Lars and Nielsen, Lukas},
+ year = {2020},
+ publisher = {{GitHub}},
+ url = {https://github.com/ebanalyse/NERDA}
+}
+```
+
+## Contact
+We hope that you will find `NERDA` useful.
+
+Please direct any questions and feedback to
+[us](mailto:lars.kjeldgaard@eb.dk)!
+
+If you want to contribute (which we encourage you to do), open a
+[PR](https://github.com/ebanalyse/NERDA/pulls).
+
+If you encounter a bug or want to suggest an enhancement, please
+[open an issue](https://github.com/ebanalyse/NERDA/issues).
+
+
+
+
+
+%prep
+%autosetup -n NERDA-1.0.0
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-NERDA -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 1.0.0-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..531e752
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+f875ca6cba0c3fd179db410ace164650 NERDA-1.0.0.tar.gz