| author    | CoprDistGit <infra@openeuler.org>        | 2023-05-05 06:14:57 +0000 |
|-----------|------------------------------------------|---------------------------|
| committer | CoprDistGit <infra@openeuler.org>        | 2023-05-05 06:14:57 +0000 |
| commit    | 508a62a8e4af6c9d4ab2b5da6707fcf306dc9e5f |                           |
| tree      | 9677b2d38f96dfdada8418386de4f26bc65074d5 |                           |
| parent    | 8399352891331461018cd1147d6be22aa8233797 |                           |
automatic import of python-nerda (branch openeuler20.03)
| Mode       | File              | Insertions |
|------------|-------------------|------------|
| -rw-r--r-- | .gitignore        | 1          |
| -rw-r--r-- | python-nerda.spec | 739        |
| -rw-r--r-- | sources           | 1          |
3 files changed, 741 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1 @@
/NERDA-1.0.0.tar.gz

diff --git a/python-nerda.spec b/python-nerda.spec
new file mode 100644
index 0000000..1f5a084
--- /dev/null
+++ b/python-nerda.spec
@@ -0,0 +1,739 @@
%global _empty_manifest_terminate_build 0
Name:           python-NERDA
Version:        1.0.0
Release:        1
Summary:        A Framework for Finetuning Transformers for Named-Entity Recognition
License:        MIT License
URL:            https://github.com/ebanalyse/NERDA
Source0:        https://mirrors.nju.edu.cn/pypi/web/packages/4e/80/3f7ae5e94a16f0dace64996b7eab7ee437b303872654d6705a13654bd132/NERDA-1.0.0.tar.gz
BuildArch:      noarch

Requires:       python3-torch
Requires:       python3-transformers
Requires:       python3-sklearn
Requires:       python3-nltk
Requires:       python3-pandas
Requires:       python3-progressbar
Requires:       python3-pyconll

%description
# NERDA <img src="https://raw.githubusercontent.com/ebanalyse/NERDA/main/logo.png" align="right" height=250/>

Not only is `NERDA` a mesmerizing muppet-like character; `NERDA` is also
a Python package that offers a slick, easy-to-use interface for fine-tuning
pretrained transformers for Named-Entity Recognition (NER) tasks.

You can also use `NERDA` to access a selection of *precooked* `NERDA` models
that you can use right off the shelf for NER tasks.

`NERDA` is built on Hugging Face `transformers` and the popular `pytorch`
framework.

## Installation guide
`NERDA` can be installed from [PyPI](https://pypi.org/project/NERDA/) with

```
pip install NERDA
```

If you want the development version, install directly from [GitHub](https://github.com/ebanalyse/NERDA).
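
For instance, `pip` can install straight from a Git repository; assuming you want the default branch, the invocation looks like this:

```
pip install git+https://github.com/ebanalyse/NERDA.git
```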

## Named-Entity Recognition tasks
Named-entity recognition (NER), also known as (named) entity identification,
entity chunking, and entity extraction, is a subtask of information extraction
that seeks to locate and classify named entities mentioned in unstructured
text into pre-defined categories such as person names, organizations, locations,
medical codes, time expressions, quantities, monetary values, percentages, etc.<sup>[1]</sup>

[1]: https://en.wikipedia.org/wiki/Named-entity_recognition

### Example Task

**Task**

Identify person names and organizations in text:

*Jim bought 300 shares of Acme Corp.*

**Solution**

| **Named Entity** | **Type**     |
|------------------|--------------|
| 'Jim'            | Person       |
| 'Acme Corp.'     | Organization |

Read more about NER on [Wikipedia](https://en.wikipedia.org/wiki/Named-entity_recognition).

## Train Your Own `NERDA` Model

Say we want to fine-tune a pretrained [Multilingual BERT](https://huggingface.co/bert-base-multilingual-uncased) transformer for NER in English.

Load the package.

```python
from NERDA.models import NERDA
```

Instantiate a `NERDA` model (*with default settings*) for the
[`CoNLL-2003`](https://www.clips.uantwerpen.be/conll2003/ner/)
English NER data set.

```python
from NERDA.datasets import get_conll_data
model = NERDA(dataset_training = get_conll_data('train'),
              dataset_validation = get_conll_data('valid'),
              transformer = 'bert-base-multilingual-uncased')
```

By default, the network architecture is analogous to that of the models in [Hvingelby et al. 2020](http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.565.pdf).

The model can then be trained/fine-tuned by invoking the `train` method, e.g.

```python
model.train()
```

**Note**: this will take some time depending on the specs of your machine
(if you want to skip training, you can use one of the models that we have
already precooked for you instead).

After the model has been trained, it can be used for predicting
named entities in new texts.

```python
# text to identify named entities in
text = 'Old MacDonald had a farm'
model.predict_text(text)
([['Old', 'MacDonald', 'had', 'a', 'farm']], [['B-PER', 'I-PER', 'O', 'O', 'O']])
```

This means that the model identified 'Old MacDonald' as a *PER*son.
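
The tags follow the BIO scheme: `B-` opens an entity, `I-` continues it, and `O` marks non-entity tokens. A small self-contained helper (hypothetical, not part of `NERDA`) shows how to stitch the output back into entity spans:

```python
def decode_bio(tokens, tags):
    """Collect (entity_text, entity_type) spans from BIO-tagged tokens."""
    spans, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith('B-'):           # a new entity starts
            if current:
                spans.append((' '.join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith('I-') and current:
            current.append(token)          # the open entity continues
        else:                              # 'O' closes any open entity
            if current:
                spans.append((' '.join(current), current_type))
            current, current_type = [], None
    if current:
        spans.append((' '.join(current), current_type))
    return spans

print(decode_bio(['Old', 'MacDonald', 'had', 'a', 'farm'],
                 ['B-PER', 'I-PER', 'O', 'O', 'O']))
# [('Old MacDonald', 'PER')]
```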

Please note that the `NERDA` model configuration above was instantiated
with all default settings. You can, however, customize your `NERDA` model
in many ways (see the sketch after this list):

- Use your own data set (fine-tune a transformer for any given language)
- Choose whatever transformer you like
- Set all of the hyperparameters for the model
- You can even apply your own network architecture
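
A hedged sketch of such a customized setup. Only `dataset_training`, `dataset_validation`, and `transformer` appear in this README; `tag_scheme`, `tag_outside`, and `hyperparameters` are assumed keyword names, so verify them against the detailed documentation before use:

```python
from NERDA.models import NERDA
from NERDA.datasets import get_conll_data

# Assumed keyword arguments -- check the NERDA docs for the exact names.
model = NERDA(
    dataset_training = get_conll_data('train'),
    dataset_validation = get_conll_data('valid'),
    transformer = 'bert-base-multilingual-uncased',
    tag_scheme = ['B-PER', 'I-PER',      # entity tags the model should learn
                  'B-ORG', 'I-ORG',
                  'B-LOC', 'I-LOC',
                  'B-MISC', 'I-MISC'],
    tag_outside = 'O',                   # tag for non-entity tokens
    hyperparameters = {'epochs': 4,
                       'train_batch_size': 13,
                       'learning_rate': 0.0001})
model.train()
```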

Read more about advanced usage of `NERDA` in the [detailed documentation](https://ebanalyse.github.io/NERDA/workflow).

## Use a Precooked `NERDA` model

We have precooked a number of `NERDA` models for Danish and English that you can download
and use right off the shelf.

Here is an example.

Instantiate a multilingual BERT model that has been fine-tuned for NER in Danish,
`DA_BERT_ML`.

```python
from NERDA.precooked import DA_BERT_ML
model = DA_BERT_ML()
```

Download the network from the web and load it:

```python
model.download_network()
model.load_network()
```

You can now predict named entities in new (Danish) texts:

```python
# (Danish) text to identify named entities in:
# 'Jens Hansen har en bondegård' = 'Old MacDonald had a farm'
text = 'Jens Hansen har en bondegård'
model.predict_text(text)
([['Jens', 'Hansen', 'har', 'en', 'bondegård']], [['B-PER', 'I-PER', 'O', 'O', 'O']])
```

### List of Precooked Models

The table below shows the precooked `NERDA` models publicly available for download.

| **Model**       | **Language** | **Transformer** | **Dataset** | **F1-score** |
|-----------------|--------------|-----------------|-------------|--------------|
| `DA_BERT_ML`    | Danish  | [Multilingual BERT](https://huggingface.co/bert-base-multilingual-uncased) | [DaNE](https://github.com/alexandrainst/danlp/blob/master/docs/docs/datasets.md#dane) | 82.8 |
| `DA_ELECTRA_DA` | Danish  | [Danish ELECTRA](https://huggingface.co/Maltehb/-l-ctra-danish-electra-small-uncased) | [DaNE](https://github.com/alexandrainst/danlp/blob/master/docs/docs/datasets.md#dane) | 79.8 |
| `EN_BERT_ML`    | English | [Multilingual BERT](https://huggingface.co/bert-base-multilingual-uncased) | [CoNLL-2003](https://www.clips.uantwerpen.be/conll2003/ner/) | 90.4 |
| `EN_ELECTRA_EN` | English | [English ELECTRA](https://huggingface.co/google/electra-small-discriminator) | [CoNLL-2003](https://www.clips.uantwerpen.be/conll2003/ner/) | 89.1 |

**F1-score** is the micro-averaged F1-score across entity tags, evaluated
on the respective test sets (which were used neither for training nor for
validation of the models).

Note that we have not spent much time on actually fine-tuning the models,
so there could be room for improvement. If you are able to improve the models,
we will be happy to hear from you and include your `NERDA` model.

### Model Performance

The table below summarizes the performance (F1-scores) of the precooked `NERDA` models.

| **Level**     | `DA_BERT_ML` | `DA_ELECTRA_DA` | `EN_BERT_ML` | `EN_ELECTRA_EN` |
|---------------|--------------|-----------------|--------------|-----------------|
| B-PER         | 93.8         | 92.0            | 96.0         | 95.1            |
| I-PER         | 97.8         | 97.1            | 98.5         | 97.9            |
| B-ORG         | 69.5         | 66.9            | 88.4         | 86.2            |
| I-ORG         | 69.9         | 70.7            | 85.7         | 83.1            |
| B-LOC         | 82.5         | 79.0            | 92.3         | 91.1            |
| I-LOC         | 31.6         | 44.4            | 83.9         | 80.5            |
| B-MISC        | 73.4         | 68.6            | 81.8         | 80.1            |
| I-MISC        | 86.1         | 63.6            | 63.4         | 68.4            |
| **AVG_MICRO** | 82.8         | 79.8            | 90.4         | 89.1            |
| **AVG_MACRO** | 75.6         | 72.8            | 86.3         | 85.3            |
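
**AVG_MICRO** pools all token-level decisions before computing F1, while **AVG_MACRO** averages the per-tag scores, which is why the weak `I-LOC` tag drags the Danish macro averages down. A minimal sketch with toy data, using scikit-learn (already a dependency via `python3-sklearn`):

```python
from sklearn.metrics import f1_score

# Toy gold and predicted tag sequences, flattened across sentences.
y_true = ['B-PER', 'I-PER', 'O', 'B-ORG', 'O', 'B-LOC']
y_pred = ['B-PER', 'I-PER', 'O', 'B-LOC', 'O', 'B-LOC']

entity_tags = ['B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']

# Micro counts every token equally; macro averages the per-tag F1-scores.
micro = f1_score(y_true, y_pred, labels=entity_tags, average='micro', zero_division=0)
macro = f1_score(y_true, y_pred, labels=entity_tags, average='macro', zero_division=0)
print(f'micro={micro:.2f}, macro={macro:.2f}')  # micro=0.75, macro=0.44
```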

## 'NERDA'?
'`NERDA`' originally stands for *'Named Entity Recognition for DAnish'*. However, this
is somewhat misleading, since the functionality is no longer limited to Danish.
On the contrary, it generalizes to all other languages, i.e. `NERDA` supports
fine-tuning of transformers for NER tasks for any language.

## Background
`NERDA` is developed as part of [Ekstra Bladet](https://ekstrabladet.dk/)'s activities on Platform Intelligence in News (PIN). PIN is an industrial research project carried out in collaboration between the [Technical University of Denmark](https://www.dtu.dk/), the [University of Copenhagen](https://www.ku.dk/) and [Copenhagen Business School](https://www.cbs.dk/), with funding from [Innovation Fund Denmark](https://innovationsfonden.dk/). The project runs from 2020 to 2023 and develops recommender systems and natural language processing systems geared towards news publishing, some of which, like `NERDA`, are open-sourced.

## Shout-outs
- Thanks to the [Alexandra Institute](https://alexandra.dk/), whose [`danlp`](https://github.com/alexandrainst/danlp) package encouraged us to develop this package.
- Thanks to [Malte Højmark-Bertelsen](https://www.linkedin.com/in/malte-h%C3%B8jmark-bertelsen-9a618017b/) and [Kasper Junge](https://www.linkedin.com/in/kasper-juunge/?originalSubdomain=dk) for giving feedback on `NERDA`.

## Read more
The detailed documentation for `NERDA`, including code references and
extended workflow examples, can be accessed [here](https://ebanalyse.github.io/NERDA/).

## Cite this work

```
@inproceedings{nerda,
  title = {NERDA},
  author = {Kjeldgaard, Lars and Nielsen, Lukas},
  year = {2020},
  publisher = {{GitHub}},
  url = {https://github.com/ebanalyse/NERDA}
}
```

## Contact
We hope that you will find `NERDA` useful.

Please direct any questions and feedback to
[us](mailto:lars.kjeldgaard@eb.dk)!

If you want to contribute (and we encourage you to), open a
[PR](https://github.com/ebanalyse/NERDA/pulls).

If you encounter a bug or want to suggest an enhancement, please
[open an issue](https://github.com/ebanalyse/NERDA/issues).

%package -n python3-NERDA
Summary:        A Framework for Finetuning Transformers for Named-Entity Recognition
Provides:       python-NERDA
BuildRequires:  python3-devel
BuildRequires:  python3-setuptools
BuildRequires:  python3-pip
%description -n python3-NERDA
# NERDA <img src="https://raw.githubusercontent.com/ebanalyse/NERDA/main/logo.png" align="right" height=250/>

Not only is `NERDA` a mesmerizing muppet-like character; `NERDA` is also
a Python package that offers a slick, easy-to-use interface for fine-tuning
pretrained transformers for Named-Entity Recognition (NER) tasks.

You can also use `NERDA` to access a selection of *precooked* `NERDA` models
that you can use right off the shelf for NER tasks.

`NERDA` is built on Hugging Face `transformers` and the popular `pytorch`
framework.

## Installation guide
`NERDA` can be installed from [PyPI](https://pypi.org/project/NERDA/) with

```
pip install NERDA
```

If you want the development version, install directly from [GitHub](https://github.com/ebanalyse/NERDA).

## Named-Entity Recognition tasks
Named-entity recognition (NER), also known as (named) entity identification,
entity chunking, and entity extraction, is a subtask of information extraction
that seeks to locate and classify named entities mentioned in unstructured
text into pre-defined categories such as person names, organizations, locations,
medical codes, time expressions, quantities, monetary values, percentages, etc.<sup>[1]</sup>

[1]: https://en.wikipedia.org/wiki/Named-entity_recognition

### Example Task

**Task**

Identify person names and organizations in text:

*Jim bought 300 shares of Acme Corp.*

**Solution**

| **Named Entity** | **Type**     |
|------------------|--------------|
| 'Jim'            | Person       |
| 'Acme Corp.'     | Organization |

Read more about NER on [Wikipedia](https://en.wikipedia.org/wiki/Named-entity_recognition).

## Train Your Own `NERDA` Model

Say we want to fine-tune a pretrained [Multilingual BERT](https://huggingface.co/bert-base-multilingual-uncased) transformer for NER in English.

Load the package.

```python
from NERDA.models import NERDA
```

Instantiate a `NERDA` model (*with default settings*) for the
[`CoNLL-2003`](https://www.clips.uantwerpen.be/conll2003/ner/)
English NER data set.

```python
from NERDA.datasets import get_conll_data
model = NERDA(dataset_training = get_conll_data('train'),
              dataset_validation = get_conll_data('valid'),
              transformer = 'bert-base-multilingual-uncased')
```

By default, the network architecture is analogous to that of the models in [Hvingelby et al. 2020](http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.565.pdf).

The model can then be trained/fine-tuned by invoking the `train` method, e.g.

```python
model.train()
```

**Note**: this will take some time depending on the specs of your machine
(if you want to skip training, you can use one of the models that we have
already precooked for you instead).

After the model has been trained, it can be used for predicting
named entities in new texts.

```python
# text to identify named entities in
text = 'Old MacDonald had a farm'
model.predict_text(text)
([['Old', 'MacDonald', 'had', 'a', 'farm']], [['B-PER', 'I-PER', 'O', 'O', 'O']])
```

This means that the model identified 'Old MacDonald' as a *PER*son.

Please note that the `NERDA` model configuration above was instantiated
with all default settings. You can, however, customize your `NERDA` model
in many ways:

- Use your own data set (fine-tune a transformer for any given language)
- Choose whatever transformer you like
- Set all of the hyperparameters for the model
- You can even apply your own network architecture

Read more about advanced usage of `NERDA` in the [detailed documentation](https://ebanalyse.github.io/NERDA/workflow).

## Use a Precooked `NERDA` model

We have precooked a number of `NERDA` models for Danish and English that you can download
and use right off the shelf.

Here is an example.

Instantiate a multilingual BERT model that has been fine-tuned for NER in Danish,
`DA_BERT_ML`.

```python
from NERDA.precooked import DA_BERT_ML
model = DA_BERT_ML()
```

Download the network from the web and load it:

```python
model.download_network()
model.load_network()
```

You can now predict named entities in new (Danish) texts:

```python
# (Danish) text to identify named entities in:
# 'Jens Hansen har en bondegård' = 'Old MacDonald had a farm'
text = 'Jens Hansen har en bondegård'
model.predict_text(text)
([['Jens', 'Hansen', 'har', 'en', 'bondegård']], [['B-PER', 'I-PER', 'O', 'O', 'O']])
```
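
For many texts at once, a batch call is more convenient. `predict_text` is the documented entry point; a `predict` method over pre-tokenized sentences is an assumption here, so treat this as a sketch and verify the method exists before relying on it:

```python
# Hypothetical batch variant of predict_text(); sentences are pre-tokenized.
sentences = [['Jens', 'Hansen', 'har', 'en', 'bondegård'],
             ['Mette', 'arbejder', 'i', 'København']]
predictions = model.predict(sentences)
for tokens, tags in zip(sentences, predictions):
    print(list(zip(tokens, tags)))
```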

### List of Precooked Models

The table below shows the precooked `NERDA` models publicly available for download.

| **Model**       | **Language** | **Transformer** | **Dataset** | **F1-score** |
|-----------------|--------------|-----------------|-------------|--------------|
| `DA_BERT_ML`    | Danish  | [Multilingual BERT](https://huggingface.co/bert-base-multilingual-uncased) | [DaNE](https://github.com/alexandrainst/danlp/blob/master/docs/docs/datasets.md#dane) | 82.8 |
| `DA_ELECTRA_DA` | Danish  | [Danish ELECTRA](https://huggingface.co/Maltehb/-l-ctra-danish-electra-small-uncased) | [DaNE](https://github.com/alexandrainst/danlp/blob/master/docs/docs/datasets.md#dane) | 79.8 |
| `EN_BERT_ML`    | English | [Multilingual BERT](https://huggingface.co/bert-base-multilingual-uncased) | [CoNLL-2003](https://www.clips.uantwerpen.be/conll2003/ner/) | 90.4 |
| `EN_ELECTRA_EN` | English | [English ELECTRA](https://huggingface.co/google/electra-small-discriminator) | [CoNLL-2003](https://www.clips.uantwerpen.be/conll2003/ner/) | 89.1 |

**F1-score** is the micro-averaged F1-score across entity tags, evaluated
on the respective test sets (which were used neither for training nor for
validation of the models).

Note that we have not spent much time on actually fine-tuning the models,
so there could be room for improvement. If you are able to improve the models,
we will be happy to hear from you and include your `NERDA` model.

### Model Performance

The table below summarizes the performance (F1-scores) of the precooked `NERDA` models.

| **Level**     | `DA_BERT_ML` | `DA_ELECTRA_DA` | `EN_BERT_ML` | `EN_ELECTRA_EN` |
|---------------|--------------|-----------------|--------------|-----------------|
| B-PER         | 93.8         | 92.0            | 96.0         | 95.1            |
| I-PER         | 97.8         | 97.1            | 98.5         | 97.9            |
| B-ORG         | 69.5         | 66.9            | 88.4         | 86.2            |
| I-ORG         | 69.9         | 70.7            | 85.7         | 83.1            |
| B-LOC         | 82.5         | 79.0            | 92.3         | 91.1            |
| I-LOC         | 31.6         | 44.4            | 83.9         | 80.5            |
| B-MISC        | 73.4         | 68.6            | 81.8         | 80.1            |
| I-MISC        | 86.1         | 63.6            | 63.4         | 68.4            |
| **AVG_MICRO** | 82.8         | 79.8            | 90.4         | 89.1            |
| **AVG_MACRO** | 75.6         | 72.8            | 86.3         | 85.3            |
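
Per-tag scores like these come from evaluating a model on a held-out test set. A hedged sketch follows: `evaluate_performance` and `get_dane_data` do not appear in this README and are assumed names modeled on `get_conll_data`, so check them against the API reference:

```python
from NERDA.precooked import DA_BERT_ML
from NERDA.datasets import get_dane_data  # assumed helper, mirroring get_conll_data

model = DA_BERT_ML()
model.download_network()
model.load_network()

# Assumed method: scores the model against the DaNE test split and
# returns per-tag F1-scores comparable to the table above.
print(model.evaluate_performance(get_dane_data('test')))
```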

## 'NERDA'?
'`NERDA`' originally stands for *'Named Entity Recognition for DAnish'*. However, this
is somewhat misleading, since the functionality is no longer limited to Danish.
On the contrary, it generalizes to all other languages, i.e. `NERDA` supports
fine-tuning of transformers for NER tasks for any language.

## Background
`NERDA` is developed as part of [Ekstra Bladet](https://ekstrabladet.dk/)'s activities on Platform Intelligence in News (PIN). PIN is an industrial research project carried out in collaboration between the [Technical University of Denmark](https://www.dtu.dk/), the [University of Copenhagen](https://www.ku.dk/) and [Copenhagen Business School](https://www.cbs.dk/), with funding from [Innovation Fund Denmark](https://innovationsfonden.dk/). The project runs from 2020 to 2023 and develops recommender systems and natural language processing systems geared towards news publishing, some of which, like `NERDA`, are open-sourced.

## Shout-outs
- Thanks to the [Alexandra Institute](https://alexandra.dk/), whose [`danlp`](https://github.com/alexandrainst/danlp) package encouraged us to develop this package.
- Thanks to [Malte Højmark-Bertelsen](https://www.linkedin.com/in/malte-h%C3%B8jmark-bertelsen-9a618017b/) and [Kasper Junge](https://www.linkedin.com/in/kasper-juunge/?originalSubdomain=dk) for giving feedback on `NERDA`.

## Read more
The detailed documentation for `NERDA`, including code references and
extended workflow examples, can be accessed [here](https://ebanalyse.github.io/NERDA/).

## Cite this work

```
@inproceedings{nerda,
  title = {NERDA},
  author = {Kjeldgaard, Lars and Nielsen, Lukas},
  year = {2020},
  publisher = {{GitHub}},
  url = {https://github.com/ebanalyse/NERDA}
}
```

## Contact
We hope that you will find `NERDA` useful.

Please direct any questions and feedback to
[us](mailto:lars.kjeldgaard@eb.dk)!

If you want to contribute (and we encourage you to), open a
[PR](https://github.com/ebanalyse/NERDA/pulls).

If you encounter a bug or want to suggest an enhancement, please
[open an issue](https://github.com/ebanalyse/NERDA/issues).

%package help
Summary:        Development documents and examples for NERDA
Provides:       python3-NERDA-doc
%description help
# NERDA <img src="https://raw.githubusercontent.com/ebanalyse/NERDA/main/logo.png" align="right" height=250/>

Not only is `NERDA` a mesmerizing muppet-like character; `NERDA` is also
a Python package that offers a slick, easy-to-use interface for fine-tuning
pretrained transformers for Named-Entity Recognition (NER) tasks.

You can also use `NERDA` to access a selection of *precooked* `NERDA` models
that you can use right off the shelf for NER tasks.

`NERDA` is built on Hugging Face `transformers` and the popular `pytorch`
framework.

## Installation guide
`NERDA` can be installed from [PyPI](https://pypi.org/project/NERDA/) with

```
pip install NERDA
```

If you want the development version, install directly from [GitHub](https://github.com/ebanalyse/NERDA).

## Named-Entity Recognition tasks
Named-entity recognition (NER), also known as (named) entity identification,
entity chunking, and entity extraction, is a subtask of information extraction
that seeks to locate and classify named entities mentioned in unstructured
text into pre-defined categories such as person names, organizations, locations,
medical codes, time expressions, quantities, monetary values, percentages, etc.<sup>[1]</sup>

[1]: https://en.wikipedia.org/wiki/Named-entity_recognition

### Example Task

**Task**

Identify person names and organizations in text:

*Jim bought 300 shares of Acme Corp.*

**Solution**

| **Named Entity** | **Type**     |
|------------------|--------------|
| 'Jim'            | Person       |
| 'Acme Corp.'     | Organization |

Read more about NER on [Wikipedia](https://en.wikipedia.org/wiki/Named-entity_recognition).

## Train Your Own `NERDA` Model

Say we want to fine-tune a pretrained [Multilingual BERT](https://huggingface.co/bert-base-multilingual-uncased) transformer for NER in English.

Load the package.

```python
from NERDA.models import NERDA
```

Instantiate a `NERDA` model (*with default settings*) for the
[`CoNLL-2003`](https://www.clips.uantwerpen.be/conll2003/ner/)
English NER data set.

```python
from NERDA.datasets import get_conll_data
model = NERDA(dataset_training = get_conll_data('train'),
              dataset_validation = get_conll_data('valid'),
              transformer = 'bert-base-multilingual-uncased')
```

By default, the network architecture is analogous to that of the models in [Hvingelby et al. 2020](http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.565.pdf).

The model can then be trained/fine-tuned by invoking the `train` method, e.g.

```python
model.train()
```

**Note**: this will take some time depending on the specs of your machine
(if you want to skip training, you can use one of the models that we have
already precooked for you instead).

After the model has been trained, it can be used for predicting
named entities in new texts.

```python
# text to identify named entities in
text = 'Old MacDonald had a farm'
model.predict_text(text)
([['Old', 'MacDonald', 'had', 'a', 'farm']], [['B-PER', 'I-PER', 'O', 'O', 'O']])
```

This means that the model identified 'Old MacDonald' as a *PER*son.

Please note that the `NERDA` model configuration above was instantiated
with all default settings. You can, however, customize your `NERDA` model
in many ways:

- Use your own data set (fine-tune a transformer for any given language)
- Choose whatever transformer you like
- Set all of the hyperparameters for the model
- You can even apply your own network architecture

Read more about advanced usage of `NERDA` in the [detailed documentation](https://ebanalyse.github.io/NERDA/workflow).

## Use a Precooked `NERDA` model

We have precooked a number of `NERDA` models for Danish and English that you can download
and use right off the shelf.

Here is an example.

Instantiate a multilingual BERT model that has been fine-tuned for NER in Danish,
`DA_BERT_ML`.

```python
from NERDA.precooked import DA_BERT_ML
model = DA_BERT_ML()
```

Download the network from the web and load it:

```python
model.download_network()
model.load_network()
```
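
If your machine has no GPU, the precooked models may accept a compute-device argument at construction time; `device` is an assumed parameter name (it is not shown in this README), so confirm it in the API reference:

```python
from NERDA.precooked import DA_BERT_ML

# 'device' is a hypothetical keyword argument; check the NERDA API reference.
model = DA_BERT_ML(device = 'cpu')
model.download_network()
model.load_network()
```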

You can now predict named entities in new (Danish) texts:

```python
# (Danish) text to identify named entities in:
# 'Jens Hansen har en bondegård' = 'Old MacDonald had a farm'
text = 'Jens Hansen har en bondegård'
model.predict_text(text)
([['Jens', 'Hansen', 'har', 'en', 'bondegård']], [['B-PER', 'I-PER', 'O', 'O', 'O']])
```

### List of Precooked Models

The table below shows the precooked `NERDA` models publicly available for download.

| **Model**       | **Language** | **Transformer** | **Dataset** | **F1-score** |
|-----------------|--------------|-----------------|-------------|--------------|
| `DA_BERT_ML`    | Danish  | [Multilingual BERT](https://huggingface.co/bert-base-multilingual-uncased) | [DaNE](https://github.com/alexandrainst/danlp/blob/master/docs/docs/datasets.md#dane) | 82.8 |
| `DA_ELECTRA_DA` | Danish  | [Danish ELECTRA](https://huggingface.co/Maltehb/-l-ctra-danish-electra-small-uncased) | [DaNE](https://github.com/alexandrainst/danlp/blob/master/docs/docs/datasets.md#dane) | 79.8 |
| `EN_BERT_ML`    | English | [Multilingual BERT](https://huggingface.co/bert-base-multilingual-uncased) | [CoNLL-2003](https://www.clips.uantwerpen.be/conll2003/ner/) | 90.4 |
| `EN_ELECTRA_EN` | English | [English ELECTRA](https://huggingface.co/google/electra-small-discriminator) | [CoNLL-2003](https://www.clips.uantwerpen.be/conll2003/ner/) | 89.1 |

**F1-score** is the micro-averaged F1-score across entity tags, evaluated
on the respective test sets (which were used neither for training nor for
validation of the models).

Note that we have not spent much time on actually fine-tuning the models,
so there could be room for improvement. If you are able to improve the models,
we will be happy to hear from you and include your `NERDA` model.

### Model Performance

The table below summarizes the performance (F1-scores) of the precooked `NERDA` models.

| **Level**     | `DA_BERT_ML` | `DA_ELECTRA_DA` | `EN_BERT_ML` | `EN_ELECTRA_EN` |
|---------------|--------------|-----------------|--------------|-----------------|
| B-PER         | 93.8         | 92.0            | 96.0         | 95.1            |
| I-PER         | 97.8         | 97.1            | 98.5         | 97.9            |
| B-ORG         | 69.5         | 66.9            | 88.4         | 86.2            |
| I-ORG         | 69.9         | 70.7            | 85.7         | 83.1            |
| B-LOC         | 82.5         | 79.0            | 92.3         | 91.1            |
| I-LOC         | 31.6         | 44.4            | 83.9         | 80.5            |
| B-MISC        | 73.4         | 68.6            | 81.8         | 80.1            |
| I-MISC        | 86.1         | 63.6            | 63.4         | 68.4            |
| **AVG_MICRO** | 82.8         | 79.8            | 90.4         | 89.1            |
| **AVG_MACRO** | 75.6         | 72.8            | 86.3         | 85.3            |

## 'NERDA'?
'`NERDA`' originally stands for *'Named Entity Recognition for DAnish'*. However, this
is somewhat misleading, since the functionality is no longer limited to Danish.
On the contrary, it generalizes to all other languages, i.e. `NERDA` supports
fine-tuning of transformers for NER tasks for any language.

## Background
`NERDA` is developed as part of [Ekstra Bladet](https://ekstrabladet.dk/)'s activities on Platform Intelligence in News (PIN). PIN is an industrial research project carried out in collaboration between the [Technical University of Denmark](https://www.dtu.dk/), the [University of Copenhagen](https://www.ku.dk/) and [Copenhagen Business School](https://www.cbs.dk/), with funding from [Innovation Fund Denmark](https://innovationsfonden.dk/). The project runs from 2020 to 2023 and develops recommender systems and natural language processing systems geared towards news publishing, some of which, like `NERDA`, are open-sourced.

## Shout-outs
- Thanks to the [Alexandra Institute](https://alexandra.dk/), whose [`danlp`](https://github.com/alexandrainst/danlp) package encouraged us to develop this package.
- Thanks to [Malte Højmark-Bertelsen](https://www.linkedin.com/in/malte-h%C3%B8jmark-bertelsen-9a618017b/) and [Kasper Junge](https://www.linkedin.com/in/kasper-juunge/?originalSubdomain=dk) for giving feedback on `NERDA`.

## Read more
The detailed documentation for `NERDA`, including code references and
extended workflow examples, can be accessed [here](https://ebanalyse.github.io/NERDA/).

## Cite this work

```
@inproceedings{nerda,
  title = {NERDA},
  author = {Kjeldgaard, Lars and Nielsen, Lukas},
  year = {2020},
  publisher = {{GitHub}},
  url = {https://github.com/ebanalyse/NERDA}
}
```

## Contact
We hope that you will find `NERDA` useful.

Please direct any questions and feedback to
[us](mailto:lars.kjeldgaard@eb.dk)!

If you want to contribute (and we encourage you to), open a
[PR](https://github.com/ebanalyse/NERDA/pulls).

If you encounter a bug or want to suggest an enhancement, please
[open an issue](https://github.com/ebanalyse/NERDA/issues).

%prep
%autosetup -n NERDA-1.0.0

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
    find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
    find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
    find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
    find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
    find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-NERDA -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 1.0.0-1
- Package Spec generated

diff --git a/sources b/sources
@@ -0,0 +1 @@
f875ca6cba0c3fd179db410ace164650  NERDA-1.0.0.tar.gz
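
The single line in `sources` pairs a checksum with the tarball name; the 32 hexadecimal digits suggest an MD5 digest (an assumption, not stated in the file). A small self-contained sketch for verifying a locally downloaded `NERDA-1.0.0.tar.gz` against it:

```python
import hashlib

EXPECTED = 'f875ca6cba0c3fd179db410ace164650'  # digest from the sources file

def md5sum(path, chunk_size=1 << 20):
    """Stream the file in chunks so large tarballs are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

print(md5sum('NERDA-1.0.0.tar.gz') == EXPECTED)
```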