author    CoprDistGit <infra@openeuler.org>  2023-05-05 04:43:55 +0000
committer CoprDistGit <infra@openeuler.org>  2023-05-05 04:43:55 +0000
commit    d3d5da0104bb0199cc78cb042dfdd071ed736161
tree      4971aa44806d7b697aa54ff08ecfb854e013fb3f
parent    d0a3d0e716d30329d270dc27569d6509fe20745c

    automatic import of python-bert-embedding (openeuler20.03)

 -rw-r--r--  .gitignore                 |   1
 -rw-r--r--  python-bert-embedding.spec | 365
 -rw-r--r--  sources                    |   1
 3 files changed, 367 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1 @@
+/bert_embedding-1.0.1.tar.gz

diff --git a/python-bert-embedding.spec b/python-bert-embedding.spec
new file mode 100644
index 0000000..36d8ada
--- /dev/null
+++ b/python-bert-embedding.spec
@@ -0,0 +1,365 @@
+%global _empty_manifest_terminate_build 0
+Name:		python-bert-embedding
+Version:	1.0.1
+Release:	1
+Summary:	BERT token level embedding with MxNet
+License:	ALv2
+URL:		https://github.com/imgarylai/bert_embedding
+Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/32/49/13f76cef121677994bb1b0e8baa8b8bf88405eb1be554925fe8682b7b71e/bert_embedding-1.0.1.tar.gz
+BuildArch:	noarch
+
+Requires:	python3-typing
+Requires:	python3-numpy
+Requires:	python3-mxnet
+Requires:	python3-gluonnlp
+Requires:	python3-mxnet-cu92
+
+%description
+# Bert Embeddings
+
+[Build Status](https://travis-ci.org/imgarylai/bert-embedding) [codecov](https://codecov.io/gh/imgarylai/bert-embedding) [PyPI](https://pypi.org/project/bert-embedding/) [Documentation Status](https://bert-embedding.readthedocs.io/en/latest/?badge=latest)
+
+
+[BERT](https://arxiv.org/abs/1810.04805), published by [Google](https://github.com/google-research/bert), is a new way to obtain pre-trained language model word representations. Many NLP tasks benefit from BERT to achieve state-of-the-art results.
+
+The goal of this project is to obtain token embeddings from BERT's pre-trained model. In this way, instead of building and fine-tuning an end-to-end NLP model, you can build your model by simply utilizing the token embeddings.
+
+This project is implemented with [@MXNet](https://github.com/apache/incubator-mxnet). Special thanks to the [@gluon-nlp](https://github.com/dmlc/gluon-nlp) team.
+
+## Install
+
+```
+pip install bert-embedding
+# If you want to run on a GPU machine, please also install `mxnet-cu92`.
+pip install mxnet-cu92
+```
+
+## Usage
+
+```python
+from bert_embedding import BertEmbedding
+
+bert_abstract = """We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers.
+ Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers.
+ As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
+BERT is conceptually simple and empirically powerful.
+It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7 (5.6% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5% absolute improvement), outperforming human performance by 2.0%."""
+sentences = bert_abstract.split('\n')
+bert_embedding = BertEmbedding()
+result = bert_embedding(sentences)
+```
+If you want to use a GPU, import mxnet and set the context:
+
+```python
+import mxnet as mx
+from bert_embedding import BertEmbedding
+
+...
+
+ctx = mx.gpu(0)
+bert = BertEmbedding(ctx=ctx)
+```
+
+The result is a list of tuples, one per sentence, each containing (tokens, token embeddings).
+
+For example:
+
+```python
+first_sentence = result[0]
+
+first_sentence[0]
+# ['we', 'introduce', 'a', 'new', 'language', 'representation', 'model', 'called', 'bert', ',', 'which', 'stands', 'for', 'bidirectional', 'encoder', 'representations', 'from', 'transformers']
+len(first_sentence[0])
+# 18
+
+
+len(first_sentence[1])
+# 18
+first_token_in_first_sentence = first_sentence[1]
+first_token_in_first_sentence[1]
+# array([ 0.4805648 ,  0.18369392, -0.28554988, ..., -0.01961522,
+#         1.0207764 , -0.67167974], dtype=float32)
+first_token_in_first_sentence[1].shape
+# (768,)
+```
+
+## OOV
+
+There are three ways to handle OOV tokens: avg (the default), sum, and last. The strategy can be specified when encoding.
+
+```python
+...
+bert_embedding = BertEmbedding()
+bert_embedding(sentences, 'sum')
+...
+```
+
+## Available pre-trained BERT models
+
+| |book_corpus_wiki_en_uncased|book_corpus_wiki_en_cased|wiki_multilingual|wiki_multilingual_cased|wiki_cn|
+|---|---|---|---|---|---|
+|bert_12_768_12|✓|✓|✓|✓|✓|
+|bert_24_1024_16|x|✓|x|x|x|
+
+Example of using the large pre-trained BERT model from Google:
+
+```python
+from bert_embedding import BertEmbedding
+
+bert_embedding = BertEmbedding(model='bert_24_1024_16', dataset_name='book_corpus_wiki_en_cased')
+```
+
+Source: [gluonnlp](http://gluon-nlp.mxnet.io/model_zoo/bert/index.html)
+
+
+
+%package -n python3-bert-embedding
+Summary:	BERT token level embedding with MxNet
+Provides:	python-bert-embedding
+BuildRequires:	python3-devel
+BuildRequires:	python3-setuptools
+BuildRequires:	python3-pip
+%description -n python3-bert-embedding
+# Bert Embeddings
+
+[Build Status](https://travis-ci.org/imgarylai/bert-embedding) [codecov](https://codecov.io/gh/imgarylai/bert-embedding) [PyPI](https://pypi.org/project/bert-embedding/) [Documentation Status](https://bert-embedding.readthedocs.io/en/latest/?badge=latest)
+
+
+[BERT](https://arxiv.org/abs/1810.04805), published by [Google](https://github.com/google-research/bert), is a new way to obtain pre-trained language model word representations. Many NLP tasks benefit from BERT to achieve state-of-the-art results.
+
+The goal of this project is to obtain token embeddings from BERT's pre-trained model. In this way, instead of building and fine-tuning an end-to-end NLP model, you can build your model by simply utilizing the token embeddings.
+
+This project is implemented with [@MXNet](https://github.com/apache/incubator-mxnet). Special thanks to the [@gluon-nlp](https://github.com/dmlc/gluon-nlp) team.
+
+## Install
+
+```
+pip install bert-embedding
+# If you want to run on a GPU machine, please also install `mxnet-cu92`.
+pip install mxnet-cu92
+```
+
+## Usage
+
+```python
+from bert_embedding import BertEmbedding
+
+bert_abstract = """We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers.
+ Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers.
+ As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
+BERT is conceptually simple and empirically powerful.
+It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7 (5.6% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5% absolute improvement), outperforming human performance by 2.0%."""
+sentences = bert_abstract.split('\n')
+bert_embedding = BertEmbedding()
+result = bert_embedding(sentences)
+```
+If you want to use a GPU, import mxnet and set the context:
+
+```python
+import mxnet as mx
+from bert_embedding import BertEmbedding
+
+...
+
+ctx = mx.gpu(0)
+bert = BertEmbedding(ctx=ctx)
+```
+
+The result is a list of tuples, one per sentence, each containing (tokens, token embeddings).
+
+For example:
+
+```python
+first_sentence = result[0]
+
+first_sentence[0]
+# ['we', 'introduce', 'a', 'new', 'language', 'representation', 'model', 'called', 'bert', ',', 'which', 'stands', 'for', 'bidirectional', 'encoder', 'representations', 'from', 'transformers']
+len(first_sentence[0])
+# 18
+
+
+len(first_sentence[1])
+# 18
+first_token_in_first_sentence = first_sentence[1]
+first_token_in_first_sentence[1]
+# array([ 0.4805648 ,  0.18369392, -0.28554988, ..., -0.01961522,
+#         1.0207764 , -0.67167974], dtype=float32)
+first_token_in_first_sentence[1].shape
+# (768,)
+```
+
+## OOV
+
+There are three ways to handle OOV tokens: avg (the default), sum, and last. The strategy can be specified when encoding.
+
+```python
+...
+bert_embedding = BertEmbedding()
+bert_embedding(sentences, 'sum')
+...
+```
+
+## Available pre-trained BERT models
+
+| |book_corpus_wiki_en_uncased|book_corpus_wiki_en_cased|wiki_multilingual|wiki_multilingual_cased|wiki_cn|
+|---|---|---|---|---|---|
+|bert_12_768_12|✓|✓|✓|✓|✓|
+|bert_24_1024_16|x|✓|x|x|x|
+
+Example of using the large pre-trained BERT model from Google:
+
+```python
+from bert_embedding import BertEmbedding
+
+bert_embedding = BertEmbedding(model='bert_24_1024_16', dataset_name='book_corpus_wiki_en_cased')
+```
+
+Source: [gluonnlp](http://gluon-nlp.mxnet.io/model_zoo/bert/index.html)
+
+
+
+%package help
+Summary:	Development documents and examples for bert-embedding
+Provides:	python3-bert-embedding-doc
+%description help
+# Bert Embeddings
+
+[Build Status](https://travis-ci.org/imgarylai/bert-embedding) [codecov](https://codecov.io/gh/imgarylai/bert-embedding) [PyPI](https://pypi.org/project/bert-embedding/) [Documentation Status](https://bert-embedding.readthedocs.io/en/latest/?badge=latest)
+
+
+[BERT](https://arxiv.org/abs/1810.04805), published by [Google](https://github.com/google-research/bert), is a new way to obtain pre-trained language model word representations. Many NLP tasks benefit from BERT to achieve state-of-the-art results.
+
+The goal of this project is to obtain token embeddings from BERT's pre-trained model. In this way, instead of building and fine-tuning an end-to-end NLP model, you can build your model by simply utilizing the token embeddings.
+
+This project is implemented with [@MXNet](https://github.com/apache/incubator-mxnet). Special thanks to the [@gluon-nlp](https://github.com/dmlc/gluon-nlp) team.
+
+## Install
+
+```
+pip install bert-embedding
+# If you want to run on a GPU machine, please also install `mxnet-cu92`.
+pip install mxnet-cu92
+```
+
+## Usage
+
+```python
+from bert_embedding import BertEmbedding
+
+bert_abstract = """We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers.
+ Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers.
+ As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
+BERT is conceptually simple and empirically powerful.
+It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7 (5.6% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5% absolute improvement), outperforming human performance by 2.0%."""
+sentences = bert_abstract.split('\n')
+bert_embedding = BertEmbedding()
+result = bert_embedding(sentences)
+```
+If you want to use a GPU, import mxnet and set the context:
+
+```python
+import mxnet as mx
+from bert_embedding import BertEmbedding
+
+...
+
+ctx = mx.gpu(0)
+bert = BertEmbedding(ctx=ctx)
+```
+
+The result is a list of tuples, one per sentence, each containing (tokens, token embeddings).
+
+For example:
+
+```python
+first_sentence = result[0]
+
+first_sentence[0]
+# ['we', 'introduce', 'a', 'new', 'language', 'representation', 'model', 'called', 'bert', ',', 'which', 'stands', 'for', 'bidirectional', 'encoder', 'representations', 'from', 'transformers']
+len(first_sentence[0])
+# 18
+
+
+len(first_sentence[1])
+# 18
+first_token_in_first_sentence = first_sentence[1]
+first_token_in_first_sentence[1]
+# array([ 0.4805648 ,  0.18369392, -0.28554988, ..., -0.01961522,
+#         1.0207764 , -0.67167974], dtype=float32)
+first_token_in_first_sentence[1].shape
+# (768,)
+```
+
+## OOV
+
+There are three ways to handle OOV tokens: avg (the default), sum, and last. The strategy can be specified when encoding.
+
+```python
+...
+bert_embedding = BertEmbedding()
+bert_embedding(sentences, 'sum')
+...
+```
+
+## Available pre-trained BERT models
+
+| |book_corpus_wiki_en_uncased|book_corpus_wiki_en_cased|wiki_multilingual|wiki_multilingual_cased|wiki_cn|
+|---|---|---|---|---|---|
+|bert_12_768_12|✓|✓|✓|✓|✓|
+|bert_24_1024_16|x|✓|x|x|x|
+
+Example of using the large pre-trained BERT model from Google:
+
+```python
+from bert_embedding import BertEmbedding
+
+bert_embedding = BertEmbedding(model='bert_24_1024_16', dataset_name='book_corpus_wiki_en_cased')
+```
+
+Source: [gluonnlp](http://gluon-nlp.mxnet.io/model_zoo/bert/index.html)
+
+
+
+%prep
+%autosetup -n bert-embedding-1.0.1
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-bert-embedding -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 1.0.1-1
+- Package Spec generated

diff --git a/sources b/sources
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+2b44a514265433c45251ded5849bee0d bert_embedding-1.0.1.tar.gz
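
The `sources` entry above pairs an md5 digest with the tarball named in `Source0`. As a minimal, hypothetical Python sketch (not part of the packaging; the local file path is an assumption), a downloaded copy of the tarball could be checked against that digest like this:

```python
import hashlib
from pathlib import Path

# Digest and file name as recorded in the `sources` file of this commit.
EXPECTED_MD5 = "2b44a514265433c45251ded5849bee0d"
TARBALL = Path("bert_embedding-1.0.1.tar.gz")  # assumed to sit in the current directory

def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large tarball never has to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    actual = md5_of(TARBALL)
    status = "OK" if actual == EXPECTED_MD5 else "MISMATCH"
    print(f"{TARBALL.name}: {actual} ({status})")
```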