author    CoprDistGit <infra@openeuler.org>  2023-05-05 04:43:55 +0000
committer CoprDistGit <infra@openeuler.org>  2023-05-05 04:43:55 +0000
commit    d3d5da0104bb0199cc78cb042dfdd071ed736161 (patch)
tree      4971aa44806d7b697aa54ff08ecfb854e013fb3f
parent    d0a3d0e716d30329d270dc27569d6509fe20745c (diff)

automatic import of python-bert-embedding (openeuler20.03)

-rw-r--r--  .gitignore                    1
-rw-r--r--  python-bert-embedding.spec  365
-rw-r--r--  sources                       1
3 files changed, 367 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index e69de29..01872c0 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/bert_embedding-1.0.1.tar.gz
diff --git a/python-bert-embedding.spec b/python-bert-embedding.spec
new file mode 100644
index 0000000..36d8ada
--- /dev/null
+++ b/python-bert-embedding.spec
@@ -0,0 +1,365 @@
+%global _empty_manifest_terminate_build 0
+Name: python-bert-embedding
+Version: 1.0.1
+Release: 1
+Summary: BERT token level embedding with MxNet
+License: ALv2
+URL: https://github.com/imgarylai/bert_embedding
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/32/49/13f76cef121677994bb1b0e8baa8b8bf88405eb1be554925fe8682b7b71e/bert_embedding-1.0.1.tar.gz
+BuildArch: noarch
+
+Requires: python3-typing
+Requires: python3-numpy
+Requires: python3-mxnet
+Requires: python3-gluonnlp
+Requires: python3-mxnet-cu92
+
+%description
+# Bert Embeddings
+
+[![Build Status](https://travis-ci.org/imgarylai/bert-embedding.svg?branch=master)](https://travis-ci.org/imgarylai/bert-embedding) [![codecov](https://codecov.io/gh/imgarylai/bert-embedding/branch/master/graph/badge.svg)](https://codecov.io/gh/imgarylai/bert-embedding) [![PyPI version](https://badge.fury.io/py/bert-embedding.svg)](https://pypi.org/project/bert-embedding/) [![Documentation Status](https://readthedocs.org/projects/bert-embedding/badge/?version=latest)](https://bert-embedding.readthedocs.io/en/latest/?badge=latest)
+
+
+[BERT](https://arxiv.org/abs/1810.04805), published by [Google](https://github.com/google-research/bert), is a new way to obtain pre-trained language model word representations. Many NLP tasks benefit from BERT to achieve state-of-the-art (SOTA) results.
+
+The goal of this project is to obtain token embeddings from BERT's pre-trained model. This way, instead of building and fine-tuning an end-to-end NLP model, you can build your model simply by using the token embeddings.
+
+This project is implemented with [@MXNet](https://github.com/apache/incubator-mxnet). Special thanks to [@gluon-nlp](https://github.com/dmlc/gluon-nlp) team.
+
+## Install
+
+```
+pip install bert-embedding
+# If you want to run on a GPU machine, install `mxnet-cu92`.
+pip install mxnet-cu92
+```
+
+## Usage
+
+```python
+from bert_embedding import BertEmbedding
+
+bert_abstract = """We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers.
+ Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers.
+ As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
+BERT is conceptually simple and empirically powerful.
+It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7 (5.6% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5% absolute improvement), outperforming human performance by 2.0%."""
+sentences = bert_abstract.split('\n')
+bert_embedding = BertEmbedding()
+result = bert_embedding(sentences)
+```
+If you want to use a GPU, import mxnet and set the context:
+
+```python
+import mxnet as mx
+from bert_embedding import BertEmbedding
+
+...
+
+ctx = mx.gpu(0)
+bert = BertEmbedding(ctx=ctx)
+```
+
+The result is a list of tuples, each containing (tokens, token embeddings).
+
+For example:
+
+```python
+first_sentence = result[0]
+
+first_sentence[0]
+# ['we', 'introduce', 'a', 'new', 'language', 'representation', 'model', 'called', 'bert', ',', 'which', 'stands', 'for', 'bidirectional', 'encoder', 'representations', 'from', 'transformers']
+len(first_sentence[0])
+# 18
+
+
+len(first_sentence[1])
+# 18
+token_embeddings_in_first_sentence = first_sentence[1]
+token_embeddings_in_first_sentence[1]  # embedding of the second token, 'introduce'
+# array([ 0.4805648 , 0.18369392, -0.28554988, ..., -0.01961522,
+#         1.0207764 , -0.67167974], dtype=float32)
+token_embeddings_in_first_sentence[1].shape
+# (768,)
+```
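+
+A common next step is to pool the per-token vectors into a single sentence vector. The helper below is a minimal sketch, not part of the bert-embedding API: the name `sentence_vector` and the mean-pooling choice are illustrative, and it only assumes the `(tokens, token embeddings)` structure shown above.
+
+```python
+import numpy as np
+
+def sentence_vector(sentence_result):
+    """Average the per-token embeddings of one sentence into a single vector."""
+    tokens, embeddings = sentence_result          # (list of str, list of 768-dim arrays)
+    return np.mean(np.stack(embeddings), axis=0)  # shape: (768,)
+
+vec = sentence_vector(result[0])
+vec.shape
+# (768,)
+```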
+
+## OOV
+
+There are three ways to handle OOV (out-of-vocabulary) tokens: avg (default), sum, and last. The strategy can be specified when encoding:
+
+```python
+...
+bert_embedding = BertEmbedding()
+bert_embedding(sentences, 'sum')
+...
+```
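+
+As a sketch of how the strategies compare (assuming only the positional strategy argument shown above), you can encode the same input once per strategy and inspect the resulting vectors; the sample sentence and the slicing are illustrative, not part of the library.
+
+```python
+from bert_embedding import BertEmbedding
+
+oov_sentences = ["covfefe is not a word"]  # hypothetical input containing an OOV token
+bert_embedding = BertEmbedding()
+for strategy in ('avg', 'sum', 'last'):
+    tokens, embeddings = bert_embedding(oov_sentences, strategy)[0]
+    print(strategy, embeddings[0][:3])  # first three dimensions of the first token
+```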
+
+## Available pre-trained BERT models
+
+| |book_corpus_wiki_en_uncased|book_corpus_wiki_en_cased|wiki_multilingual|wiki_multilingual_cased|wiki_cn|
+|---|---|---|---|---|---|
+|bert_12_768_12|✓|✓|✓|✓|✓|
+|bert_24_1024_16|x|✓|x|x|x|
+
+Example of using the large pre-trained BERT model from Google:
+
+```python
+from bert_embedding import BertEmbedding
+
+bert_embedding = BertEmbedding(model='bert_24_1024_16', dataset_name='book_corpus_wiki_en_cased')
+```
+
+Source: [gluonnlp](http://gluon-nlp.mxnet.io/model_zoo/bert/index.html)
+
+
+
+%package -n python3-bert-embedding
+Summary: BERT token level embedding with MxNet
+Provides: python-bert-embedding
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-bert-embedding
+# Bert Embeddings
+
+[![Build Status](https://travis-ci.org/imgarylai/bert-embedding.svg?branch=master)](https://travis-ci.org/imgarylai/bert-embedding) [![codecov](https://codecov.io/gh/imgarylai/bert-embedding/branch/master/graph/badge.svg)](https://codecov.io/gh/imgarylai/bert-embedding) [![PyPI version](https://badge.fury.io/py/bert-embedding.svg)](https://pypi.org/project/bert-embedding/) [![Documentation Status](https://readthedocs.org/projects/bert-embedding/badge/?version=latest)](https://bert-embedding.readthedocs.io/en/latest/?badge=latest)
+
+
+[BERT](https://arxiv.org/abs/1810.04805), published by [Google](https://github.com/google-research/bert), is a new way to obtain pre-trained language model word representations. Many NLP tasks benefit from BERT to achieve state-of-the-art (SOTA) results.
+
+The goal of this project is to obtain token embeddings from BERT's pre-trained model. This way, instead of building and fine-tuning an end-to-end NLP model, you can build your model simply by using the token embeddings.
+
+This project is implemented with [@MXNet](https://github.com/apache/incubator-mxnet). Special thanks to [@gluon-nlp](https://github.com/dmlc/gluon-nlp) team.
+
+## Install
+
+```
+pip install bert-embedding
+# If you want to run on a GPU machine, install `mxnet-cu92`.
+pip install mxnet-cu92
+```
+
+## Usage
+
+```python
+from bert_embedding import BertEmbedding
+
+bert_abstract = """We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers.
+ Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers.
+ As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
+BERT is conceptually simple and empirically powerful.
+It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7 (5.6% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5% absolute improvement), outperforming human performance by 2.0%."""
+sentences = bert_abstract.split('\n')
+bert_embedding = BertEmbedding()
+result = bert_embedding(sentences)
+```
+If you want to use a GPU, import mxnet and set the context:
+
+```python
+import mxnet as mx
+from bert_embedding import BertEmbedding
+
+...
+
+ctx = mx.gpu(0)
+bert = BertEmbedding(ctx=ctx)
+```
+
+The result is a list of tuples, each containing (tokens, token embeddings).
+
+For example:
+
+```python
+first_sentence = result[0]
+
+first_sentence[0]
+# ['we', 'introduce', 'a', 'new', 'language', 'representation', 'model', 'called', 'bert', ',', 'which', 'stands', 'for', 'bidirectional', 'encoder', 'representations', 'from', 'transformers']
+len(first_sentence[0])
+# 18
+
+
+len(first_sentence[1])
+# 18
+token_embeddings_in_first_sentence = first_sentence[1]
+token_embeddings_in_first_sentence[1]  # embedding of the second token, 'introduce'
+# array([ 0.4805648 , 0.18369392, -0.28554988, ..., -0.01961522,
+#         1.0207764 , -0.67167974], dtype=float32)
+token_embeddings_in_first_sentence[1].shape
+# (768,)
+```
+
+## OOV
+
+There are three ways to handle OOV (out-of-vocabulary) tokens: avg (default), sum, and last. The strategy can be specified when encoding:
+
+```python
+...
+bert_embedding = BertEmbedding()
+bert_embedding(sentences, 'sum')
+...
+```
+
+## Available pre-trained BERT models
+
+| |book_corpus_wiki_en_uncased|book_corpus_wiki_en_cased|wiki_multilingual|wiki_multilingual_cased|wiki_cn|
+|---|---|---|---|---|---|
+|bert_12_768_12|✓|✓|✓|✓|✓|
+|bert_24_1024_16|x|✓|x|x|x|
+
+Example of using the large pre-trained BERT model from Google:
+
+```python
+from bert_embedding import BertEmbedding
+
+bert_embedding = BertEmbedding(model='bert_24_1024_16', dataset_name='book_corpus_wiki_en_cased')
+```
+
+Source: [gluonnlp](http://gluon-nlp.mxnet.io/model_zoo/bert/index.html)
+
+
+
+%package help
+Summary: Development documents and examples for bert-embedding
+Provides: python3-bert-embedding-doc
+%description help
+# Bert Embeddings
+
+[![Build Status](https://travis-ci.org/imgarylai/bert-embedding.svg?branch=master)](https://travis-ci.org/imgarylai/bert-embedding) [![codecov](https://codecov.io/gh/imgarylai/bert-embedding/branch/master/graph/badge.svg)](https://codecov.io/gh/imgarylai/bert-embedding) [![PyPI version](https://badge.fury.io/py/bert-embedding.svg)](https://pypi.org/project/bert-embedding/) [![Documentation Status](https://readthedocs.org/projects/bert-embedding/badge/?version=latest)](https://bert-embedding.readthedocs.io/en/latest/?badge=latest)
+
+
+[BERT](https://arxiv.org/abs/1810.04805), published by [Google](https://github.com/google-research/bert), is a new way to obtain pre-trained language model word representations. Many NLP tasks benefit from BERT to achieve state-of-the-art (SOTA) results.
+
+The goal of this project is to obtain token embeddings from BERT's pre-trained model. This way, instead of building and fine-tuning an end-to-end NLP model, you can build your model simply by using the token embeddings.
+
+This project is implemented with [@MXNet](https://github.com/apache/incubator-mxnet). Special thanks to [@gluon-nlp](https://github.com/dmlc/gluon-nlp) team.
+
+## Install
+
+```
+pip install bert-embedding
+# If you want to run on a GPU machine, install `mxnet-cu92`.
+pip install mxnet-cu92
+```
+
+## Usage
+
+```python
+from bert_embedding import BertEmbedding
+
+bert_abstract = """We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers.
+ Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers.
+ As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
+BERT is conceptually simple and empirically powerful.
+It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7 (5.6% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5% absolute improvement), outperforming human performance by 2.0%."""
+sentences = bert_abstract.split('\n')
+bert_embedding = BertEmbedding()
+result = bert_embedding(sentences)
+```
+If you want to use a GPU, import mxnet and set the context:
+
+```python
+import mxnet as mx
+from bert_embedding import BertEmbedding
+
+...
+
+ctx = mx.gpu(0)
+bert = BertEmbedding(ctx=ctx)
+```
+
+The result is a list of tuples, each containing (tokens, token embeddings).
+
+For example:
+
+```python
+first_sentence = result[0]
+
+first_sentence[0]
+# ['we', 'introduce', 'a', 'new', 'language', 'representation', 'model', 'called', 'bert', ',', 'which', 'stands', 'for', 'bidirectional', 'encoder', 'representations', 'from', 'transformers']
+len(first_sentence[0])
+# 18
+
+
+len(first_sentence[1])
+# 18
+token_embeddings_in_first_sentence = first_sentence[1]
+token_embeddings_in_first_sentence[1]  # embedding of the second token, 'introduce'
+# array([ 0.4805648 , 0.18369392, -0.28554988, ..., -0.01961522,
+#         1.0207764 , -0.67167974], dtype=float32)
+token_embeddings_in_first_sentence[1].shape
+# (768,)
+```
+
+## OOV
+
+There are three ways to handle OOV (out-of-vocabulary) tokens: avg (default), sum, and last. The strategy can be specified when encoding:
+
+```python
+...
+bert_embedding = BertEmbedding()
+bert_embedding(sentences, 'sum')
+...
+```
+
+## Available pre-trained BERT models
+
+| |book_corpus_wiki_en_uncased|book_corpus_wiki_en_cased|wiki_multilingual|wiki_multilingual_cased|wiki_cn|
+|---|---|---|---|---|---|
+|bert_12_768_12|✓|✓|✓|✓|✓|
+|bert_24_1024_16|x|✓|x|x|x|
+
+Example of using the large pre-trained BERT model from Google:
+
+```python
+from bert_embedding import BertEmbedding
+
+bert_embedding = BertEmbedding(model='bert_24_1024_16', dataset_name='book_corpus_wiki_en_cased')
+```
+
+Source: [gluonnlp](http://gluon-nlp.mxnet.io/model_zoo/bert/index.html)
+
+
+
+%prep
+%autosetup -n bert-embedding-1.0.1
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-bert-embedding -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 1.0.1-1
+- Package Spec generated
diff --git a/sources b/sources
new file mode 100644
index 0000000..51cbaa9
--- /dev/null
+++ b/sources
@@ -0,0 +1 @@
+2b44a514265433c45251ded5849bee0d bert_embedding-1.0.1.tar.gz