author    CoprDistGit <infra@openeuler.org>    2023-04-11 05:25:21 +0000
committer CoprDistGit <infra@openeuler.org>    2023-04-11 05:25:21 +0000
commit    3f968a24824c4efaf1966df7f00cadc87a011e6e (patch)
tree      579a6716a918d18841e775570acb4d9b7e32a1c1 /python-keras-bert.spec
parent    82bd133077a33f0f89abcc3d6ec30840d2c6adb4 (diff)
automatic import of python-keras-bert
Diffstat (limited to 'python-keras-bert.spec')
-rw-r--r--    python-keras-bert.spec    744
1 file changed, 744 insertions(+), 0 deletions(-)
diff --git a/python-keras-bert.spec b/python-keras-bert.spec
new file mode 100644
index 0000000..2eee419
--- /dev/null
+++ b/python-keras-bert.spec
@@ -0,0 +1,744 @@
+%global _empty_manifest_terminate_build 0
+Name: python-keras-bert
+Version: 0.89.0
+Release: 1
+Summary: BERT implemented in Keras
+License: MIT
+URL: https://github.com/CyberZHG/keras-bert
+Source0: https://mirrors.nju.edu.cn/pypi/web/packages/74/0a/ffc65dfa4b31942ee8348e0026d2a7ee57e1769e9266c677141a3e2cac9c/keras-bert-0.89.0.tar.gz
+BuildArch: noarch
+
+
+%description
+# Keras BERT
+
+[![Version](https://img.shields.io/pypi/v/keras-bert.svg)](https://pypi.org/project/keras-bert/)
+![License](https://img.shields.io/pypi/l/keras-bert.svg)
+
+\[[中文](https://github.com/CyberZHG/keras-bert/blob/master/README.zh-CN.md)|[English](https://github.com/CyberZHG/keras-bert/blob/master/README.md)\]
+
+Implementation of [BERT](https://arxiv.org/pdf/1810.04805.pdf). Official pre-trained models can be loaded for feature extraction and prediction.
+
+## Install
+
+```bash
+pip install keras-bert
+```
+
+## Usage
+
+* [Load Official Pre-trained Models](#Load-Official-Pre-trained-Models)
+* [Tokenizer](#Tokenizer)
+* [Train & Use](#Train-&-Use)
+* [Use Warmup](#Use-Warmup)
+* [Download Pretrained Checkpoints](#Download-Pretrained-Checkpoints)
+* [Extract Features](#Extract-Features)
+
+### External Links
+
+* [Kashgari is a Production-ready NLP Transfer learning framework for text-labeling and text-classification](https://github.com/BrikerMan/Kashgari)
+* [Keras ALBERT](https://github.com/TinkerMob/keras_albert_model)
+
+### Load Official Pre-trained Models
+
+The [feature extraction demo](./demo/load_model/load_and_extract.py) should produce the same extraction results as the official model `chinese_L-12_H-768_A-12`, and the [prediction demo](./demo/load_model/load_and_predict.py) shows how the missing word in a sentence can be predicted.
+
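+For reference, loading an uncompressed official checkpoint typically looks like the sketch below. This is only an illustration, not part of the demos: the directory name, file names, and `seq_len` value are assumptions.
+
+```python
+import os
+from keras_bert import load_trained_model_from_checkpoint
+
+# Assumption: an official checkpoint has been downloaded and uncompressed here
+checkpoint_dir = 'uncased_L-12_H-768_A-12'
+
+model = load_trained_model_from_checkpoint(
+    os.path.join(checkpoint_dir, 'bert_config.json'),
+    os.path.join(checkpoint_dir, 'bert_model.ckpt'),
+    training=False,  # load for inference / feature extraction only
+    seq_len=128,     # illustrative fixed sequence length
+)
+model.summary()
+```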
+
+### Run on TPU
+
+The [extraction demo](https://colab.research.google.com/github/CyberZHG/keras-bert/blob/master/demo/load_model/keras_bert_load_and_extract_tpu.ipynb) shows how to convert the model so that it runs on a TPU.
+
+The [classification demo](https://colab.research.google.com/github/CyberZHG/keras-bert/blob/master/demo/tune/keras_bert_classification_tpu.ipynb) shows how to apply the model to simple classification tasks.
+
+### Tokenizer
+
+The `Tokenizer` class is used for splitting texts and generating indices:
+
+```python
+from keras_bert import Tokenizer
+
+token_dict = {
+ '[CLS]': 0,
+ '[SEP]': 1,
+ 'un': 2,
+ '##aff': 3,
+ '##able': 4,
+ '[UNK]': 5,
+}
+tokenizer = Tokenizer(token_dict)
+print(tokenizer.tokenize('unaffable')) # The result should be `['[CLS]', 'un', '##aff', '##able', '[SEP]']`
+indices, segments = tokenizer.encode('unaffable')
+print(indices) # Should be `[0, 2, 3, 4, 1]`
+print(segments) # Should be `[0, 0, 0, 0, 0]`
+
+print(tokenizer.tokenize(first='unaffable', second='钢'))
+# The result should be `['[CLS]', 'un', '##aff', '##able', '[SEP]', '钢', '[SEP]']`
+indices, segments = tokenizer.encode(first='unaffable', second='钢', max_len=10)
+print(indices) # Should be `[0, 2, 3, 4, 1, 5, 1, 0, 0, 0]`
+print(segments) # Should be `[0, 0, 0, 0, 0, 1, 1, 0, 0, 0]`
+```
+
+### Train & Use
+
+```python
+from tensorflow import keras
+from keras_bert import get_base_dict, get_model, compile_model, gen_batch_inputs
+
+
+# A toy input example
+sentence_pairs = [
+ [['all', 'work', 'and', 'no', 'play'], ['makes', 'jack', 'a', 'dull', 'boy']],
+ [['from', 'the', 'day', 'forth'], ['my', 'arm', 'changed']],
+ [['and', 'a', 'voice', 'echoed'], ['power', 'give', 'me', 'more', 'power']],
+]
+
+
+# Build token dictionary
+token_dict = get_base_dict() # A dict that contains some special tokens
+for pairs in sentence_pairs:
+ for token in pairs[0] + pairs[1]:
+ if token not in token_dict:
+ token_dict[token] = len(token_dict)
+token_list = list(token_dict.keys()) # Used for selecting a random word
+
+
+# Build & train the model
+model = get_model(
+ token_num=len(token_dict),
+ head_num=5,
+ transformer_num=12,
+ embed_dim=25,
+ feed_forward_dim=100,
+ seq_len=20,
+ pos_num=20,
+ dropout_rate=0.05,
+)
+compile_model(model)
+model.summary()
+
+def _generator():
+ while True:
+ yield gen_batch_inputs(
+ sentence_pairs,
+ token_dict,
+ token_list,
+ seq_len=20,
+ mask_rate=0.3,
+ swap_sentence_rate=1.0,
+ )
+
+model.fit_generator(
+ generator=_generator(),
+ steps_per_epoch=1000,
+ epochs=100,
+ validation_data=_generator(),
+ validation_steps=100,
+ callbacks=[
+ keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
+ ],
+)
+
+
+# Use the trained model
+inputs, output_layer = get_model(
+ token_num=len(token_dict),
+ head_num=5,
+ transformer_num=12,
+ embed_dim=25,
+ feed_forward_dim=100,
+ seq_len=20,
+ pos_num=20,
+ dropout_rate=0.05,
+ training=False, # The input layers and output layer will be returned if `training` is `False`
+    trainable=False,     # Whether the model is trainable. The default value is the same as `training`
+ output_layer_num=4, # The number of layers whose outputs will be concatenated as a single output.
+ # Only available when `training` is `False`.
+)
+```
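+
+If you only need the features, the returned `inputs` and `output_layer` can be wrapped into a plain Keras model. A minimal sketch continuing the snippet above (how you feed it data is up to your own pipeline):
+
+```python
+# Wrap the returned input layers and concatenated output layer into a model
+# that produces per-token features for downstream use
+feature_model = keras.Model(inputs=inputs, outputs=output_layer)
+feature_model.summary()
+```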
+
+### Use Warmup
+
+The `AdamWarmup` optimizer is provided for warmup and decay. The learning rate will reach `lr` in `warmup_steps` steps, and decay to `min_lr` in `decay_steps` steps. The helper function `calc_train_steps` calculates these two step counts:
+
+```python
+import numpy as np
+from keras_bert import AdamWarmup, calc_train_steps
+
+train_x = np.random.standard_normal((1024, 100))
+
+total_steps, warmup_steps = calc_train_steps(
+ num_example=train_x.shape[0],
+ batch_size=32,
+ epochs=10,
+ warmup_proportion=0.1,
+)
+
+optimizer = AdamWarmup(total_steps, warmup_steps, lr=1e-3, min_lr=1e-5)
+```
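+
+The resulting optimizer is then passed to Keras as usual. A minimal sketch, assuming `model` is a Keras model you have built for a downstream task:
+
+```python
+# Assumption: `model` and the loss are placeholders for your downstream task
+model.compile(
+    optimizer=optimizer,
+    loss='sparse_categorical_crossentropy',
+    metrics=['accuracy'],
+)
+```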
+
+### Download Pretrained Checkpoints
+
+Several download URLs have been added. You can download and uncompress a checkpoint, and get its local paths, with:
+
+```python
+from keras_bert import get_pretrained, PretrainedList, get_checkpoint_paths
+
+model_path = get_pretrained(PretrainedList.multi_cased_base)
+paths = get_checkpoint_paths(model_path)
+print(paths.config, paths.checkpoint, paths.vocab)
+```
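+
+The resolved paths can be fed back into the other helpers, for example to build a tokenizer from the checkpoint's vocabulary. A short sketch, assuming the download above completed and that `load_vocabulary` is available from `keras_bert` as in the loader helpers:
+
+```python
+from keras_bert import Tokenizer, load_vocabulary
+
+# `paths.vocab` points at the checkpoint's vocab file
+token_dict = load_vocabulary(paths.vocab)
+tokenizer = Tokenizer(token_dict)
+print(tokenizer.tokenize('all work and no play'))
+```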
+
+### Extract Features
+
+You can use the helper function `extract_embeddings` if you only need the features of tokens or sentences (without further fine-tuning). To extract the features of all tokens:
+
+```python
+from keras_bert import extract_embeddings
+
+model_path = 'xxx/yyy/uncased_L-12_H-768_A-12'
+texts = ['all work and no play', 'makes jack a dull boy~']
+
+embeddings = extract_embeddings(model_path, texts)
+```
+
+The returned result is a list with the same length as `texts`. Each item in the list is a numpy array truncated to the length of the corresponding input. The shapes of the outputs in this example are `(7, 768)` and `(8, 768)`.
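+
+A quick way to check those shapes, continuing the snippet above:
+
+```python
+for text, embedding in zip(texts, embeddings):
+    print(repr(text), embedding.shape)  # expected: (7, 768) and (8, 768)
+```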
+
+When the inputs are sentence pairs and you need the `NSP` output and max-pooling of the last 4 layers:
+
+```python
+from keras_bert import extract_embeddings, POOL_NSP, POOL_MAX
+
+model_path = 'xxx/yyy/uncased_L-12_H-768_A-12'
+texts = [
+ ('all work and no play', 'makes jack a dull boy'),
+ ('makes jack a dull boy', 'all work and no play'),
+]
+
+embeddings = extract_embeddings(model_path, texts, output_layer_num=4, poolings=[POOL_NSP, POOL_MAX])
+```
+
+There are no token features in the results. The outputs of `NSP` and max-pooling are concatenated, giving a final shape of `(768 x 4 x 2,)`.
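+
+Continuing the snippet above, a quick sanity check of that shape (768 × 4 layers × 2 poolings = 6144):
+
+```python
+print(embeddings[0].shape)  # expected: (6144,)
+```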
+
+The second argument of the helper function can also be a generator. To extract features from a file:
+
+```python
+import codecs
+from keras_bert import extract_embeddings
+
+model_path = 'xxx/yyy/uncased_L-12_H-768_A-12'
+
+with codecs.open('xxx.txt', 'r', 'utf8') as reader:
+ texts = map(lambda x: x.strip(), reader)
+ embeddings = extract_embeddings(model_path, texts)
+```
+
+### Use `tensorflow.python.keras`
+
+Set the environment variable `TF_KERAS=1` to make keras-bert use `tensorflow.python.keras` instead of standalone Keras.
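+
+A short sketch of setting it from Python; the backend is selected when `keras_bert` is first imported, so set the variable beforehand (exporting it in the shell before launching Python works as well):
+
+```python
+import os
+
+os.environ['TF_KERAS'] = '1'  # select tensorflow.python.keras as the backend
+
+import keras_bert  # imported after the variable is set
+```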
+
+%package -n python3-keras-bert
+Summary: BERT implemented in Keras
+Provides: python-keras-bert
+BuildRequires: python3-devel
+BuildRequires: python3-setuptools
+BuildRequires: python3-pip
+%description -n python3-keras-bert
+# Keras BERT
+
+[![Version](https://img.shields.io/pypi/v/keras-bert.svg)](https://pypi.org/project/keras-bert/)
+![License](https://img.shields.io/pypi/l/keras-bert.svg)
+
+\[[中文](https://github.com/CyberZHG/keras-bert/blob/master/README.zh-CN.md)|[English](https://github.com/CyberZHG/keras-bert/blob/master/README.md)\]
+
+Implementation of [BERT](https://arxiv.org/pdf/1810.04805.pdf). Official pre-trained models can be loaded for feature extraction and prediction.
+
+## Install
+
+```bash
+pip install keras-bert
+```
+
+## Usage
+
+* [Load Official Pre-trained Models](#Load-Official-Pre-trained-Models)
+* [Tokenizer](#Tokenizer)
+* [Train & Use](#Train-&-Use)
+* [Use Warmup](#Use-Warmup)
+* [Download Pretrained Checkpoints](#Download-Pretrained-Checkpoints)
+* [Extract Features](#Extract-Features)
+
+### External Links
+
+* [Kashgari is a Production-ready NLP Transfer learning framework for text-labeling and text-classification](https://github.com/BrikerMan/Kashgari)
+* [Keras ALBERT](https://github.com/TinkerMob/keras_albert_model)
+
+### Load Official Pre-trained Models
+
+The [feature extraction demo](./demo/load_model/load_and_extract.py) should produce the same extraction results as the official model `chinese_L-12_H-768_A-12`, and the [prediction demo](./demo/load_model/load_and_predict.py) shows how the missing word in a sentence can be predicted.
+
+
+### Run on TPU
+
+The [extraction demo](https://colab.research.google.com/github/CyberZHG/keras-bert/blob/master/demo/load_model/keras_bert_load_and_extract_tpu.ipynb) shows how to convert the model so that it runs on a TPU.
+
+The [classification demo](https://colab.research.google.com/github/CyberZHG/keras-bert/blob/master/demo/tune/keras_bert_classification_tpu.ipynb) shows how to apply the model to simple classification tasks.
+
+### Tokenizer
+
+The `Tokenizer` class is used for splitting texts and generating indices:
+
+```python
+from keras_bert import Tokenizer
+
+token_dict = {
+ '[CLS]': 0,
+ '[SEP]': 1,
+ 'un': 2,
+ '##aff': 3,
+ '##able': 4,
+ '[UNK]': 5,
+}
+tokenizer = Tokenizer(token_dict)
+print(tokenizer.tokenize('unaffable')) # The result should be `['[CLS]', 'un', '##aff', '##able', '[SEP]']`
+indices, segments = tokenizer.encode('unaffable')
+print(indices) # Should be `[0, 2, 3, 4, 1]`
+print(segments) # Should be `[0, 0, 0, 0, 0]`
+
+print(tokenizer.tokenize(first='unaffable', second='钢'))
+# The result should be `['[CLS]', 'un', '##aff', '##able', '[SEP]', '钢', '[SEP]']`
+indices, segments = tokenizer.encode(first='unaffable', second='钢', max_len=10)
+print(indices) # Should be `[0, 2, 3, 4, 1, 5, 1, 0, 0, 0]`
+print(segments) # Should be `[0, 0, 0, 0, 0, 1, 1, 0, 0, 0]`
+```
+
+### Train & Use
+
+```python
+from tensorflow import keras
+from keras_bert import get_base_dict, get_model, compile_model, gen_batch_inputs
+
+
+# A toy input example
+sentence_pairs = [
+ [['all', 'work', 'and', 'no', 'play'], ['makes', 'jack', 'a', 'dull', 'boy']],
+ [['from', 'the', 'day', 'forth'], ['my', 'arm', 'changed']],
+ [['and', 'a', 'voice', 'echoed'], ['power', 'give', 'me', 'more', 'power']],
+]
+
+
+# Build token dictionary
+token_dict = get_base_dict() # A dict that contains some special tokens
+for pairs in sentence_pairs:
+ for token in pairs[0] + pairs[1]:
+ if token not in token_dict:
+ token_dict[token] = len(token_dict)
+token_list = list(token_dict.keys()) # Used for selecting a random word
+
+
+# Build & train the model
+model = get_model(
+ token_num=len(token_dict),
+ head_num=5,
+ transformer_num=12,
+ embed_dim=25,
+ feed_forward_dim=100,
+ seq_len=20,
+ pos_num=20,
+ dropout_rate=0.05,
+)
+compile_model(model)
+model.summary()
+
+def _generator():
+ while True:
+ yield gen_batch_inputs(
+ sentence_pairs,
+ token_dict,
+ token_list,
+ seq_len=20,
+ mask_rate=0.3,
+ swap_sentence_rate=1.0,
+ )
+
+model.fit_generator(
+ generator=_generator(),
+ steps_per_epoch=1000,
+ epochs=100,
+ validation_data=_generator(),
+ validation_steps=100,
+ callbacks=[
+ keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
+ ],
+)
+
+
+# Use the trained model
+inputs, output_layer = get_model(
+ token_num=len(token_dict),
+ head_num=5,
+ transformer_num=12,
+ embed_dim=25,
+ feed_forward_dim=100,
+ seq_len=20,
+ pos_num=20,
+ dropout_rate=0.05,
+ training=False, # The input layers and output layer will be returned if `training` is `False`
+    trainable=False,     # Whether the model is trainable. The default value is the same as `training`
+ output_layer_num=4, # The number of layers whose outputs will be concatenated as a single output.
+ # Only available when `training` is `False`.
+)
+```
+
+### Use Warmup
+
+The `AdamWarmup` optimizer is provided for warmup and decay. The learning rate will reach `lr` in `warmup_steps` steps, and decay to `min_lr` in `decay_steps` steps. The helper function `calc_train_steps` calculates these two step counts:
+
+```python
+import numpy as np
+from keras_bert import AdamWarmup, calc_train_steps
+
+train_x = np.random.standard_normal((1024, 100))
+
+total_steps, warmup_steps = calc_train_steps(
+ num_example=train_x.shape[0],
+ batch_size=32,
+ epochs=10,
+ warmup_proportion=0.1,
+)
+
+optimizer = AdamWarmup(total_steps, warmup_steps, lr=1e-3, min_lr=1e-5)
+```
+
+### Download Pretrained Checkpoints
+
+Several download URLs have been added. You can download and uncompress a checkpoint, and get its local paths, with:
+
+```python
+from keras_bert import get_pretrained, PretrainedList, get_checkpoint_paths
+
+model_path = get_pretrained(PretrainedList.multi_cased_base)
+paths = get_checkpoint_paths(model_path)
+print(paths.config, paths.checkpoint, paths.vocab)
+```
+
+### Extract Features
+
+You can use the helper function `extract_embeddings` if you only need the features of tokens or sentences (without further fine-tuning). To extract the features of all tokens:
+
+```python
+from keras_bert import extract_embeddings
+
+model_path = 'xxx/yyy/uncased_L-12_H-768_A-12'
+texts = ['all work and no play', 'makes jack a dull boy~']
+
+embeddings = extract_embeddings(model_path, texts)
+```
+
+The returned result is a list with the same length as `texts`. Each item in the list is a numpy array truncated to the length of the corresponding input. The shapes of the outputs in this example are `(7, 768)` and `(8, 768)`.
+
+When the inputs are sentence pairs and you need the `NSP` output and max-pooling of the last 4 layers:
+
+```python
+from keras_bert import extract_embeddings, POOL_NSP, POOL_MAX
+
+model_path = 'xxx/yyy/uncased_L-12_H-768_A-12'
+texts = [
+ ('all work and no play', 'makes jack a dull boy'),
+ ('makes jack a dull boy', 'all work and no play'),
+]
+
+embeddings = extract_embeddings(model_path, texts, output_layer_num=4, poolings=[POOL_NSP, POOL_MAX])
+```
+
+There are no token features in the results. The outputs of `NSP` and max-pooling are concatenated, giving a final shape of `(768 x 4 x 2,)`.
+
+The second argument of the helper function can also be a generator. To extract features from a file:
+
+```python
+import codecs
+from keras_bert import extract_embeddings
+
+model_path = 'xxx/yyy/uncased_L-12_H-768_A-12'
+
+with codecs.open('xxx.txt', 'r', 'utf8') as reader:
+ texts = map(lambda x: x.strip(), reader)
+ embeddings = extract_embeddings(model_path, texts)
+```
+
+### Use `tensorflow.python.keras`
+
+Set the environment variable `TF_KERAS=1` to make keras-bert use `tensorflow.python.keras` instead of standalone Keras.
+
+%package help
+Summary: Development documents and examples for keras-bert
+Provides: python3-keras-bert-doc
+%description help
+# Keras BERT
+
+[![Version](https://img.shields.io/pypi/v/keras-bert.svg)](https://pypi.org/project/keras-bert/)
+![License](https://img.shields.io/pypi/l/keras-bert.svg)
+
+\[[中文](https://github.com/CyberZHG/keras-bert/blob/master/README.zh-CN.md)|[English](https://github.com/CyberZHG/keras-bert/blob/master/README.md)\]
+
+Implementation of [BERT](https://arxiv.org/pdf/1810.04805.pdf). Official pre-trained models can be loaded for feature extraction and prediction.
+
+## Install
+
+```bash
+pip install keras-bert
+```
+
+## Usage
+
+* [Load Official Pre-trained Models](#Load-Official-Pre-trained-Models)
+* [Tokenizer](#Tokenizer)
+* [Train & Use](#Train-&-Use)
+* [Use Warmup](#Use-Warmup)
+* [Download Pretrained Checkpoints](#Download-Pretrained-Checkpoints)
+* [Extract Features](#Extract-Features)
+
+### External Links
+
+* [Kashgari is a Production-ready NLP Transfer learning framework for text-labeling and text-classification](https://github.com/BrikerMan/Kashgari)
+* [Keras ALBERT](https://github.com/TinkerMob/keras_albert_model)
+
+### Load Official Pre-trained Models
+
+The [feature extraction demo](./demo/load_model/load_and_extract.py) should produce the same extraction results as the official model `chinese_L-12_H-768_A-12`, and the [prediction demo](./demo/load_model/load_and_predict.py) shows how the missing word in a sentence can be predicted.
+
+
+### Run on TPU
+
+The [extraction demo](https://colab.research.google.com/github/CyberZHG/keras-bert/blob/master/demo/load_model/keras_bert_load_and_extract_tpu.ipynb) shows how to convert the model so that it runs on a TPU.
+
+The [classification demo](https://colab.research.google.com/github/CyberZHG/keras-bert/blob/master/demo/tune/keras_bert_classification_tpu.ipynb) shows how to apply the model to simple classification tasks.
+
+### Tokenizer
+
+The `Tokenizer` class is used for splitting texts and generating indices:
+
+```python
+from keras_bert import Tokenizer
+
+token_dict = {
+ '[CLS]': 0,
+ '[SEP]': 1,
+ 'un': 2,
+ '##aff': 3,
+ '##able': 4,
+ '[UNK]': 5,
+}
+tokenizer = Tokenizer(token_dict)
+print(tokenizer.tokenize('unaffable')) # The result should be `['[CLS]', 'un', '##aff', '##able', '[SEP]']`
+indices, segments = tokenizer.encode('unaffable')
+print(indices) # Should be `[0, 2, 3, 4, 1]`
+print(segments) # Should be `[0, 0, 0, 0, 0]`
+
+print(tokenizer.tokenize(first='unaffable', second='钢'))
+# The result should be `['[CLS]', 'un', '##aff', '##able', '[SEP]', '钢', '[SEP]']`
+indices, segments = tokenizer.encode(first='unaffable', second='钢', max_len=10)
+print(indices) # Should be `[0, 2, 3, 4, 1, 5, 1, 0, 0, 0]`
+print(segments) # Should be `[0, 0, 0, 0, 0, 1, 1, 0, 0, 0]`
+```
+
+### Train & Use
+
+```python
+from tensorflow import keras
+from keras_bert import get_base_dict, get_model, compile_model, gen_batch_inputs
+
+
+# A toy input example
+sentence_pairs = [
+ [['all', 'work', 'and', 'no', 'play'], ['makes', 'jack', 'a', 'dull', 'boy']],
+ [['from', 'the', 'day', 'forth'], ['my', 'arm', 'changed']],
+ [['and', 'a', 'voice', 'echoed'], ['power', 'give', 'me', 'more', 'power']],
+]
+
+
+# Build token dictionary
+token_dict = get_base_dict() # A dict that contains some special tokens
+for pairs in sentence_pairs:
+ for token in pairs[0] + pairs[1]:
+ if token not in token_dict:
+ token_dict[token] = len(token_dict)
+token_list = list(token_dict.keys()) # Used for selecting a random word
+
+
+# Build & train the model
+model = get_model(
+ token_num=len(token_dict),
+ head_num=5,
+ transformer_num=12,
+ embed_dim=25,
+ feed_forward_dim=100,
+ seq_len=20,
+ pos_num=20,
+ dropout_rate=0.05,
+)
+compile_model(model)
+model.summary()
+
+def _generator():
+ while True:
+ yield gen_batch_inputs(
+ sentence_pairs,
+ token_dict,
+ token_list,
+ seq_len=20,
+ mask_rate=0.3,
+ swap_sentence_rate=1.0,
+ )
+
+model.fit_generator(
+ generator=_generator(),
+ steps_per_epoch=1000,
+ epochs=100,
+ validation_data=_generator(),
+ validation_steps=100,
+ callbacks=[
+ keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
+ ],
+)
+
+
+# Use the trained model
+inputs, output_layer = get_model(
+ token_num=len(token_dict),
+ head_num=5,
+ transformer_num=12,
+ embed_dim=25,
+ feed_forward_dim=100,
+ seq_len=20,
+ pos_num=20,
+ dropout_rate=0.05,
+ training=False, # The input layers and output layer will be returned if `training` is `False`
+    trainable=False,     # Whether the model is trainable. The default value is the same as `training`
+ output_layer_num=4, # The number of layers whose outputs will be concatenated as a single output.
+ # Only available when `training` is `False`.
+)
+```
+
+### Use Warmup
+
+The `AdamWarmup` optimizer is provided for warmup and decay. The learning rate will reach `lr` in `warmup_steps` steps, and decay to `min_lr` in `decay_steps` steps. The helper function `calc_train_steps` calculates these two step counts:
+
+```python
+import numpy as np
+from keras_bert import AdamWarmup, calc_train_steps
+
+train_x = np.random.standard_normal((1024, 100))
+
+total_steps, warmup_steps = calc_train_steps(
+ num_example=train_x.shape[0],
+ batch_size=32,
+ epochs=10,
+ warmup_proportion=0.1,
+)
+
+optimizer = AdamWarmup(total_steps, warmup_steps, lr=1e-3, min_lr=1e-5)
+```
+
+### Download Pretrained Checkpoints
+
+Several download URLs have been added. You can download and uncompress a checkpoint, and get its local paths, with:
+
+```python
+from keras_bert import get_pretrained, PretrainedList, get_checkpoint_paths
+
+model_path = get_pretrained(PretrainedList.multi_cased_base)
+paths = get_checkpoint_paths(model_path)
+print(paths.config, paths.checkpoint, paths.vocab)
+```
+
+### Extract Features
+
+You can use the helper function `extract_embeddings` if you only need the features of tokens or sentences (without further fine-tuning). To extract the features of all tokens:
+
+```python
+from keras_bert import extract_embeddings
+
+model_path = 'xxx/yyy/uncased_L-12_H-768_A-12'
+texts = ['all work and no play', 'makes jack a dull boy~']
+
+embeddings = extract_embeddings(model_path, texts)
+```
+
+The returned result is a list with the same length as `texts`. Each item in the list is a numpy array truncated to the length of the corresponding input. The shapes of the outputs in this example are `(7, 768)` and `(8, 768)`.
+
+When the inputs are sentence pairs and you need the `NSP` output and max-pooling of the last 4 layers:
+
+```python
+from keras_bert import extract_embeddings, POOL_NSP, POOL_MAX
+
+model_path = 'xxx/yyy/uncased_L-12_H-768_A-12'
+texts = [
+ ('all work and no play', 'makes jack a dull boy'),
+ ('makes jack a dull boy', 'all work and no play'),
+]
+
+embeddings = extract_embeddings(model_path, texts, output_layer_num=4, poolings=[POOL_NSP, POOL_MAX])
+```
+
+There are no token features in the results. The outputs of `NSP` and max-pooling are concatenated, giving a final shape of `(768 x 4 x 2,)`.
+
+The second argument of the helper function can also be a generator. To extract features from a file:
+
+```python
+import codecs
+from keras_bert import extract_embeddings
+
+model_path = 'xxx/yyy/uncased_L-12_H-768_A-12'
+
+with codecs.open('xxx.txt', 'r', 'utf8') as reader:
+ texts = map(lambda x: x.strip(), reader)
+ embeddings = extract_embeddings(model_path, texts)
+```
+
+### Use `tensorflow.python.keras`
+
+Set the environment variable `TF_KERAS=1` to make keras-bert use `tensorflow.python.keras` instead of standalone Keras.
+
+%prep
+%autosetup -n keras-bert-0.89.0
+
+%build
+%py3_build
+
+%install
+%py3_install
+install -d -m755 %{buildroot}/%{_pkgdocdir}
+if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
+if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
+if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
+if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
+pushd %{buildroot}
+if [ -d usr/lib ]; then
+ find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/lib64 ]; then
+ find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/bin ]; then
+ find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+if [ -d usr/sbin ]; then
+ find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
+fi
+touch doclist.lst
+if [ -d usr/share/man ]; then
+ find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
+fi
+popd
+mv %{buildroot}/filelist.lst .
+mv %{buildroot}/doclist.lst .
+
+%files -n python3-keras-bert -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Tue Apr 11 2023 Python_Bot <Python_Bot@openeuler.org> - 0.89.0-1
+- Package Spec generated