-rw-r--r--   .gitignore                    1
-rw-r--r--   python-tensorflowasr.spec   784
-rw-r--r--   sources                       1
3 files changed, 786 insertions, 0 deletions
@@ -0,0 +1 @@ +/TensorFlowASR-1.0.3.tar.gz diff --git a/python-tensorflowasr.spec b/python-tensorflowasr.spec new file mode 100644 index 0000000..ed59cee --- /dev/null +++ b/python-tensorflowasr.spec @@ -0,0 +1,784 @@ +%global _empty_manifest_terminate_build 0 +Name: python-TensorFlowASR +Version: 1.0.3 +Release: 1 +Summary: Almost State-of-the-art Automatic Speech Recognition using Tensorflow 2 +License: Apache Software License +URL: https://github.com/TensorSpeech/TensorFlowASR +Source0: https://mirrors.nju.edu.cn/pypi/web/packages/89/e3/3bd44a5ebd93eb4c604ecfbbf603b7a5efd5d20ea595f216c4005045f2d5/TensorFlowASR-1.0.3.tar.gz +BuildArch: noarch + +Requires: python3-SoundFile +Requires: python3-tensorflow-datasets +Requires: python3-nltk +Requires: python3-numpy +Requires: python3-sentencepiece +Requires: python3-tqdm +Requires: python3-librosa +Requires: python3-PyYAML +Requires: python3-Pillow +Requires: python3-black +Requires: python3-flake8 +Requires: python3-sounddevice +Requires: python3-fire +Requires: python3-tensorflow +Requires: python3-tensorflow-text +Requires: python3-tensorflow-io +Requires: python3-tensorflow-gpu +Requires: python3-tensorflow-text +Requires: python3-tensorflow-io +Requires: python3-tensorflow +Requires: python3-tensorflow-text +Requires: python3-tensorflow-io +Requires: python3-tensorflow-gpu +Requires: python3-tensorflow-text +Requires: python3-tensorflow-io +Requires: python3-tensorflow +Requires: python3-tensorflow-text +Requires: python3-tensorflow-io +Requires: python3-tensorflow-gpu +Requires: python3-tensorflow-text +Requires: python3-tensorflow-io +Requires: python3-tensorflow +Requires: python3-tensorflow-text +Requires: python3-tensorflow-io +Requires: python3-tensorflow-gpu +Requires: python3-tensorflow-text +Requires: python3-tensorflow-io + +%description +<h1 align="center"> +<p>TensorFlowASR :zap:</p> +<p align="center"> +<a href="https://github.com/TensorSpeech/TensorFlowASR/blob/main/LICENSE"> + <img alt="GitHub" src="https://img.shields.io/github/license/TensorSpeech/TensorFlowASR?logo=apache&logoColor=green"> +</a> +<img alt="python" src="https://img.shields.io/badge/python-%3E%3D3.6-blue?logo=python"> +<img alt="tensorflow" src="https://img.shields.io/badge/tensorflow-%3E%3D2.5.1-orange?logo=tensorflow"> +<a href="https://pypi.org/project/TensorFlowASR/"> + <img alt="PyPI" src="https://img.shields.io/pypi/v/TensorFlowASR?color=%234285F4&label=release&logo=pypi&logoColor=%234285F4"> +</a> +</p> +</h1> +<h2 align="center"> +<p>Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2</p> +</h2> + +<p align="center"> +TensorFlowASR implements some automatic speech recognition architectures such as DeepSpeech2, Jasper, RNN Transducer, ContextNet, Conformer, etc. These models can be converted to TFLite to reduce memory and computation for deployment :smile: +</p> + +## What's New? 
+ +- (04/17/2021) Refactor repository with new version 1.x +- (02/16/2021) Supported for TPU training +- (12/27/2020) Supported _naive_ token level timestamp, see [demo](./examples/demonstration/conformer.py) with flag `--timestamp` +- (12/17/2020) Supported ContextNet [http://arxiv.org/abs/2005.03191](http://arxiv.org/abs/2005.03191) +- (12/12/2020) Add support for using masking +- (11/14/2020) Supported Gradient Accumulation for Training in Larger Batch Size + +## Table of Contents + +<!-- TOC --> + +- [What's New?](#whats-new) +- [Table of Contents](#table-of-contents) +- [:yum: Supported Models](#yum-supported-models) + - [Baselines](#baselines) + - [Publications](#publications) +- [Installation](#installation) + - [Installing from source (recommended)](#installing-from-source-recommended) + - [Installing via PyPi](#installing-via-pypi) + - [Running in a container](#running-in-a-container) +- [Setup training and testing](#setup-training-and-testing) +- [TFLite Convertion](#tflite-convertion) +- [Features Extraction](#features-extraction) +- [Augmentations](#augmentations) +- [Training & Testing Tutorial](#training--testing-tutorial) +- [Corpus Sources and Pretrained Models](#corpus-sources-and-pretrained-models) + - [English](#english) + - [Vietnamese](#vietnamese) + - [German](#german) +- [References & Credits](#references--credits) +- [Contact](#contact) + +<!-- /TOC --> + +## :yum: Supported Models + +### Baselines + +- **Transducer Models** (End2end models using RNNT Loss for training, currently supported Conformer, ContextNet, Streaming Transducer) +- **CTCModel** (End2end models using CTC Loss for training, currently supported DeepSpeech2, Jasper) + +### Publications + +- **Conformer Transducer** (Reference: [https://arxiv.org/abs/2005.08100](https://arxiv.org/abs/2005.08100)) + See [examples/conformer](./examples/conformer) +- **Streaming Transducer** (Reference: [https://arxiv.org/abs/1811.06621](https://arxiv.org/abs/1811.06621)) + See [examples/streaming_transducer](./examples/streaming_transducer) +- **ContextNet** (Reference: [http://arxiv.org/abs/2005.03191](http://arxiv.org/abs/2005.03191)) + See [examples/contextnet](./examples/contextnet) +- **Deep Speech 2** (Reference: [https://arxiv.org/abs/1512.02595](https://arxiv.org/abs/1512.02595)) + See [examples/deepspeech2](./examples/deepspeech2) +- **Jasper** (Reference: [https://arxiv.org/abs/1904.03288](https://arxiv.org/abs/1904.03288)) + See [examples/jasper](./examples/jasper) + +## Installation + +For training and testing, you should use `git clone` for installing necessary packages from other authors (`ctc_decoders`, `rnnt_loss`, etc.) 
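The `[tf2.x]` extras in the commands below are placeholders for a concrete TensorFlow minor version, so it helps to check which TensorFlow is installed first. A minimal sketch, assuming the extras follow the `tf2.<minor>`/`tf2.<minor>-gpu` naming that the placeholder suggests:

```bash
# Print the installed TensorFlow version (if any) to decide which extra to request.
python3 -c 'import tensorflow as tf; print(tf.__version__)'
# Example substitution for TensorFlow 2.5 (extra names assumed from the ".[tf2.x]" placeholder below):
pip3 install -e ".[tf2.5]"  # or ".[tf2.5-gpu]"
```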

### Installing from source (recommended)

```bash
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
# TensorFlow 2.x (with 2.x.x >= 2.5.1)
pip3 install -e ".[tf2.x]" # or ".[tf2.x-gpu]"
```

For anaconda3:

```bash
conda create -y -n tfasr tensorflow-gpu python=3.8 # use tensorflow instead of tensorflow-gpu for CPU-only; this lets conda install all TensorFlow dependencies
conda activate tfasr
pip install -U tensorflow-gpu # upgrade to the latest version of tensorflow
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
# TensorFlow 2.x (with 2.x.x >= 2.5.1)
pip3 install -e ".[tf2.x]" # or ".[tf2.x-gpu]"
```

### Installing via PyPI

```bash
# TensorFlow 2.x (with 2.x >= 2.3)
pip3 install -U "TensorFlowASR[tf2.x]" # or pip3 install -U "TensorFlowASR[tf2.x-gpu]"
```

### Running in a container

```bash
docker-compose up -d
```

## Setup training and testing

- For datasets, see [datasets](./tensorflow_asr/datasets/README.md)

- For _training, testing and using_ **CTC Models**, run `./scripts/install_ctc_decoders.sh`

- For _training_ **Transducer Models** with RNNT Loss in TF, make sure that [warp-transducer](https://github.com/HawkAaron/warp-transducer) **is not installed** (simply run `pip3 uninstall warprnnt-tensorflow`) (**Recommended**)

- For _training_ **Transducer Models** with RNNT Loss from [warp-transducer](https://github.com/HawkAaron/warp-transducer), run `export CUDA_HOME=/usr/local/cuda && ./scripts/install_rnnt_loss.sh` (**Note**: only `export CUDA_HOME` when you have CUDA)

- For _mixed precision training_, use the flag `--mxp` when running Python scripts from [examples](./examples)

- For _enabling XLA_, run `TF_XLA_FLAGS=--tf_xla_auto_jit=2 python3 $path_to_py_script`

- For _hiding warnings_, run `export TF_CPP_MIN_LOG_LEVEL=2` before running any examples

## TFLite Conversion

After conversion, the TFLite model behaves like a function that maps an **audio signal** directly to **Unicode code points**, which can then be converted to a string.

1. Install `tf-nightly` using `pip install tf-nightly`
2. Build a model with the same architecture as the trained model _(if the model has a `tflite` argument, set it to `True`)_, then load the weights from the trained model into the built model
3. Attach `TFSpeechFeaturizer` and `TextFeaturizer` to the model using the `add_featurizers` function
4. Convert the model's function to TFLite as follows:

```python
import tensorflow as tf

func = model.make_tflite_function(**options)  # options are the arguments of the function
concrete_func = func.get_concrete_function()
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.experimental_new_converter = True
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
tflite_model = converter.convert()
```

5. Save the converted TFLite model as follows:

```python
import os

if not os.path.exists(os.path.dirname(tflite_path)):
    os.makedirs(os.path.dirname(tflite_path))
with open(tflite_path, "wb") as tflite_out:
    tflite_out.write(tflite_model)
```

6. The `.tflite` model is now ready to be deployed

## Features Extraction

See [features_extraction](./tensorflow_asr/featurizers/README.md)

## Augmentations

See [augmentations](./tensorflow_asr/augmentations/README.md)

## Training & Testing Tutorial

1. 
Define a config YAML file; see the `config.yml` files in the [example folder](./examples) for reference (you can copy and modify values such as parameters and paths to match your local machine's configuration)
2. Download your corpus (a.k.a. datasets) and create a script to generate `transcripts.tsv` files from it (this is the general format used in this project because each dataset comes in a different format). For more detail, see [datasets](./tensorflow_asr/datasets/README.md). **Note:** Make sure your data contain only characters in your language; for example, English has `a` to `z` and `'`. **Do not use `cache` if your dataset does not fit in RAM**.
3. [Optional] Generate TFRecords to use `tf.data.TFRecordDataset` for better performance with the script [create_tfrecords.py](./scripts/create_tfrecords.py)
4. Create a vocabulary file (characters or subwords/wordpieces) by defining `language.characters`, using the scripts [generate_vocab_subwords.py](./scripts/generate_vocab_subwords.py) or [generate_vocab_sentencepiece.py](./scripts/generate_vocab_sentencepiece.py). There are predefined ones in [vocabularies](./vocabularies)
5. [Optional] Generate a metadata file for your dataset with the script [generate_metadata.py](./scripts/generate_metadata.py). This file contains the maximum lengths calculated from your `config.yml` and the total number of elements in each dataset, used for static-shape training and precalculated steps per epoch.
6. For training, see the `train.py` files in the [example folder](./examples) for the available options
7. For testing, see the `test.py` files in the [example folder](./examples) for the available options. **Note:** Testing is currently not supported on TPUs. It prints nothing other than the progress bar in the console, but it stores the predicted transcripts in the file `output.tsv` and then calculates the metrics from that file.

**FYI**: Keras built-in training uses an **infinite dataset**, which avoids the potential last partial batch.

See [examples](./examples/) for some predefined ASR models and results

## Corpus Sources and Pretrained Models

For pretrained models, go to [drive](https://drive.google.com/drive/folders/1BD0AK30n8hc-yR28C5FW3LqzZxtLOQfl?usp=sharing)

### English

| **Name**     | **Source**                                                          | **Hours** |
| :----------: | :-----------------------------------------------------------------: | :-------: |
| LibriSpeech  | [LibriSpeech](http://www.openslr.org/12)                            | 970h      |
| Common Voice | [https://commonvoice.mozilla.org](https://commonvoice.mozilla.org)  | 1932h     |

### Vietnamese

| **Name**                               | **Source**                                                                              | **Hours** |
| :------------------------------------: | :-------------------------------------------------------------------------------------: | :-------: |
| Vivos                                  | [https://ailab.hcmus.edu.vn/vivos](https://ailab.hcmus.edu.vn/vivos)                    | 15h       |
| InfoRe Technology 1                    | [InfoRe1 (passwd: BroughtToYouByInfoRe)](https://files.huylenguyen.com/25hours.zip)     | 25h       |
| InfoRe Technology 2 (used in VLSP2019) | [InfoRe2 (passwd: BroughtToYouByInfoRe)](https://files.huylenguyen.com/audiobooks.zip)  | 415h      |

### German

| **Name**     | **Source**                                                           | **Hours** |
| :----------: | :------------------------------------------------------------------: | :-------: |
| Common Voice | [https://commonvoice.mozilla.org/](https://commonvoice.mozilla.org)  | 750h      |

## References & Credits

1. [NVIDIA OpenSeq2Seq Toolkit](https://github.com/NVIDIA/OpenSeq2Seq)
2. 
[https://github.com/noahchalifour/warp-transducer](https://github.com/noahchalifour/warp-transducer) +3. [Sequence Transduction with Recurrent Neural Network](https://arxiv.org/abs/1211.3711) +4. [End-to-End Speech Processing Toolkit in PyTorch](https://github.com/espnet/espnet) +5. [https://github.com/iankur/ContextNet](https://github.com/iankur/ContextNet) + +## Contact + +Huy Le Nguyen + +Email: nlhuy.cs.16@gmail.com + + + + +%package -n python3-TensorFlowASR +Summary: Almost State-of-the-art Automatic Speech Recognition using Tensorflow 2 +Provides: python-TensorFlowASR +BuildRequires: python3-devel +BuildRequires: python3-setuptools +BuildRequires: python3-pip +%description -n python3-TensorFlowASR +<h1 align="center"> +<p>TensorFlowASR :zap:</p> +<p align="center"> +<a href="https://github.com/TensorSpeech/TensorFlowASR/blob/main/LICENSE"> + <img alt="GitHub" src="https://img.shields.io/github/license/TensorSpeech/TensorFlowASR?logo=apache&logoColor=green"> +</a> +<img alt="python" src="https://img.shields.io/badge/python-%3E%3D3.6-blue?logo=python"> +<img alt="tensorflow" src="https://img.shields.io/badge/tensorflow-%3E%3D2.5.1-orange?logo=tensorflow"> +<a href="https://pypi.org/project/TensorFlowASR/"> + <img alt="PyPI" src="https://img.shields.io/pypi/v/TensorFlowASR?color=%234285F4&label=release&logo=pypi&logoColor=%234285F4"> +</a> +</p> +</h1> +<h2 align="center"> +<p>Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2</p> +</h2> + +<p align="center"> +TensorFlowASR implements some automatic speech recognition architectures such as DeepSpeech2, Jasper, RNN Transducer, ContextNet, Conformer, etc. These models can be converted to TFLite to reduce memory and computation for deployment :smile: +</p> + +## What's New? + +- (04/17/2021) Refactor repository with new version 1.x +- (02/16/2021) Supported for TPU training +- (12/27/2020) Supported _naive_ token level timestamp, see [demo](./examples/demonstration/conformer.py) with flag `--timestamp` +- (12/17/2020) Supported ContextNet [http://arxiv.org/abs/2005.03191](http://arxiv.org/abs/2005.03191) +- (12/12/2020) Add support for using masking +- (11/14/2020) Supported Gradient Accumulation for Training in Larger Batch Size + +## Table of Contents + +<!-- TOC --> + +- [What's New?](#whats-new) +- [Table of Contents](#table-of-contents) +- [:yum: Supported Models](#yum-supported-models) + - [Baselines](#baselines) + - [Publications](#publications) +- [Installation](#installation) + - [Installing from source (recommended)](#installing-from-source-recommended) + - [Installing via PyPi](#installing-via-pypi) + - [Running in a container](#running-in-a-container) +- [Setup training and testing](#setup-training-and-testing) +- [TFLite Convertion](#tflite-convertion) +- [Features Extraction](#features-extraction) +- [Augmentations](#augmentations) +- [Training & Testing Tutorial](#training--testing-tutorial) +- [Corpus Sources and Pretrained Models](#corpus-sources-and-pretrained-models) + - [English](#english) + - [Vietnamese](#vietnamese) + - [German](#german) +- [References & Credits](#references--credits) +- [Contact](#contact) + +<!-- /TOC --> + +## :yum: Supported Models + +### Baselines + +- **Transducer Models** (End2end models using RNNT Loss for training, currently supported Conformer, ContextNet, Streaming Transducer) +- **CTCModel** (End2end models using CTC Loss for training, currently supported DeepSpeech2, Jasper) + +### Publications + +- **Conformer Transducer** (Reference: 
[https://arxiv.org/abs/2005.08100](https://arxiv.org/abs/2005.08100)) + See [examples/conformer](./examples/conformer) +- **Streaming Transducer** (Reference: [https://arxiv.org/abs/1811.06621](https://arxiv.org/abs/1811.06621)) + See [examples/streaming_transducer](./examples/streaming_transducer) +- **ContextNet** (Reference: [http://arxiv.org/abs/2005.03191](http://arxiv.org/abs/2005.03191)) + See [examples/contextnet](./examples/contextnet) +- **Deep Speech 2** (Reference: [https://arxiv.org/abs/1512.02595](https://arxiv.org/abs/1512.02595)) + See [examples/deepspeech2](./examples/deepspeech2) +- **Jasper** (Reference: [https://arxiv.org/abs/1904.03288](https://arxiv.org/abs/1904.03288)) + See [examples/jasper](./examples/jasper) + +## Installation + +For training and testing, you should use `git clone` for installing necessary packages from other authors (`ctc_decoders`, `rnnt_loss`, etc.) + +### Installing from source (recommended) + +```bash +git clone https://github.com/TensorSpeech/TensorFlowASR.git +cd TensorFlowASR +# Tensorflow 2.x (with 2.x.x >= 2.5.1) +pip3 install -e ".[tf2.x]" # or ".[tf2.x-gpu]" +``` + +For anaconda3: + +```bash +conda create -y -n tfasr tensorflow-gpu python=3.8 # tensorflow if using CPU, this makes sure conda install all dependencies for tensorflow +conda activate tfasr +pip install -U tensorflow-gpu # upgrade to latest version of tensorflow +git clone https://github.com/TensorSpeech/TensorFlowASR.git +cd TensorFlowASR +# Tensorflow 2.x (with 2.x.x >= 2.5.1) +pip3 install -e ".[tf2.x]" # or ".[tf2.x-gpu]" +``` + +### Installing via PyPi + +```bash +# Tensorflow 2.x (with 2.x >= 2.3) +pip3 install -U "TensorFlowASR[tf2.x]" # or pip3 install -U "TensorFlowASR[tf2.x-gpu]" +``` + + +### Running in a container + +```bash +docker-compose up -d +``` + +## Setup training and testing + +- For datasets, see [datasets](./tensorflow_asr/datasets/README.md) + +- For _training, testing and using_ **CTC Models**, run `./scripts/install_ctc_decoders.sh` + +- For _training_ **Transducer Models** with RNNT Loss in TF, make sure that [warp-transducer](https://github.com/HawkAaron/warp-transducer) **is not installed** (by simply run `pip3 uninstall warprnnt-tensorflow`) (**Recommended**) + +- For _training_ **Transducer Models** with RNNT Loss from [warp-transducer](https://github.com/HawkAaron/warp-transducer), run `export CUDA_HOME=/usr/local/cuda && ./scripts/install_rnnt_loss.sh` (**Note**: only `export CUDA_HOME` when you have CUDA) + +- For _mixed precision training_, use flag `--mxp` when running python scripts from [examples](./examples) + +- For _enabling XLA_, run `TF_XLA_FLAGS=--tf_xla_auto_jit=2 python3 $path_to_py_script`) + +- For _hiding warnings_, run `export TF_CPP_MIN_LOG_LEVEL=2` before running any examples + +## TFLite Convertion + +After converting to tflite, the tflite model is like a function that transforms directly from an **audio signal** to **unicode code points**, then we can convert unicode points to string. + +1. Install `tf-nightly` using `pip install tf-nightly` +2. Build a model with the same architecture as the trained model _(if model has tflite argument, you must set it to True)_, then load the weights from trained model to the built model +3. Load `TFSpeechFeaturizer` and `TextFeaturizer` to model using function `add_featurizers` +4. 
Convert model's function to tflite as follows: + +```python +func = model.make_tflite_function(**options) # options are the arguments of the function +concrete_func = func.get_concrete_function() +converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func]) +converter.experimental_new_converter = True +converter.optimizations = [tf.lite.Optimize.DEFAULT] +converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, + tf.lite.OpsSet.SELECT_TF_OPS] +tflite_model = converter.convert() +``` + +5. Save the converted tflite model as follows: + +```python +if not os.path.exists(os.path.dirname(tflite_path)): + os.makedirs(os.path.dirname(tflite_path)) +with open(tflite_path, "wb") as tflite_out: + tflite_out.write(tflite_model) +``` + +5. Then the `.tflite` model is ready to be deployed + +## Features Extraction + +See [features_extraction](./tensorflow_asr/featurizers/README.md) + +## Augmentations + +See [augmentations](./tensorflow_asr/augmentations/README.md) + +## Training & Testing Tutorial + +1. Define config YAML file, see the `config.yml` files in the [example folder](./examples) for reference (you can copy and modify values such as parameters, paths, etc.. to match your local machine configuration) +2. Download your corpus (a.k.a datasets) and create a script to generate `transcripts.tsv` files from your corpus (this is general format used in this project because each dataset has different format). For more detail, see [datasets](./tensorflow_asr/datasets/README.md). **Note:** Make sure your data contain only characters in your language, for example, english has `a` to `z` and `'`. **Do not use `cache` if your dataset size is not fit in the RAM**. +3. [Optional] Generate TFRecords to use `tf.data.TFRecordDataset` for better performance by using the script [create_tfrecords.py](./scripts/create_tfrecords.py) +4. Create vocabulary file (characters or subwords/wordpieces) by defining `language.characters`, using the scripts [generate_vocab_subwords.py](./scripts/generate_vocab_subwords.py) or [generate_vocab_sentencepiece.py](./scripts/generate_vocab_sentencepiece.py). There're predefined ones in [vocabularies](./vocabularies) +5. [Optional] Generate metadata file for your dataset by using script [generate_metadata.py](./scripts/generate_metadata.py). This metadata file contains maximum lengths calculated with your `config.yml` and total number of elements in each dataset, for static shape training and precalculated steps per epoch. +6. For training, see `train.py` files in the [example folder](./examples) to see the options +7. For testing, see `test.py` files in the [example folder](./examples) to see the options. **Note:** Testing is currently not supported for TPUs. It will print nothing other than the progress bar in the console, but it will store the predicted transcripts to the file `output.tsv` and then calculate the metrics from that file. + +**FYI**: Keras builtin training uses **infinite dataset**, which avoids the potential last partial batch. 
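Step 2 of the tutorial above asks for a corpus-specific script that writes `transcripts.tsv`. A minimal sketch (not the project's own tooling) is shown below; it assumes a tab-separated layout of audio path, duration in seconds, and lowercase transcript with a header row (check the datasets README for the exact columns your version expects), and it reads durations with `soundfile`, which is already a dependency.

```python
import glob
import os

import soundfile as sf  # already required by TensorFlowASR


def build_transcripts(corpus_dir: str, out_path: str) -> None:
    """Write one tab-separated line per utterance: path, duration, transcript."""
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("PATH\tDURATION\tTRANSCRIPT\n")  # header row (assumed format)
        for wav in sorted(glob.glob(os.path.join(corpus_dir, "**", "*.wav"), recursive=True)):
            # Assumes each WAV has a sibling .txt transcript; adapt to your corpus layout.
            txt = os.path.splitext(wav)[0] + ".txt"
            with open(txt, encoding="utf-8") as f:
                transcript = f.read().strip().lower()
            duration = sf.info(wav).duration  # seconds, read from the audio header
            out.write(f"{wav}\t{duration:.2f}\t{transcript}\n")


if __name__ == "__main__":
    build_transcripts("/path/to/corpus", "transcripts.tsv")
```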
+ +See [examples](./examples/) for some predefined ASR models and results + +## Corpus Sources and Pretrained Models + +For pretrained models, go to [drive](https://drive.google.com/drive/folders/1BD0AK30n8hc-yR28C5FW3LqzZxtLOQfl?usp=sharing) + +### English + +| **Name** | **Source** | **Hours** | +| :----------: | :----------------------------------------------------------------: | :-------: | +| LibriSpeech | [LibriSpeech](http://www.openslr.org/12) | 970h | +| Common Voice | [https://commonvoice.mozilla.org](https://commonvoice.mozilla.org) | 1932h | + +### Vietnamese + +| **Name** | **Source** | **Hours** | +| :------------------------------------: | :------------------------------------------------------------------------------------: | :-------: | +| Vivos | [https://ailab.hcmus.edu.vn/vivos](https://ailab.hcmus.edu.vn/vivos) | 15h | +| InfoRe Technology 1 | [InfoRe1 (passwd: BroughtToYouByInfoRe)](https://files.huylenguyen.com/25hours.zip) | 25h | +| InfoRe Technology 2 (used in VLSP2019) | [InfoRe2 (passwd: BroughtToYouByInfoRe)](https://files.huylenguyen.com/audiobooks.zip) | 415h | + +### German + +| **Name** | **Source** | **Hours** | +| :----------: | :-----------------------------------------------------------------: | :-------: | +| Common Voice | [https://commonvoice.mozilla.org/](https://commonvoice.mozilla.org) | 750h | + +## References & Credits + +1. [NVIDIA OpenSeq2Seq Toolkit](https://github.com/NVIDIA/OpenSeq2Seq) +2. [https://github.com/noahchalifour/warp-transducer](https://github.com/noahchalifour/warp-transducer) +3. [Sequence Transduction with Recurrent Neural Network](https://arxiv.org/abs/1211.3711) +4. [End-to-End Speech Processing Toolkit in PyTorch](https://github.com/espnet/espnet) +5. [https://github.com/iankur/ContextNet](https://github.com/iankur/ContextNet) + +## Contact + +Huy Le Nguyen + +Email: nlhuy.cs.16@gmail.com + + + + +%package help +Summary: Development documents and examples for TensorFlowASR +Provides: python3-TensorFlowASR-doc +%description help +<h1 align="center"> +<p>TensorFlowASR :zap:</p> +<p align="center"> +<a href="https://github.com/TensorSpeech/TensorFlowASR/blob/main/LICENSE"> + <img alt="GitHub" src="https://img.shields.io/github/license/TensorSpeech/TensorFlowASR?logo=apache&logoColor=green"> +</a> +<img alt="python" src="https://img.shields.io/badge/python-%3E%3D3.6-blue?logo=python"> +<img alt="tensorflow" src="https://img.shields.io/badge/tensorflow-%3E%3D2.5.1-orange?logo=tensorflow"> +<a href="https://pypi.org/project/TensorFlowASR/"> + <img alt="PyPI" src="https://img.shields.io/pypi/v/TensorFlowASR?color=%234285F4&label=release&logo=pypi&logoColor=%234285F4"> +</a> +</p> +</h1> +<h2 align="center"> +<p>Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2</p> +</h2> + +<p align="center"> +TensorFlowASR implements some automatic speech recognition architectures such as DeepSpeech2, Jasper, RNN Transducer, ContextNet, Conformer, etc. These models can be converted to TFLite to reduce memory and computation for deployment :smile: +</p> + +## What's New? 
+ +- (04/17/2021) Refactor repository with new version 1.x +- (02/16/2021) Supported for TPU training +- (12/27/2020) Supported _naive_ token level timestamp, see [demo](./examples/demonstration/conformer.py) with flag `--timestamp` +- (12/17/2020) Supported ContextNet [http://arxiv.org/abs/2005.03191](http://arxiv.org/abs/2005.03191) +- (12/12/2020) Add support for using masking +- (11/14/2020) Supported Gradient Accumulation for Training in Larger Batch Size + +## Table of Contents + +<!-- TOC --> + +- [What's New?](#whats-new) +- [Table of Contents](#table-of-contents) +- [:yum: Supported Models](#yum-supported-models) + - [Baselines](#baselines) + - [Publications](#publications) +- [Installation](#installation) + - [Installing from source (recommended)](#installing-from-source-recommended) + - [Installing via PyPi](#installing-via-pypi) + - [Running in a container](#running-in-a-container) +- [Setup training and testing](#setup-training-and-testing) +- [TFLite Convertion](#tflite-convertion) +- [Features Extraction](#features-extraction) +- [Augmentations](#augmentations) +- [Training & Testing Tutorial](#training--testing-tutorial) +- [Corpus Sources and Pretrained Models](#corpus-sources-and-pretrained-models) + - [English](#english) + - [Vietnamese](#vietnamese) + - [German](#german) +- [References & Credits](#references--credits) +- [Contact](#contact) + +<!-- /TOC --> + +## :yum: Supported Models + +### Baselines + +- **Transducer Models** (End2end models using RNNT Loss for training, currently supported Conformer, ContextNet, Streaming Transducer) +- **CTCModel** (End2end models using CTC Loss for training, currently supported DeepSpeech2, Jasper) + +### Publications + +- **Conformer Transducer** (Reference: [https://arxiv.org/abs/2005.08100](https://arxiv.org/abs/2005.08100)) + See [examples/conformer](./examples/conformer) +- **Streaming Transducer** (Reference: [https://arxiv.org/abs/1811.06621](https://arxiv.org/abs/1811.06621)) + See [examples/streaming_transducer](./examples/streaming_transducer) +- **ContextNet** (Reference: [http://arxiv.org/abs/2005.03191](http://arxiv.org/abs/2005.03191)) + See [examples/contextnet](./examples/contextnet) +- **Deep Speech 2** (Reference: [https://arxiv.org/abs/1512.02595](https://arxiv.org/abs/1512.02595)) + See [examples/deepspeech2](./examples/deepspeech2) +- **Jasper** (Reference: [https://arxiv.org/abs/1904.03288](https://arxiv.org/abs/1904.03288)) + See [examples/jasper](./examples/jasper) + +## Installation + +For training and testing, you should use `git clone` for installing necessary packages from other authors (`ctc_decoders`, `rnnt_loss`, etc.) 
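Whichever of the install routes below is used (source, PyPI, or the container), a quick smoke test is to import the package. A one-liner sketch, assuming the import name matches the repository's `tensorflow_asr` package directory (Python >= 3.8 for `importlib.metadata`):

```bash
python3 -c "import tensorflow_asr, importlib.metadata; print(importlib.metadata.version('TensorFlowASR'))"
```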
+ +### Installing from source (recommended) + +```bash +git clone https://github.com/TensorSpeech/TensorFlowASR.git +cd TensorFlowASR +# Tensorflow 2.x (with 2.x.x >= 2.5.1) +pip3 install -e ".[tf2.x]" # or ".[tf2.x-gpu]" +``` + +For anaconda3: + +```bash +conda create -y -n tfasr tensorflow-gpu python=3.8 # tensorflow if using CPU, this makes sure conda install all dependencies for tensorflow +conda activate tfasr +pip install -U tensorflow-gpu # upgrade to latest version of tensorflow +git clone https://github.com/TensorSpeech/TensorFlowASR.git +cd TensorFlowASR +# Tensorflow 2.x (with 2.x.x >= 2.5.1) +pip3 install -e ".[tf2.x]" # or ".[tf2.x-gpu]" +``` + +### Installing via PyPi + +```bash +# Tensorflow 2.x (with 2.x >= 2.3) +pip3 install -U "TensorFlowASR[tf2.x]" # or pip3 install -U "TensorFlowASR[tf2.x-gpu]" +``` + + +### Running in a container + +```bash +docker-compose up -d +``` + +## Setup training and testing + +- For datasets, see [datasets](./tensorflow_asr/datasets/README.md) + +- For _training, testing and using_ **CTC Models**, run `./scripts/install_ctc_decoders.sh` + +- For _training_ **Transducer Models** with RNNT Loss in TF, make sure that [warp-transducer](https://github.com/HawkAaron/warp-transducer) **is not installed** (by simply run `pip3 uninstall warprnnt-tensorflow`) (**Recommended**) + +- For _training_ **Transducer Models** with RNNT Loss from [warp-transducer](https://github.com/HawkAaron/warp-transducer), run `export CUDA_HOME=/usr/local/cuda && ./scripts/install_rnnt_loss.sh` (**Note**: only `export CUDA_HOME` when you have CUDA) + +- For _mixed precision training_, use flag `--mxp` when running python scripts from [examples](./examples) + +- For _enabling XLA_, run `TF_XLA_FLAGS=--tf_xla_auto_jit=2 python3 $path_to_py_script`) + +- For _hiding warnings_, run `export TF_CPP_MIN_LOG_LEVEL=2` before running any examples + +## TFLite Convertion + +After converting to tflite, the tflite model is like a function that transforms directly from an **audio signal** to **unicode code points**, then we can convert unicode points to string. + +1. Install `tf-nightly` using `pip install tf-nightly` +2. Build a model with the same architecture as the trained model _(if model has tflite argument, you must set it to True)_, then load the weights from trained model to the built model +3. Load `TFSpeechFeaturizer` and `TextFeaturizer` to model using function `add_featurizers` +4. Convert model's function to tflite as follows: + +```python +func = model.make_tflite_function(**options) # options are the arguments of the function +concrete_func = func.get_concrete_function() +converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func]) +converter.experimental_new_converter = True +converter.optimizations = [tf.lite.Optimize.DEFAULT] +converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, + tf.lite.OpsSet.SELECT_TF_OPS] +tflite_model = converter.convert() +``` + +5. Save the converted tflite model as follows: + +```python +if not os.path.exists(os.path.dirname(tflite_path)): + os.makedirs(os.path.dirname(tflite_path)) +with open(tflite_path, "wb") as tflite_out: + tflite_out.write(tflite_model) +``` + +5. Then the `.tflite` model is ready to be deployed + +## Features Extraction + +See [features_extraction](./tensorflow_asr/featurizers/README.md) + +## Augmentations + +See [augmentations](./tensorflow_asr/augmentations/README.md) + +## Training & Testing Tutorial + +1. 
Define config YAML file, see the `config.yml` files in the [example folder](./examples) for reference (you can copy and modify values such as parameters, paths, etc.. to match your local machine configuration) +2. Download your corpus (a.k.a datasets) and create a script to generate `transcripts.tsv` files from your corpus (this is general format used in this project because each dataset has different format). For more detail, see [datasets](./tensorflow_asr/datasets/README.md). **Note:** Make sure your data contain only characters in your language, for example, english has `a` to `z` and `'`. **Do not use `cache` if your dataset size is not fit in the RAM**. +3. [Optional] Generate TFRecords to use `tf.data.TFRecordDataset` for better performance by using the script [create_tfrecords.py](./scripts/create_tfrecords.py) +4. Create vocabulary file (characters or subwords/wordpieces) by defining `language.characters`, using the scripts [generate_vocab_subwords.py](./scripts/generate_vocab_subwords.py) or [generate_vocab_sentencepiece.py](./scripts/generate_vocab_sentencepiece.py). There're predefined ones in [vocabularies](./vocabularies) +5. [Optional] Generate metadata file for your dataset by using script [generate_metadata.py](./scripts/generate_metadata.py). This metadata file contains maximum lengths calculated with your `config.yml` and total number of elements in each dataset, for static shape training and precalculated steps per epoch. +6. For training, see `train.py` files in the [example folder](./examples) to see the options +7. For testing, see `test.py` files in the [example folder](./examples) to see the options. **Note:** Testing is currently not supported for TPUs. It will print nothing other than the progress bar in the console, but it will store the predicted transcripts to the file `output.tsv` and then calculate the metrics from that file. + +**FYI**: Keras builtin training uses **infinite dataset**, which avoids the potential last partial batch. + +See [examples](./examples/) for some predefined ASR models and results + +## Corpus Sources and Pretrained Models + +For pretrained models, go to [drive](https://drive.google.com/drive/folders/1BD0AK30n8hc-yR28C5FW3LqzZxtLOQfl?usp=sharing) + +### English + +| **Name** | **Source** | **Hours** | +| :----------: | :----------------------------------------------------------------: | :-------: | +| LibriSpeech | [LibriSpeech](http://www.openslr.org/12) | 970h | +| Common Voice | [https://commonvoice.mozilla.org](https://commonvoice.mozilla.org) | 1932h | + +### Vietnamese + +| **Name** | **Source** | **Hours** | +| :------------------------------------: | :------------------------------------------------------------------------------------: | :-------: | +| Vivos | [https://ailab.hcmus.edu.vn/vivos](https://ailab.hcmus.edu.vn/vivos) | 15h | +| InfoRe Technology 1 | [InfoRe1 (passwd: BroughtToYouByInfoRe)](https://files.huylenguyen.com/25hours.zip) | 25h | +| InfoRe Technology 2 (used in VLSP2019) | [InfoRe2 (passwd: BroughtToYouByInfoRe)](https://files.huylenguyen.com/audiobooks.zip) | 415h | + +### German + +| **Name** | **Source** | **Hours** | +| :----------: | :-----------------------------------------------------------------: | :-------: | +| Common Voice | [https://commonvoice.mozilla.org/](https://commonvoice.mozilla.org) | 750h | + +## References & Credits + +1. [NVIDIA OpenSeq2Seq Toolkit](https://github.com/NVIDIA/OpenSeq2Seq) +2. 
[https://github.com/noahchalifour/warp-transducer](https://github.com/noahchalifour/warp-transducer) +3. [Sequence Transduction with Recurrent Neural Network](https://arxiv.org/abs/1211.3711) +4. [End-to-End Speech Processing Toolkit in PyTorch](https://github.com/espnet/espnet) +5. [https://github.com/iankur/ContextNet](https://github.com/iankur/ContextNet) + +## Contact + +Huy Le Nguyen + +Email: nlhuy.cs.16@gmail.com + + + + +%prep +%autosetup -n TensorFlowASR-1.0.3 + +%build +%py3_build + +%install +%py3_install +install -d -m755 %{buildroot}/%{_pkgdocdir} +if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi +if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi +if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi +if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi +pushd %{buildroot} +if [ -d usr/lib ]; then + find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/lib64 ]; then + find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/bin ]; then + find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst +fi +if [ -d usr/sbin ]; then + find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst +fi +touch doclist.lst +if [ -d usr/share/man ]; then + find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst +fi +popd +mv %{buildroot}/filelist.lst . +mv %{buildroot}/doclist.lst . + +%files -n python3-TensorFlowASR -f filelist.lst +%dir %{python3_sitelib}/* + +%files help -f doclist.lst +%{_docdir}/* + +%changelog +* Fri May 05 2023 Python_Bot <Python_Bot@openeuler.org> - 1.0.3-1 +- Package Spec generated @@ -0,0 +1 @@ +c74525628398e7fde450708fef7ada6d TensorFlowASR-1.0.3.tar.gz |
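For reference, a minimal sketch of rebuilding this package locally from the spec above, assuming `rpmdevtools` and `rpm-build` are installed and the BuildRequires (python3-devel, python3-setuptools, python3-pip) are satisfied; the Source0 URL and MD5 checksum are the ones recorded in the spec and the sources file.

```bash
rpmdev-setuptree                                   # creates ~/rpmbuild/{SPECS,SOURCES,...}
cp python-tensorflowasr.spec ~/rpmbuild/SPECS/
curl -L -o ~/rpmbuild/SOURCES/TensorFlowASR-1.0.3.tar.gz \
  "https://mirrors.nju.edu.cn/pypi/web/packages/89/e3/3bd44a5ebd93eb4c604ecfbbf603b7a5efd5d20ea595f216c4005045f2d5/TensorFlowASR-1.0.3.tar.gz"
# Verify the tarball against the checksum recorded in the sources file.
(cd ~/rpmbuild/SOURCES && echo "c74525628398e7fde450708fef7ada6d  TensorFlowASR-1.0.3.tar.gz" | md5sum -c -)
rpmbuild -ba ~/rpmbuild/SPECS/python-tensorflowasr.spec
```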
