%global _empty_manifest_terminate_build 0
Name: python-TensorFlowASR
Version: 1.0.3
Release: 1
Summary: Almost State-of-the-art Automatic Speech Recognition using Tensorflow 2
License: Apache Software License
URL: https://github.com/TensorSpeech/TensorFlowASR
Source0: https://mirrors.nju.edu.cn/pypi/web/packages/89/e3/3bd44a5ebd93eb4c604ecfbbf603b7a5efd5d20ea595f216c4005045f2d5/TensorFlowASR-1.0.3.tar.gz
BuildArch: noarch

Requires: python3-SoundFile
Requires: python3-tensorflow-datasets
Requires: python3-nltk
Requires: python3-numpy
Requires: python3-sentencepiece
Requires: python3-tqdm
Requires: python3-librosa
Requires: python3-PyYAML
Requires: python3-Pillow
Requires: python3-black
Requires: python3-flake8
Requires: python3-sounddevice
Requires: python3-fire
Requires: python3-tensorflow
Requires: python3-tensorflow-text
Requires: python3-tensorflow-io
Requires: python3-tensorflow-gpu

%description

TensorFlowASR :zap:

Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2

TensorFlowASR implements some automatic speech recognition architectures such as DeepSpeech2, Jasper, RNN Transducer, ContextNet, Conformer, etc. These models can be converted to TFLite to reduce memory and computation for deployment :smile:

## What's New?

- (04/17/2021) Refactored the repository for new version 1.x
- (02/16/2021) Supported TPU training
- (12/27/2020) Supported _naive_ token-level timestamps, see the [demo](./examples/demonstration/conformer.py) with the flag `--timestamp`
- (12/17/2020) Supported ContextNet [http://arxiv.org/abs/2005.03191](http://arxiv.org/abs/2005.03191)
- (12/12/2020) Added support for using masking
- (11/14/2020) Supported gradient accumulation for training with larger batch sizes

## Table of Contents

- [What's New?](#whats-new)
- [Table of Contents](#table-of-contents)
- [:yum: Supported Models](#yum-supported-models)
  - [Baselines](#baselines)
  - [Publications](#publications)
- [Installation](#installation)
  - [Installing from source (recommended)](#installing-from-source-recommended)
  - [Installing via PyPI](#installing-via-pypi)
  - [Running in a container](#running-in-a-container)
- [Setup training and testing](#setup-training-and-testing)
- [TFLite Conversion](#tflite-conversion)
- [Features Extraction](#features-extraction)
- [Augmentations](#augmentations)
- [Training & Testing Tutorial](#training--testing-tutorial)
- [Corpus Sources and Pretrained Models](#corpus-sources-and-pretrained-models)
  - [English](#english)
  - [Vietnamese](#vietnamese)
  - [German](#german)
- [References & Credits](#references--credits)
- [Contact](#contact)

## :yum: Supported Models

### Baselines

- **Transducer Models** (end-to-end models trained with the RNNT loss; Conformer, ContextNet and Streaming Transducer are currently supported)
- **CTCModel** (end-to-end models trained with the CTC loss; DeepSpeech2 and Jasper are currently supported)

### Publications

- **Conformer Transducer** (Reference: [https://arxiv.org/abs/2005.08100](https://arxiv.org/abs/2005.08100)) See [examples/conformer](./examples/conformer)
- **Streaming Transducer** (Reference: [https://arxiv.org/abs/1811.06621](https://arxiv.org/abs/1811.06621)) See [examples/streaming_transducer](./examples/streaming_transducer)
- **ContextNet** (Reference: [http://arxiv.org/abs/2005.03191](http://arxiv.org/abs/2005.03191)) See [examples/contextnet](./examples/contextnet)
- **Deep Speech 2** (Reference: [https://arxiv.org/abs/1512.02595](https://arxiv.org/abs/1512.02595)) See [examples/deepspeech2](./examples/deepspeech2)
- **Jasper** (Reference: [https://arxiv.org/abs/1904.03288](https://arxiv.org/abs/1904.03288)) See [examples/jasper](./examples/jasper)

## Installation

For training and testing, install from a `git clone` so that the required packages from other authors (`ctc_decoders`, `rnnt_loss`, etc.) can also be installed.
### Installing from source (recommended)

```bash
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
# TensorFlow 2.x (with 2.x.x >= 2.5.1)
pip3 install -e ".[tf2.x]" # or ".[tf2.x-gpu]"
```

For anaconda3:

```bash
conda create -y -n tfasr tensorflow-gpu python=3.8 # use tensorflow instead when on CPU; this makes sure conda installs all of TensorFlow's dependencies
conda activate tfasr
pip install -U tensorflow-gpu # upgrade to the latest version of TensorFlow
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
# TensorFlow 2.x (with 2.x.x >= 2.5.1)
pip3 install -e ".[tf2.x]" # or ".[tf2.x-gpu]"
```

### Installing via PyPI

```bash
# TensorFlow 2.x (with 2.x >= 2.3)
pip3 install -U "TensorFlowASR[tf2.x]" # or pip3 install -U "TensorFlowASR[tf2.x-gpu]"
```

### Running in a container

```bash
docker-compose up -d
```

## Setup training and testing

- For datasets, see [datasets](./tensorflow_asr/datasets/README.md)
- For _training, testing and using_ **CTC Models**, run `./scripts/install_ctc_decoders.sh`
- For _training_ **Transducer Models** with the RNNT loss implemented in TF, make sure that [warp-transducer](https://github.com/HawkAaron/warp-transducer) **is not installed** (simply run `pip3 uninstall warprnnt-tensorflow`) (**Recommended**)
- For _training_ **Transducer Models** with the RNNT loss from [warp-transducer](https://github.com/HawkAaron/warp-transducer), run `export CUDA_HOME=/usr/local/cuda && ./scripts/install_rnnt_loss.sh` (**Note**: only `export CUDA_HOME` when you have CUDA)
- For _mixed precision training_, use the flag `--mxp` when running the python scripts from [examples](./examples)
- For _enabling XLA_, run `TF_XLA_FLAGS=--tf_xla_auto_jit=2 python3 $path_to_py_script`
- For _hiding warnings_, run `export TF_CPP_MIN_LOG_LEVEL=2` before running any examples

## TFLite Conversion

After conversion, the TFLite model behaves like a function that maps an **audio signal** directly to **unicode code points**, which can then be converted to a string.

1. Install `tf-nightly` using `pip install tf-nightly`
2. Build a model with the same architecture as the trained model _(if the model has a tflite argument, you must set it to True)_, then load the weights from the trained model into the built model
3. Attach `TFSpeechFeaturizer` and `TextFeaturizer` to the model using the function `add_featurizers`
4. Convert the model's function to TFLite as follows:

```python
import tensorflow as tf

func = model.make_tflite_function(**options) # options are the arguments of the function
concrete_func = func.get_concrete_function()
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.experimental_new_converter = True
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
tflite_model = converter.convert()
```

5. Save the converted TFLite model as follows:

```python
import os

if not os.path.exists(os.path.dirname(tflite_path)):
    os.makedirs(os.path.dirname(tflite_path))
with open(tflite_path, "wb") as tflite_out:
    tflite_out.write(tflite_model)
```

6. The `.tflite` model is then ready to be deployed
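Once the `.tflite` file is saved, it can be sanity-checked with the standard TFLite interpreter. The snippet below is only a minimal sketch: the exact input/output signature (and any extra decoder state tensors) depends on the options passed to `make_tflite_function`, so the single-input/single-output layout, the file name and the 16 kHz placeholder signal are assumptions to adapt to your own converted model.

```python
import numpy as np
import tensorflow as tf

# Minimal sketch: run a converted model on a raw waveform.
# Assumption: the first input is the float32 audio signal and the first
# output holds unicode code points; verify against your converted model.
interpreter = tf.lite.Interpreter(model_path="conformer.tflite")  # hypothetical path
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

signal = np.zeros([16000], dtype=np.float32)  # placeholder: 1 second of silence at 16 kHz

# Resize the signal input to the waveform length, then run inference.
interpreter.resize_tensor_input(input_details[0]["index"], signal.shape)
interpreter.allocate_tensors()
interpreter.set_tensor(input_details[0]["index"], signal)
interpreter.invoke()

code_points = interpreter.get_tensor(output_details[0]["index"])
print("".join(chr(int(c)) for c in code_points.flatten() if int(c) > 0))
```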
## Features Extraction

See [features_extraction](./tensorflow_asr/featurizers/README.md)

## Augmentations

See [augmentations](./tensorflow_asr/augmentations/README.md)

## Training & Testing Tutorial

1. Define a config YAML file; see the `config.yml` files in the [example folder](./examples) for reference (you can copy one and modify values such as parameters and paths to match your local machine configuration)
2. Download your corpus (a.k.a. datasets) and create a script that generates `transcripts.tsv` files from it (this is the general format used in this project, because every dataset has a different layout). For more detail, see [datasets](./tensorflow_asr/datasets/README.md); a minimal generator sketch also follows after this tutorial. **Note:** Make sure your data contains only characters of your language; for example, English uses `a` to `z` and `'`. **Do not use `cache` if your dataset does not fit in RAM.**
3. [Optional] Generate TFRecords to use `tf.data.TFRecordDataset` for better performance with the script [create_tfrecords.py](./scripts/create_tfrecords.py)
4. Create a vocabulary file (characters or subwords/wordpieces) by defining `language.characters`, or by using the scripts [generate_vocab_subwords.py](./scripts/generate_vocab_subwords.py) or [generate_vocab_sentencepiece.py](./scripts/generate_vocab_sentencepiece.py). Predefined ones are available in [vocabularies](./vocabularies)
5. [Optional] Generate a metadata file for your dataset with the script [generate_metadata.py](./scripts/generate_metadata.py). This metadata file contains the maximum lengths calculated with your `config.yml` and the total number of elements in each dataset, for static-shape training and precalculated steps per epoch.
6. For training, see the `train.py` files in the [example folder](./examples) for the available options
7. For testing, see the `test.py` files in the [example folder](./examples) for the available options. **Note:** Testing is currently not supported on TPUs. It prints nothing other than the progress bar in the console, but it stores the predicted transcripts in the file `output.tsv` and then calculates the metrics from that file.

**FYI**: Keras built-in training uses an **infinite dataset**, which avoids the potential last partial batch.

See [examples](./examples/) for some predefined ASR models and results
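As a concrete illustration of step 2 above, the script below walks a folder of `.wav` files with matching `.txt` transcripts and writes a single `transcripts.tsv`. The tab-separated `PATH`, `DURATION`, `TRANSCRIPT` layout (and the header line) is an assumption based on common practice; the authoritative format is described in [datasets](./tensorflow_asr/datasets/README.md), so adjust the columns to whatever that document specifies.

```python
import os
import soundfile as sf  # already a dependency of this package

def generate_transcripts(corpus_dir: str, output_path: str) -> None:
    """Write a tab-separated transcript index for a folder of wav/txt pairs."""
    with open(output_path, "w", encoding="utf-8") as out:
        out.write("PATH\tDURATION\tTRANSCRIPT\n")  # header line (assumed layout)
        for root, _, files in os.walk(corpus_dir):
            for name in sorted(files):
                if not name.endswith(".wav"):
                    continue
                wav_path = os.path.join(root, name)
                txt_path = wav_path[:-4] + ".txt"
                if not os.path.exists(txt_path):
                    continue  # skip audio without a transcript
                duration = sf.info(wav_path).duration  # seconds, read from the audio header
                with open(txt_path, "r", encoding="utf-8") as f:
                    transcript = f.read().strip().lower()
                out.write(f"{wav_path}\t{duration:.2f}\t{transcript}\n")

generate_transcripts("/path/to/corpus", "transcripts.tsv")
```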
## Corpus Sources and Pretrained Models

For pretrained models, go to [drive](https://drive.google.com/drive/folders/1BD0AK30n8hc-yR28C5FW3LqzZxtLOQfl?usp=sharing)

### English

| **Name** | **Source** | **Hours** |
| :---: | :---: | :---: |
| LibriSpeech | [LibriSpeech](http://www.openslr.org/12) | 970h |
| Common Voice | [https://commonvoice.mozilla.org](https://commonvoice.mozilla.org) | 1932h |

### Vietnamese

| **Name** | **Source** | **Hours** |
| :---: | :---: | :---: |
| Vivos | [https://ailab.hcmus.edu.vn/vivos](https://ailab.hcmus.edu.vn/vivos) | 15h |
| InfoRe Technology 1 | [InfoRe1 (passwd: BroughtToYouByInfoRe)](https://files.huylenguyen.com/25hours.zip) | 25h |
| InfoRe Technology 2 (used in VLSP2019) | [InfoRe2 (passwd: BroughtToYouByInfoRe)](https://files.huylenguyen.com/audiobooks.zip) | 415h |

### German

| **Name** | **Source** | **Hours** |
| :---: | :---: | :---: |
| Common Voice | [https://commonvoice.mozilla.org/](https://commonvoice.mozilla.org) | 750h |

## References & Credits

1. [NVIDIA OpenSeq2Seq Toolkit](https://github.com/NVIDIA/OpenSeq2Seq)
2. [https://github.com/noahchalifour/warp-transducer](https://github.com/noahchalifour/warp-transducer)
3. [Sequence Transduction with Recurrent Neural Network](https://arxiv.org/abs/1211.3711)
4. [End-to-End Speech Processing Toolkit in PyTorch](https://github.com/espnet/espnet)
5. [https://github.com/iankur/ContextNet](https://github.com/iankur/ContextNet)

## Contact

Huy Le Nguyen

Email: nlhuy.cs.16@gmail.com

%package -n python3-TensorFlowASR
Summary: Almost State-of-the-art Automatic Speech Recognition using Tensorflow 2
Provides: python-TensorFlowASR
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: python3-pip
%description -n python3-TensorFlowASR

TensorFlowASR :zap:

Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2

TensorFlowASR implements some automatic speech recognition architectures such as DeepSpeech2, Jasper, RNN Transducer, ContextNet, Conformer, etc. These models can be converted to TFLite to reduce memory and computation for deployment :smile:

## What's New?

- (04/17/2021) Refactored the repository for new version 1.x
- (02/16/2021) Supported TPU training
- (12/27/2020) Supported _naive_ token-level timestamps, see the [demo](./examples/demonstration/conformer.py) with the flag `--timestamp`
- (12/17/2020) Supported ContextNet [http://arxiv.org/abs/2005.03191](http://arxiv.org/abs/2005.03191)
- (12/12/2020) Added support for using masking
- (11/14/2020) Supported gradient accumulation for training with larger batch sizes

## Table of Contents

- [What's New?](#whats-new)
- [Table of Contents](#table-of-contents)
- [:yum: Supported Models](#yum-supported-models)
  - [Baselines](#baselines)
  - [Publications](#publications)
- [Installation](#installation)
  - [Installing from source (recommended)](#installing-from-source-recommended)
  - [Installing via PyPI](#installing-via-pypi)
  - [Running in a container](#running-in-a-container)
- [Setup training and testing](#setup-training-and-testing)
- [TFLite Conversion](#tflite-conversion)
- [Features Extraction](#features-extraction)
- [Augmentations](#augmentations)
- [Training & Testing Tutorial](#training--testing-tutorial)
- [Corpus Sources and Pretrained Models](#corpus-sources-and-pretrained-models)
  - [English](#english)
  - [Vietnamese](#vietnamese)
  - [German](#german)
- [References & Credits](#references--credits)
- [Contact](#contact)

## :yum: Supported Models

### Baselines

- **Transducer Models** (end-to-end models trained with the RNNT loss; Conformer, ContextNet and Streaming Transducer are currently supported)
- **CTCModel** (end-to-end models trained with the CTC loss; DeepSpeech2 and Jasper are currently supported)

### Publications

- **Conformer Transducer** (Reference: [https://arxiv.org/abs/2005.08100](https://arxiv.org/abs/2005.08100)) See [examples/conformer](./examples/conformer)
- **Streaming Transducer** (Reference: [https://arxiv.org/abs/1811.06621](https://arxiv.org/abs/1811.06621)) See [examples/streaming_transducer](./examples/streaming_transducer)
- **ContextNet** (Reference: [http://arxiv.org/abs/2005.03191](http://arxiv.org/abs/2005.03191)) See [examples/contextnet](./examples/contextnet)
- **Deep Speech 2** (Reference: [https://arxiv.org/abs/1512.02595](https://arxiv.org/abs/1512.02595)) See [examples/deepspeech2](./examples/deepspeech2)
- **Jasper** (Reference: [https://arxiv.org/abs/1904.03288](https://arxiv.org/abs/1904.03288)) See [examples/jasper](./examples/jasper)

## Installation

For training and testing, install from a `git clone` so that the required packages from other authors (`ctc_decoders`, `rnnt_loss`, etc.) can also be installed.
### Installing from source (recommended)

```bash
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
# TensorFlow 2.x (with 2.x.x >= 2.5.1)
pip3 install -e ".[tf2.x]" # or ".[tf2.x-gpu]"
```

For anaconda3:

```bash
conda create -y -n tfasr tensorflow-gpu python=3.8 # use tensorflow instead when on CPU; this makes sure conda installs all of TensorFlow's dependencies
conda activate tfasr
pip install -U tensorflow-gpu # upgrade to the latest version of TensorFlow
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
# TensorFlow 2.x (with 2.x.x >= 2.5.1)
pip3 install -e ".[tf2.x]" # or ".[tf2.x-gpu]"
```

### Installing via PyPI

```bash
# TensorFlow 2.x (with 2.x >= 2.3)
pip3 install -U "TensorFlowASR[tf2.x]" # or pip3 install -U "TensorFlowASR[tf2.x-gpu]"
```

### Running in a container

```bash
docker-compose up -d
```

## Setup training and testing

- For datasets, see [datasets](./tensorflow_asr/datasets/README.md)
- For _training, testing and using_ **CTC Models**, run `./scripts/install_ctc_decoders.sh`
- For _training_ **Transducer Models** with the RNNT loss implemented in TF, make sure that [warp-transducer](https://github.com/HawkAaron/warp-transducer) **is not installed** (simply run `pip3 uninstall warprnnt-tensorflow`) (**Recommended**)
- For _training_ **Transducer Models** with the RNNT loss from [warp-transducer](https://github.com/HawkAaron/warp-transducer), run `export CUDA_HOME=/usr/local/cuda && ./scripts/install_rnnt_loss.sh` (**Note**: only `export CUDA_HOME` when you have CUDA)
- For _mixed precision training_, use the flag `--mxp` when running the python scripts from [examples](./examples)
- For _enabling XLA_, run `TF_XLA_FLAGS=--tf_xla_auto_jit=2 python3 $path_to_py_script`
- For _hiding warnings_, run `export TF_CPP_MIN_LOG_LEVEL=2` before running any examples

## TFLite Conversion

After conversion, the TFLite model behaves like a function that maps an **audio signal** directly to **unicode code points**, which can then be converted to a string.

1. Install `tf-nightly` using `pip install tf-nightly`
2. Build a model with the same architecture as the trained model _(if the model has a tflite argument, you must set it to True)_, then load the weights from the trained model into the built model
3. Attach `TFSpeechFeaturizer` and `TextFeaturizer` to the model using the function `add_featurizers`
4. Convert the model's function to TFLite as follows:

```python
import tensorflow as tf

func = model.make_tflite_function(**options) # options are the arguments of the function
concrete_func = func.get_concrete_function()
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.experimental_new_converter = True
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
tflite_model = converter.convert()
```

5. Save the converted TFLite model as follows:

```python
import os

if not os.path.exists(os.path.dirname(tflite_path)):
    os.makedirs(os.path.dirname(tflite_path))
with open(tflite_path, "wb") as tflite_out:
    tflite_out.write(tflite_model)
```

6. The `.tflite` model is then ready to be deployed
## Features Extraction

See [features_extraction](./tensorflow_asr/featurizers/README.md)

## Augmentations

See [augmentations](./tensorflow_asr/augmentations/README.md)

## Training & Testing Tutorial

1. Define a config YAML file; see the `config.yml` files in the [example folder](./examples) for reference (you can copy one and modify values such as parameters and paths to match your local machine configuration)
2. Download your corpus (a.k.a. datasets) and create a script that generates `transcripts.tsv` files from it (this is the general format used in this project, because every dataset has a different layout). For more detail, see [datasets](./tensorflow_asr/datasets/README.md). **Note:** Make sure your data contains only characters of your language; for example, English uses `a` to `z` and `'`. **Do not use `cache` if your dataset does not fit in RAM.**
3. [Optional] Generate TFRecords to use `tf.data.TFRecordDataset` for better performance with the script [create_tfrecords.py](./scripts/create_tfrecords.py)
4. Create a vocabulary file (characters or subwords/wordpieces) by defining `language.characters`, or by using the scripts [generate_vocab_subwords.py](./scripts/generate_vocab_subwords.py) or [generate_vocab_sentencepiece.py](./scripts/generate_vocab_sentencepiece.py). Predefined ones are available in [vocabularies](./vocabularies)
5. [Optional] Generate a metadata file for your dataset with the script [generate_metadata.py](./scripts/generate_metadata.py). This metadata file contains the maximum lengths calculated with your `config.yml` and the total number of elements in each dataset, for static-shape training and precalculated steps per epoch.
6. For training, see the `train.py` files in the [example folder](./examples) for the available options
7. For testing, see the `test.py` files in the [example folder](./examples) for the available options. **Note:** Testing is currently not supported on TPUs. It prints nothing other than the progress bar in the console, but it stores the predicted transcripts in the file `output.tsv` and then calculates the metrics from that file; a sketch of that calculation follows after this tutorial.

**FYI**: Keras built-in training uses an **infinite dataset**, which avoids the potential last partial batch.

See [examples](./examples/) for some predefined ASR models and results
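To make the metrics part of step 7 concrete, the snippet below computes a word error rate from reference/hypothesis pairs using a plain edit distance. It is an independent sketch rather than the project's own metric code, and the `GROUNDTRUTH`/`GREEDY` column names are assumptions; check the header of the `output.tsv` your test run actually produces before using it.

```python
import csv

def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (one rolling row)."""
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, row[0] = row[0], i
        for j, h in enumerate(hyp, start=1):
            cur = row[j]
            row[j] = min(row[j] + 1,        # deletion
                         row[j - 1] + 1,    # insertion
                         prev + (r != h))   # substitution
            prev = cur
    return row[-1]

def word_error_rate(pairs):
    """pairs: iterable of (reference, hypothesis) transcript strings."""
    errors = words = 0
    for ref, hyp in pairs:
        ref_words, hyp_words = ref.split(), hyp.split()
        errors += edit_distance(ref_words, hyp_words)
        words += len(ref_words)
    return errors / max(words, 1)

with open("output.tsv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f, delimiter="\t"))
# The column names below are assumptions; inspect the file's real header first.
print("WER:", word_error_rate((r["GROUNDTRUTH"], r["GREEDY"]) for r in rows))
```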
## Corpus Sources and Pretrained Models

For pretrained models, go to [drive](https://drive.google.com/drive/folders/1BD0AK30n8hc-yR28C5FW3LqzZxtLOQfl?usp=sharing)

### English

| **Name** | **Source** | **Hours** |
| :---: | :---: | :---: |
| LibriSpeech | [LibriSpeech](http://www.openslr.org/12) | 970h |
| Common Voice | [https://commonvoice.mozilla.org](https://commonvoice.mozilla.org) | 1932h |

### Vietnamese

| **Name** | **Source** | **Hours** |
| :---: | :---: | :---: |
| Vivos | [https://ailab.hcmus.edu.vn/vivos](https://ailab.hcmus.edu.vn/vivos) | 15h |
| InfoRe Technology 1 | [InfoRe1 (passwd: BroughtToYouByInfoRe)](https://files.huylenguyen.com/25hours.zip) | 25h |
| InfoRe Technology 2 (used in VLSP2019) | [InfoRe2 (passwd: BroughtToYouByInfoRe)](https://files.huylenguyen.com/audiobooks.zip) | 415h |

### German

| **Name** | **Source** | **Hours** |
| :---: | :---: | :---: |
| Common Voice | [https://commonvoice.mozilla.org/](https://commonvoice.mozilla.org) | 750h |

## References & Credits

1. [NVIDIA OpenSeq2Seq Toolkit](https://github.com/NVIDIA/OpenSeq2Seq)
2. [https://github.com/noahchalifour/warp-transducer](https://github.com/noahchalifour/warp-transducer)
3. [Sequence Transduction with Recurrent Neural Network](https://arxiv.org/abs/1211.3711)
4. [End-to-End Speech Processing Toolkit in PyTorch](https://github.com/espnet/espnet)
5. [https://github.com/iankur/ContextNet](https://github.com/iankur/ContextNet)

## Contact

Huy Le Nguyen

Email: nlhuy.cs.16@gmail.com

%package help
Summary: Development documents and examples for TensorFlowASR
Provides: python3-TensorFlowASR-doc
%description help

TensorFlowASR :zap:

Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2

TensorFlowASR implements some automatic speech recognition architectures such as DeepSpeech2, Jasper, RNN Transducer, ContextNet, Conformer, etc. These models can be converted to TFLite to reduce memory and computation for deployment :smile:

## What's New?

- (04/17/2021) Refactored the repository for new version 1.x
- (02/16/2021) Supported TPU training
- (12/27/2020) Supported _naive_ token-level timestamps, see the [demo](./examples/demonstration/conformer.py) with the flag `--timestamp`
- (12/17/2020) Supported ContextNet [http://arxiv.org/abs/2005.03191](http://arxiv.org/abs/2005.03191)
- (12/12/2020) Added support for using masking
- (11/14/2020) Supported gradient accumulation for training with larger batch sizes

## Table of Contents

- [What's New?](#whats-new)
- [Table of Contents](#table-of-contents)
- [:yum: Supported Models](#yum-supported-models)
  - [Baselines](#baselines)
  - [Publications](#publications)
- [Installation](#installation)
  - [Installing from source (recommended)](#installing-from-source-recommended)
  - [Installing via PyPI](#installing-via-pypi)
  - [Running in a container](#running-in-a-container)
- [Setup training and testing](#setup-training-and-testing)
- [TFLite Conversion](#tflite-conversion)
- [Features Extraction](#features-extraction)
- [Augmentations](#augmentations)
- [Training & Testing Tutorial](#training--testing-tutorial)
- [Corpus Sources and Pretrained Models](#corpus-sources-and-pretrained-models)
  - [English](#english)
  - [Vietnamese](#vietnamese)
  - [German](#german)
- [References & Credits](#references--credits)
- [Contact](#contact)

## :yum: Supported Models

### Baselines

- **Transducer Models** (end-to-end models trained with the RNNT loss; Conformer, ContextNet and Streaming Transducer are currently supported)
- **CTCModel** (end-to-end models trained with the CTC loss; DeepSpeech2 and Jasper are currently supported)

### Publications

- **Conformer Transducer** (Reference: [https://arxiv.org/abs/2005.08100](https://arxiv.org/abs/2005.08100)) See [examples/conformer](./examples/conformer)
- **Streaming Transducer** (Reference: [https://arxiv.org/abs/1811.06621](https://arxiv.org/abs/1811.06621)) See [examples/streaming_transducer](./examples/streaming_transducer)
- **ContextNet** (Reference: [http://arxiv.org/abs/2005.03191](http://arxiv.org/abs/2005.03191)) See [examples/contextnet](./examples/contextnet)
- **Deep Speech 2** (Reference: [https://arxiv.org/abs/1512.02595](https://arxiv.org/abs/1512.02595)) See [examples/deepspeech2](./examples/deepspeech2)
- **Jasper** (Reference: [https://arxiv.org/abs/1904.03288](https://arxiv.org/abs/1904.03288)) See [examples/jasper](./examples/jasper)

## Installation

For training and testing, install from a `git clone` so that the required packages from other authors (`ctc_decoders`, `rnnt_loss`, etc.) can also be installed.
### Installing from source (recommended)

```bash
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
# TensorFlow 2.x (with 2.x.x >= 2.5.1)
pip3 install -e ".[tf2.x]" # or ".[tf2.x-gpu]"
```

For anaconda3:

```bash
conda create -y -n tfasr tensorflow-gpu python=3.8 # use tensorflow instead when on CPU; this makes sure conda installs all of TensorFlow's dependencies
conda activate tfasr
pip install -U tensorflow-gpu # upgrade to the latest version of TensorFlow
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
# TensorFlow 2.x (with 2.x.x >= 2.5.1)
pip3 install -e ".[tf2.x]" # or ".[tf2.x-gpu]"
```

### Installing via PyPI

```bash
# TensorFlow 2.x (with 2.x >= 2.3)
pip3 install -U "TensorFlowASR[tf2.x]" # or pip3 install -U "TensorFlowASR[tf2.x-gpu]"
```

### Running in a container

```bash
docker-compose up -d
```

## Setup training and testing

- For datasets, see [datasets](./tensorflow_asr/datasets/README.md)
- For _training, testing and using_ **CTC Models**, run `./scripts/install_ctc_decoders.sh`
- For _training_ **Transducer Models** with the RNNT loss implemented in TF, make sure that [warp-transducer](https://github.com/HawkAaron/warp-transducer) **is not installed** (simply run `pip3 uninstall warprnnt-tensorflow`) (**Recommended**)
- For _training_ **Transducer Models** with the RNNT loss from [warp-transducer](https://github.com/HawkAaron/warp-transducer), run `export CUDA_HOME=/usr/local/cuda && ./scripts/install_rnnt_loss.sh` (**Note**: only `export CUDA_HOME` when you have CUDA)
- For _mixed precision training_, use the flag `--mxp` when running the python scripts from [examples](./examples)
- For _enabling XLA_, run `TF_XLA_FLAGS=--tf_xla_auto_jit=2 python3 $path_to_py_script`
- For _hiding warnings_, run `export TF_CPP_MIN_LOG_LEVEL=2` before running any examples

## TFLite Conversion

After conversion, the TFLite model behaves like a function that maps an **audio signal** directly to **unicode code points**, which can then be converted to a string.

1. Install `tf-nightly` using `pip install tf-nightly`
2. Build a model with the same architecture as the trained model _(if the model has a tflite argument, you must set it to True)_, then load the weights from the trained model into the built model
3. Attach `TFSpeechFeaturizer` and `TextFeaturizer` to the model using the function `add_featurizers`
4. Convert the model's function to TFLite as follows:

```python
import tensorflow as tf

func = model.make_tflite_function(**options) # options are the arguments of the function
concrete_func = func.get_concrete_function()
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.experimental_new_converter = True
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
tflite_model = converter.convert()
```

5. Save the converted TFLite model as follows:

```python
import os

if not os.path.exists(os.path.dirname(tflite_path)):
    os.makedirs(os.path.dirname(tflite_path))
with open(tflite_path, "wb") as tflite_out:
    tflite_out.write(tflite_model)
```

6. The `.tflite` model is then ready to be deployed
## Features Extraction

See [features_extraction](./tensorflow_asr/featurizers/README.md)

## Augmentations

See [augmentations](./tensorflow_asr/augmentations/README.md)

## Training & Testing Tutorial

1. Define a config YAML file; see the `config.yml` files in the [example folder](./examples) for reference (you can copy one and modify values such as parameters and paths to match your local machine configuration)
2. Download your corpus (a.k.a. datasets) and create a script that generates `transcripts.tsv` files from it (this is the general format used in this project, because every dataset has a different layout). For more detail, see [datasets](./tensorflow_asr/datasets/README.md). **Note:** Make sure your data contains only characters of your language; for example, English uses `a` to `z` and `'`. **Do not use `cache` if your dataset does not fit in RAM.**
3. [Optional] Generate TFRecords to use `tf.data.TFRecordDataset` for better performance with the script [create_tfrecords.py](./scripts/create_tfrecords.py)
4. Create a vocabulary file (characters or subwords/wordpieces) by defining `language.characters`, or by using the scripts [generate_vocab_subwords.py](./scripts/generate_vocab_subwords.py) or [generate_vocab_sentencepiece.py](./scripts/generate_vocab_sentencepiece.py). Predefined ones are available in [vocabularies](./vocabularies); a simplified character-vocabulary sketch also follows after this tutorial.
5. [Optional] Generate a metadata file for your dataset with the script [generate_metadata.py](./scripts/generate_metadata.py). This metadata file contains the maximum lengths calculated with your `config.yml` and the total number of elements in each dataset, for static-shape training and precalculated steps per epoch.
6. For training, see the `train.py` files in the [example folder](./examples) for the available options
7. For testing, see the `test.py` files in the [example folder](./examples) for the available options. **Note:** Testing is currently not supported on TPUs. It prints nothing other than the progress bar in the console, but it stores the predicted transcripts in the file `output.tsv` and then calculates the metrics from that file.

**FYI**: Keras built-in training uses an **infinite dataset**, which avoids the potential last partial batch.

See [examples](./examples/) for some predefined ASR models and results
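As an illustration of the character-level option in step 4, the snippet below collects every character that occurs in one or more `transcripts.tsv` files and writes them one per line. It is a simplified stand-in for the provided scripts: the transcript-in-the-last-column assumption and the one-character-per-line output format should be checked against those scripts and the files in [vocabularies](./vocabularies) before training.

```python
import sys

def collect_characters(transcript_paths):
    """Gather every character used in the transcript column of the given TSV files."""
    charset = set()
    for path in transcript_paths:
        with open(path, "r", encoding="utf-8") as f:
            next(f)  # skip the header line
            for line in f:
                transcript = line.rstrip("\n").split("\t")[-1]  # text assumed to be the last column
                charset.update(transcript)
    return charset

def write_vocab(charset, vocab_path):
    """Write the characters one per line, sorted for a stable file."""
    with open(vocab_path, "w", encoding="utf-8") as f:
        for ch in sorted(charset):
            f.write(ch + "\n")

if __name__ == "__main__":
    # Usage: python make_char_vocab.py out.characters train.tsv [dev.tsv ...]
    write_vocab(collect_characters(sys.argv[2:]), sys.argv[1])
```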
## Corpus Sources and Pretrained Models

For pretrained models, go to [drive](https://drive.google.com/drive/folders/1BD0AK30n8hc-yR28C5FW3LqzZxtLOQfl?usp=sharing)

### English

| **Name** | **Source** | **Hours** |
| :---: | :---: | :---: |
| LibriSpeech | [LibriSpeech](http://www.openslr.org/12) | 970h |
| Common Voice | [https://commonvoice.mozilla.org](https://commonvoice.mozilla.org) | 1932h |

### Vietnamese

| **Name** | **Source** | **Hours** |
| :---: | :---: | :---: |
| Vivos | [https://ailab.hcmus.edu.vn/vivos](https://ailab.hcmus.edu.vn/vivos) | 15h |
| InfoRe Technology 1 | [InfoRe1 (passwd: BroughtToYouByInfoRe)](https://files.huylenguyen.com/25hours.zip) | 25h |
| InfoRe Technology 2 (used in VLSP2019) | [InfoRe2 (passwd: BroughtToYouByInfoRe)](https://files.huylenguyen.com/audiobooks.zip) | 415h |

### German

| **Name** | **Source** | **Hours** |
| :---: | :---: | :---: |
| Common Voice | [https://commonvoice.mozilla.org/](https://commonvoice.mozilla.org) | 750h |

## References & Credits

1. [NVIDIA OpenSeq2Seq Toolkit](https://github.com/NVIDIA/OpenSeq2Seq)
2. [https://github.com/noahchalifour/warp-transducer](https://github.com/noahchalifour/warp-transducer)
3. [Sequence Transduction with Recurrent Neural Network](https://arxiv.org/abs/1211.3711)
4. [End-to-End Speech Processing Toolkit in PyTorch](https://github.com/espnet/espnet)
5. [https://github.com/iankur/ContextNet](https://github.com/iankur/ContextNet)

## Contact

Huy Le Nguyen

Email: nlhuy.cs.16@gmail.com

%prep
%autosetup -n TensorFlowASR-1.0.3

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-TensorFlowASR -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Fri May 05 2023 Python_Bot - 1.0.3-1
- Package Spec generated