%global _empty_manifest_terminate_build 0
Name:		python-asrp
Version:	0.0.74
Release:	1
Summary:	Automatic Speech Recognition Preprocessing Utility
License:	Apache
URL:		https://github.com/voidful/asrp
Source0:	https://mirrors.nju.edu.cn/pypi/web/packages/62/59/7929ee67d3e466c73931f22fef8640aa21332228b00ab8f237b2f0e004a2/asrp-0.0.74.tar.gz
BuildArch:	noarch

Requires:	python3-Unidecode
Requires:	python3-jiwer
Requires:	python3-transformers
Requires:	python3-editdistance
Requires:	python3-librosa
Requires:	python3-webrtcvad
Requires:	python3-pyctcdecode
Requires:	python3-openai-whisper
Requires:	python3-nlp2

%description
# ASRP: Automatic Speech Recognition Preprocessing Utility

ASRP is a Python package that offers a set of tools to preprocess and evaluate ASR (Automatic Speech Recognition) text. The package also provides a speech-to-text transcription tool and a text-to-speech conversion tool. The code is open source and can be installed using pip.

Key Features

- [Preprocess ASR text with ease](#preprocess)
- [Evaluate ASR output quality](#evaluation)
- [Transcribe speech to Hubert code](#speech-to-discrete-unit)
- [Convert unit code to speech](#discrete-unit-to-speech)
- [Enhance speech quality with a noise reduction tool](#speech-enhancement)
- [LiveASR tool for real-time speech recognition](#liveasr---huggingfaces-model)
- [Speaker Embedding Extraction (x-vector/d-vector)](#speaker-embedding-extraction---x-vector)

## Install

`pip install asrp`

## Preprocess

ASRP offers an easy-to-use set of functions to preprocess ASR text data. The input is a dictionary with the key 'sentence', and the output is the preprocessed text. You can either call the `fun_en` function directly or look it up dynamically. Here's how to use it:

```python
import asrp

batch_data = {
    'sentence': "I'm fine, thanks."
}
asrp.fun_en(batch_data)
```

Dynamic loading:

```python
import asrp

batch_data = {
    'sentence': "I'm fine, thanks."
}
preprocessor = getattr(asrp, 'fun_en')
preprocessor(batch_data)
```

## Evaluation

ASRP provides functions to evaluate the output quality of ASR systems using the Word Error Rate (WER) and Character Error Rate (CER) metrics. Here's how to use it:

```python
import asrp

targets = ['HuggingFace is great!', 'Love Transformers!', 'Let\'s wav2vec!']
preds = ['HuggingFace is awesome!', 'Transformers is powerful.', 'Let\'s finetune wav2vec!']
print("chunk size WER: {:.2f}".format(100 * asrp.chunked_wer(targets, preds, chunk_size=None)))
print("chunk size CER: {:.2f}".format(100 * asrp.chunked_cer(targets, preds, chunk_size=None)))
```
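With `chunk_size=None`, the whole corpus is scored in one pass. For long corpora it can help to score in batches; the sketch below is a hedged illustration that assumes `chunk_size` sets how many target/prediction pairs are accumulated per batch (the corpus here is synthetic, for illustration only):

```python
import asrp

# Hypothetical larger corpus, repeated purely for illustration.
targets = ['HuggingFace is great!'] * 1000
preds = ['HuggingFace is awesome!'] * 1000

# Assumption: chunk_size batches the pairs (500 at a time here) so the
# edit-distance computation never has to hold the whole corpus at once.
print("batched WER: {:.2f}".format(100 * asrp.chunked_wer(targets, preds, chunk_size=500)))
```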
## Speech to Discrete Unit

```python
import asrp
import nlp2

# https://github.com/facebookresearch/fairseq/blob/ust/examples/speech_to_speech/docs/textless_s2st_real_data.md
# https://github.com/facebookresearch/fairseq/tree/main/examples/textless_nlp/gslm/ulm
nlp2.download_file(
    'https://huggingface.co/voidful/mhubert-base/resolve/main/mhubert_base_vp_en_es_fr_it3_L11_km1000.bin',
    './')
hc = asrp.HubertCode("voidful/mhubert-base",
                     './mhubert_base_vp_en_es_fr_it3_L11_km1000.bin',
                     11,  # feature layer; the k-means model above was trained on layer 11 (L11)
                     chunk_sec=30,
                     worker=20)
hc('voice file path')
```

## Discrete Unit to Speech

```python
import asrp

code = []  # discrete unit sequence
# https://github.com/pytorch/fairseq/tree/main/examples/textless_nlp/gslm/unit2speech
# https://github.com/facebookresearch/fairseq/blob/ust/examples/speech_to_speech/docs/textless_s2st_real_data.md
cs = asrp.Code2Speech(tts_checkpoint='./tts_checkpoint_best.pt', waveglow_checkpint='waveglow_256channels_new.pt')
cs(code)

# play in a notebook
import IPython.display as ipd

ipd.Audio(data=cs(code), autoplay=False, rate=cs.sample_rate)
```

mHuBERT English HiFi-GAN vocoder example:

```python
import asrp
import nlp2
import IPython.display as ipd
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

nlp2.download_file(
    'https://dl.fbaipublicfiles.com/fairseq/speech_to_speech/vocoder/code_hifigan/mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj/g_00500000',
    './')
tokenizer = AutoTokenizer.from_pretrained("voidful/mhubert-unit-tts")
model = AutoModelForSeq2SeqLM.from_pretrained("voidful/mhubert-unit-tts")
model.eval()
cs = asrp.Code2Speech(tts_checkpoint='./g_00500000', vocoder='hifigan')

inputs = tokenizer(["The quick brown fox jumps over the lazy dog."], return_tensors="pt")
code = tokenizer.batch_decode(model.generate(**inputs, max_length=1024))[0]
code = [int(i) for i in code.replace("</s>", "").replace("<s>", "").split("v_tok_")[1:]]
print(code)
ipd.Audio(data=cs(code), autoplay=False, rate=cs.sample_rate)
```
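Beyond notebook playback, the synthesized audio can also be written to disk. A minimal sketch, assuming `cs(code)` returns a 1-D waveform array at `cs.sample_rate` (which is what the `ipd.Audio` call above implies); `generated.wav` is just an illustrative output path:

```python
import soundfile as sf  # third-party dependency: pip install soundfile

wav = cs(code)  # reuses cs and code from the example above
sf.write('generated.wav', wav, cs.sample_rate)
```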
## Speech Enhancement

ASRP also provides a speech enhancement tool for noise reduction, adapted from https://github.com/facebookresearch/fairseq/tree/main/examples/speech_synthesis/preprocessing/denoiser:

```python
from asrp import SpeechEnhancer

ase = SpeechEnhancer()
print(ase('./test/xxx.wav'))
```

## LiveASR - huggingface's model

* modified from https://github.com/oliverguhr/wav2vec2-live

```python
from asrp.live import LiveSpeech

english_model = "voidful/wav2vec2-xlsr-multilingual-56"
asr = LiveSpeech(english_model, device_name="default")
asr.start()

try:
    while True:
        text, sample_length, inference_time = asr.get_last_text()
        print(f"{sample_length:.3f}s\t{inference_time:.3f}s\t{text}")
except KeyboardInterrupt:
    asr.stop()
```

## LiveASR - whisper's model

```python
from asrp.live import LiveSpeech

whisper_model = "tiny"
asr = LiveSpeech(whisper_model, vad_mode=2, language='zh')
asr.start()

while True:
    try:
        asr_text, sample_length, inference_time = asr.get_last_text()
        if len(asr_text) > 0:
            print(asr_text, sample_length, inference_time)
    except KeyboardInterrupt:
        asr.stop()
        break
```

## Speaker Embedding Extraction - x vector

From https://speechbrain.readthedocs.io/en/latest/API/speechbrain.lobes.models.Xvector.html:

```python
from asrp.speaker_embedding import extract_x_vector

extract_x_vector('./test/xxx.wav')
```

## Speaker Embedding Extraction - d vector

From https://github.com/yistLin/dvector:

```python
from asrp.speaker_embedding import extract_d_vector

extract_d_vector('./test/xxx.wav')
```
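A common use for either embedding is speaker verification: extract an embedding for two utterances and compare them with cosine similarity. The sketch below is an assumption-laden illustration (the two wav paths are hypothetical, and it assumes the extractor returns a fixed-length vector convertible to a numpy array):

```python
import numpy as np
from asrp.speaker_embedding import extract_x_vector

def cosine_similarity(a, b):
    # standard cosine similarity between two 1-D vectors
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

emb_a = extract_x_vector('./test/speaker_a.wav')  # hypothetical file
emb_b = extract_x_vector('./test/speaker_b.wav')  # hypothetical file
print(cosine_similarity(emb_a, emb_b))  # values near 1.0 suggest the same speaker
```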
%package -n python3-asrp
Summary:	Automatic Speech Recognition Preprocessing Utility
Provides:	python-asrp
BuildRequires:	python3-devel
BuildRequires:	python3-setuptools
BuildRequires:	python3-pip

%description -n python3-asrp
# ASRP: Automatic Speech Recognition Preprocessing Utility

ASRP is a Python package that offers a set of tools to preprocess and evaluate ASR (Automatic Speech Recognition) text. The package also provides a speech-to-text transcription tool and a text-to-speech conversion tool. The code is open source and can be installed using pip.

Key Features

- [Preprocess ASR text with ease](#preprocess)
- [Evaluate ASR output quality](#evaluation)
- [Transcribe speech to Hubert code](#speech-to-discrete-unit)
- [Convert unit code to speech](#discrete-unit-to-speech)
- [Enhance speech quality with a noise reduction tool](#speech-enhancement)
- [LiveASR tool for real-time speech recognition](#liveasr---huggingfaces-model)
- [Speaker Embedding Extraction (x-vector/d-vector)](#speaker-embedding-extraction---x-vector)

## Install

`pip install asrp`

## Preprocess

ASRP offers an easy-to-use set of functions to preprocess ASR text data. The input is a dictionary with the key 'sentence', and the output is the preprocessed text. You can either call the `fun_en` function directly or look it up dynamically. Here's how to use it:

```python
import asrp

batch_data = {
    'sentence': "I'm fine, thanks."
}
asrp.fun_en(batch_data)
```

Dynamic loading:

```python
import asrp

batch_data = {
    'sentence': "I'm fine, thanks."
}
preprocessor = getattr(asrp, 'fun_en')
preprocessor(batch_data)
```

## Evaluation

ASRP provides functions to evaluate the output quality of ASR systems using the Word Error Rate (WER) and Character Error Rate (CER) metrics. Here's how to use it:

```python
import asrp

targets = ['HuggingFace is great!', 'Love Transformers!', 'Let\'s wav2vec!']
preds = ['HuggingFace is awesome!', 'Transformers is powerful.', 'Let\'s finetune wav2vec!']
print("chunk size WER: {:.2f}".format(100 * asrp.chunked_wer(targets, preds, chunk_size=None)))
print("chunk size CER: {:.2f}".format(100 * asrp.chunked_cer(targets, preds, chunk_size=None)))
```

## Speech to Discrete Unit

```python
import asrp
import nlp2

# https://github.com/facebookresearch/fairseq/blob/ust/examples/speech_to_speech/docs/textless_s2st_real_data.md
# https://github.com/facebookresearch/fairseq/tree/main/examples/textless_nlp/gslm/ulm
nlp2.download_file(
    'https://huggingface.co/voidful/mhubert-base/resolve/main/mhubert_base_vp_en_es_fr_it3_L11_km1000.bin',
    './')
hc = asrp.HubertCode("voidful/mhubert-base",
                     './mhubert_base_vp_en_es_fr_it3_L11_km1000.bin',
                     11,  # feature layer; the k-means model above was trained on layer 11 (L11)
                     chunk_sec=30,
                     worker=20)
hc('voice file path')
```

## Discrete Unit to Speech

```python
import asrp

code = []  # discrete unit sequence
# https://github.com/pytorch/fairseq/tree/main/examples/textless_nlp/gslm/unit2speech
# https://github.com/facebookresearch/fairseq/blob/ust/examples/speech_to_speech/docs/textless_s2st_real_data.md
cs = asrp.Code2Speech(tts_checkpoint='./tts_checkpoint_best.pt', waveglow_checkpint='waveglow_256channels_new.pt')
cs(code)

# play in a notebook
import IPython.display as ipd

ipd.Audio(data=cs(code), autoplay=False, rate=cs.sample_rate)
```

mHuBERT English HiFi-GAN vocoder example:

```python
import asrp
import nlp2
import IPython.display as ipd
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

nlp2.download_file(
    'https://dl.fbaipublicfiles.com/fairseq/speech_to_speech/vocoder/code_hifigan/mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj/g_00500000',
    './')
tokenizer = AutoTokenizer.from_pretrained("voidful/mhubert-unit-tts")
model = AutoModelForSeq2SeqLM.from_pretrained("voidful/mhubert-unit-tts")
model.eval()
cs = asrp.Code2Speech(tts_checkpoint='./g_00500000', vocoder='hifigan')

inputs = tokenizer(["The quick brown fox jumps over the lazy dog."], return_tensors="pt")
code = tokenizer.batch_decode(model.generate(**inputs, max_length=1024))[0]
code = [int(i) for i in code.replace("</s>", "").replace("<s>", "").split("v_tok_")[1:]]
print(code)
ipd.Audio(data=cs(code), autoplay=False, rate=cs.sample_rate)
```
## Speech Enhancement

ASRP also provides a speech enhancement tool for noise reduction, adapted from https://github.com/facebookresearch/fairseq/tree/main/examples/speech_synthesis/preprocessing/denoiser:

```python
from asrp import SpeechEnhancer

ase = SpeechEnhancer()
print(ase('./test/xxx.wav'))
```

## LiveASR - huggingface's model

* modified from https://github.com/oliverguhr/wav2vec2-live

```python
from asrp.live import LiveSpeech

english_model = "voidful/wav2vec2-xlsr-multilingual-56"
asr = LiveSpeech(english_model, device_name="default")
asr.start()

try:
    while True:
        text, sample_length, inference_time = asr.get_last_text()
        print(f"{sample_length:.3f}s\t{inference_time:.3f}s\t{text}")
except KeyboardInterrupt:
    asr.stop()
```

## LiveASR - whisper's model

```python
from asrp.live import LiveSpeech

whisper_model = "tiny"
asr = LiveSpeech(whisper_model, vad_mode=2, language='zh')
asr.start()

while True:
    try:
        asr_text, sample_length, inference_time = asr.get_last_text()
        if len(asr_text) > 0:
            print(asr_text, sample_length, inference_time)
    except KeyboardInterrupt:
        asr.stop()
        break
```

## Speaker Embedding Extraction - x vector

From https://speechbrain.readthedocs.io/en/latest/API/speechbrain.lobes.models.Xvector.html:

```python
from asrp.speaker_embedding import extract_x_vector

extract_x_vector('./test/xxx.wav')
```

## Speaker Embedding Extraction - d vector

From https://github.com/yistLin/dvector:

```python
from asrp.speaker_embedding import extract_d_vector

extract_d_vector('./test/xxx.wav')
```

%package help
Summary:	Development documents and examples for asrp
Provides:	python3-asrp-doc

%description help
# ASRP: Automatic Speech Recognition Preprocessing Utility

ASRP is a Python package that offers a set of tools to preprocess and evaluate ASR (Automatic Speech Recognition) text. The package also provides a speech-to-text transcription tool and a text-to-speech conversion tool. The code is open source and can be installed using pip.

Key Features

- [Preprocess ASR text with ease](#preprocess)
- [Evaluate ASR output quality](#evaluation)
- [Transcribe speech to Hubert code](#speech-to-discrete-unit)
- [Convert unit code to speech](#discrete-unit-to-speech)
- [Enhance speech quality with a noise reduction tool](#speech-enhancement)
- [LiveASR tool for real-time speech recognition](#liveasr---huggingfaces-model)
- [Speaker Embedding Extraction (x-vector/d-vector)](#speaker-embedding-extraction---x-vector)

## Install

`pip install asrp`

## Preprocess

ASRP offers an easy-to-use set of functions to preprocess ASR text data. The input is a dictionary with the key 'sentence', and the output is the preprocessed text. You can either call the `fun_en` function directly or look it up dynamically. Here's how to use it:

```python
import asrp

batch_data = {
    'sentence': "I'm fine, thanks."
}
asrp.fun_en(batch_data)
```

Dynamic loading:

```python
import asrp

batch_data = {
    'sentence': "I'm fine, thanks."
}
preprocessor = getattr(asrp, 'fun_en')
preprocessor(batch_data)
```

## Evaluation

ASRP provides functions to evaluate the output quality of ASR systems using the Word Error Rate (WER) and Character Error Rate (CER) metrics. Here's how to use it:
```python
import asrp

targets = ['HuggingFace is great!', 'Love Transformers!', 'Let\'s wav2vec!']
preds = ['HuggingFace is awesome!', 'Transformers is powerful.', 'Let\'s finetune wav2vec!']
print("chunk size WER: {:.2f}".format(100 * asrp.chunked_wer(targets, preds, chunk_size=None)))
print("chunk size CER: {:.2f}".format(100 * asrp.chunked_cer(targets, preds, chunk_size=None)))
```

## Speech to Discrete Unit

```python
import asrp
import nlp2

# https://github.com/facebookresearch/fairseq/blob/ust/examples/speech_to_speech/docs/textless_s2st_real_data.md
# https://github.com/facebookresearch/fairseq/tree/main/examples/textless_nlp/gslm/ulm
nlp2.download_file(
    'https://huggingface.co/voidful/mhubert-base/resolve/main/mhubert_base_vp_en_es_fr_it3_L11_km1000.bin',
    './')
hc = asrp.HubertCode("voidful/mhubert-base",
                     './mhubert_base_vp_en_es_fr_it3_L11_km1000.bin',
                     11,  # feature layer; the k-means model above was trained on layer 11 (L11)
                     chunk_sec=30,
                     worker=20)
hc('voice file path')
```

## Discrete Unit to Speech

```python
import asrp

code = []  # discrete unit sequence
# https://github.com/pytorch/fairseq/tree/main/examples/textless_nlp/gslm/unit2speech
# https://github.com/facebookresearch/fairseq/blob/ust/examples/speech_to_speech/docs/textless_s2st_real_data.md
cs = asrp.Code2Speech(tts_checkpoint='./tts_checkpoint_best.pt', waveglow_checkpint='waveglow_256channels_new.pt')
cs(code)

# play in a notebook
import IPython.display as ipd

ipd.Audio(data=cs(code), autoplay=False, rate=cs.sample_rate)
```

mHuBERT English HiFi-GAN vocoder example:

```python
import asrp
import nlp2
import IPython.display as ipd
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

nlp2.download_file(
    'https://dl.fbaipublicfiles.com/fairseq/speech_to_speech/vocoder/code_hifigan/mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj/g_00500000',
    './')
tokenizer = AutoTokenizer.from_pretrained("voidful/mhubert-unit-tts")
model = AutoModelForSeq2SeqLM.from_pretrained("voidful/mhubert-unit-tts")
model.eval()
cs = asrp.Code2Speech(tts_checkpoint='./g_00500000', vocoder='hifigan')

inputs = tokenizer(["The quick brown fox jumps over the lazy dog."], return_tensors="pt")
code = tokenizer.batch_decode(model.generate(**inputs, max_length=1024))[0]
code = [int(i) for i in code.replace("</s>", "").replace("<s>", "").split("v_tok_")[1:]]
print(code)
ipd.Audio(data=cs(code), autoplay=False, rate=cs.sample_rate)
```
## Speech Enhancement

ASRP also provides a speech enhancement tool for noise reduction, adapted from https://github.com/facebookresearch/fairseq/tree/main/examples/speech_synthesis/preprocessing/denoiser:

```python
from asrp import SpeechEnhancer

ase = SpeechEnhancer()
print(ase('./test/xxx.wav'))
```

## LiveASR - huggingface's model

* modified from https://github.com/oliverguhr/wav2vec2-live

```python
from asrp.live import LiveSpeech

english_model = "voidful/wav2vec2-xlsr-multilingual-56"
asr = LiveSpeech(english_model, device_name="default")
asr.start()

try:
    while True:
        text, sample_length, inference_time = asr.get_last_text()
        print(f"{sample_length:.3f}s\t{inference_time:.3f}s\t{text}")
except KeyboardInterrupt:
    asr.stop()
```

## LiveASR - whisper's model

```python
from asrp.live import LiveSpeech

whisper_model = "tiny"
asr = LiveSpeech(whisper_model, vad_mode=2, language='zh')
asr.start()

while True:
    try:
        asr_text, sample_length, inference_time = asr.get_last_text()
        if len(asr_text) > 0:
            print(asr_text, sample_length, inference_time)
    except KeyboardInterrupt:
        asr.stop()
        break
```

## Speaker Embedding Extraction - x vector

From https://speechbrain.readthedocs.io/en/latest/API/speechbrain.lobes.models.Xvector.html:

```python
from asrp.speaker_embedding import extract_x_vector

extract_x_vector('./test/xxx.wav')
```

## Speaker Embedding Extraction - d vector

From https://github.com/yistLin/dvector:

```python
from asrp.speaker_embedding import extract_d_vector

extract_d_vector('./test/xxx.wav')
```

%prep
%autosetup -n asrp-0.0.74

%build
%py3_build

%install
%py3_install
install -d -m755 %{buildroot}/%{_pkgdocdir}
if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi
if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi
if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi
if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi
pushd %{buildroot}
if [ -d usr/lib ]; then
	find usr/lib -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/lib64 ]; then
	find usr/lib64 -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/bin ]; then
	find usr/bin -type f -printf "/%h/%f\n" >> filelist.lst
fi
if [ -d usr/sbin ]; then
	find usr/sbin -type f -printf "/%h/%f\n" >> filelist.lst
fi
touch doclist.lst
if [ -d usr/share/man ]; then
	find usr/share/man -type f -printf "/%h/%f.gz\n" >> doclist.lst
fi
popd
mv %{buildroot}/filelist.lst .
mv %{buildroot}/doclist.lst .

%files -n python3-asrp -f filelist.lst
%dir %{python3_sitelib}/*

%files help -f doclist.lst
%{_docdir}/*

%changelog
* Tue May 30 2023 Python_Bot - 0.0.74-1
- Package Spec generated