author | CoprDistGit <infra@openeuler.org> | 2023-06-09 02:03:51 +0000 |
---|---|---|
committer | CoprDistGit <infra@openeuler.org> | 2023-06-09 02:03:51 +0000 |
commit | cc4a17b686fcca1ba8ddbf9d23acf2076604b862 (patch) | |
tree | 3904a80dcf6517bf7aee384feeac18aff1562961 /python-miditok.spec | |
parent | 1e485d41289422b4e218698b2a164d88a9a3a9da (diff) |
automatic import of python-miditok (openeuler20.03)
Diffstat (limited to 'python-miditok.spec')
-rw-r--r-- | python-miditok.spec | 432 |
1 files changed, 432 insertions, 0 deletions
diff --git a/python-miditok.spec b/python-miditok.spec new file mode 100644 index 0000000..ba06ea5 --- /dev/null +++ b/python-miditok.spec @@ -0,0 +1,432 @@ +%global _empty_manifest_terminate_build 0 +Name: python-miditok +Version: 2.0.6 +Release: 1 +Summary: A convenient MIDI tokenizer for Deep Learning networks, with multiple encoding strategies +License: MIT +URL: https://github.com/Natooz/MidiTok +Source0: https://mirrors.aliyun.com/pypi/web/packages/3c/58/587f75bd26a9717872bc5d1276ddcc10cedbf9344037fa56e062b1f0cbac/miditok-2.0.6.tar.gz +BuildArch: noarch + +Requires: python3-numpy +Requires: python3-miditoolkit +Requires: python3-tqdm + +%description +# MidiTok + +Python package to tokenize MIDI music files, presented at the ISMIR 2021 LBD. + + + +[](https://pypi.python.org/pypi/miditok/) +[](https://www.python.org/downloads/release/) +[](https://miditok.readthedocs.io/en/latest/?badge=latest) +[](https://github.com/Natooz/MidiTok/actions/workflows/pytest.yml) +[](https://codecov.io/gh/Natooz/MidiTok) +[](https://github.com/Natooz/MidiTok/blob/main/LICENSE) +[](https://pepy.tech/project/MidiTok) +[](https://github.com/psf/black) + +Using Deep Learning with symbolic music ? MidiTok can take care of converting (tokenizing) your MIDI files into tokens, ready to be fed to models such as Transformer, for any generation, transcription or MIR task. +MidiTok features most known [MIDI tokenizations](https://miditok.readthedocs.io/en/latest/tokenizations.html) (e.g. [REMI](https://arxiv.org/abs/2002.00212), [Compound Word](https://arxiv.org/abs/2101.02402)...), and is built around the idea that they all share common parameters and methods. It supports [Byte Pair Encoding (BPE)](https://arxiv.org/abs/2301.11975) and data augmentation. 
+ +**Documentation:** [miditok.readthedocs.com](https://miditok.readthedocs.io/en/latest/index.html) + +## Install + +```shell +pip install miditok +``` +MidiTok uses [MIDIToolkit](https://github.com/YatingMusic/miditoolkit), which itself uses [Mido](https://github.com/mido/mido) to read and write MIDI files, and BPE is backed by [Hugging Face 🤗tokenizers](https://github.com/huggingface/tokenizers) for super-fast encoding. + +## Usage example + +The most basic and useful methods are summarized here. And [here](colab-notebooks/Full_Example_HuggingFace_GPT2_Transformer.ipynb) is a simple notebook example showing how to use Hugging Face models to generate music, with MidiTok taking care of tokenizing MIDIs. + +```python +from miditok import REMI +from miditok.utils import get_midi_programs +from miditoolkit import MidiFile +from pathlib import Path + +# Creates the tokenizer and loads a MIDI +tokenizer = REMI() # using the default parameters, read the documentation to customize your tokenizer +midi = MidiFile('path/to/your_midi.mid') + +# Converts MIDI to tokens, and back to a MIDI +tokens = tokenizer(midi) # calling it will automatically detect MIDIs, paths and tokens before the conversion +converted_back_midi = tokenizer(tokens, get_midi_programs(midi)) # PyTorch / Tensorflow / Numpy tensors supported + +# Converts MIDI files to tokens saved as JSON files +midi_paths = list(Path("path", "to", "dataset").glob("**/*.mid")) +data_augmentation_offsets = [2, 1, 1] # data augmentation on 2 pitch octaves, 1 velocity and 1 duration values +tokenizer.tokenize_midi_dataset(midi_paths, Path("path", "to", "tokens_noBPE"), + data_augment_offsets=data_augmentation_offsets) + +# Constructs the vocabulary with BPE, from the tokenized files +tokenizer.learn_bpe( + vocab_size=500, + tokens_paths=list(Path("path", "to", "tokens_noBPE").glob("**/*.json")), + start_from_empty_voc=False, +) + +# Saving our tokenizer, to retrieve it back later with the load_params method 
+tokenizer.save_params(Path("path", "to", "save", "tokenizer")) + +# Converts the tokenized musics into tokens with BPE +tokenizer.apply_bpe_to_dataset(Path('path', 'to', 'tokens_noBPE'), Path('path', 'to', 'tokens_BPE')) +``` + +## Tokenizations + +MidiTok implements the tokenizations: (links to original papers) +* [REMI](https://dl.acm.org/doi/10.1145/3394171.3413671) +* [REMI+](https://openreview.net/forum?id=NyR8OZFHw6i) +* [MIDI-Like](https://link.springer.com/article/10.1007/s00521-018-3758-9) +* [TSD](https://arxiv.org/abs/2301.11975) +* [Structured](https://arxiv.org/abs/2107.05944) +* [CPWord](https://ojs.aaai.org/index.php/AAAI/article/view/16091) +* [Octuple](https://aclanthology.org/2021.findings-acl.70) +* [MuMIDI](https://dl.acm.org/doi/10.1145/3394171.3413721) +* [MMM](https://arxiv.org/abs/2008.06048) + +You can find short presentations in the [documentation](https://miditok.readthedocs.io/en/latest/tokenizations.html). + +## Limitations + +Tokenizations using Bar tokens (REMI, Compound Word and MuMIDI) **only considers a 4/x time signature** for now. This means that each bar is considered covering 4 beats. +REMI+ and Octuple support it. + +## Contributions + +Contributions are gratefully welcomed, feel free to open an issue or send a PR if you want to add a tokenization or speed up the code. You can read the [contribution guide](CONTRIBUTING.md) for details. + +### Todos + +* Extend Time Signature to all tokenizations +* Control Change messages +* Option to represent pitch values as pitch intervals, as [it seems to improve performances](https://ismir2022program.ismir.net/lbd_369.html). +* Speeding up MIDI read / load (Rust / C++ binding) +* Data augmentation on duration values at the MIDI level + +## Citation + +If you use MidiTok for your research, a citation in your manuscript would be gladly appreciated. 
❤️ + +[**MidiTok paper**](https://archives.ismir.net/ismir2021/latebreaking/000005.pdf) +```bibtex +@inproceedings{miditok2021, + title={{MidiTok}: A Python package for {MIDI} file tokenization}, + author={Fradet, Nathan and Briot, Jean-Pierre and Chhel, Fabien and El Fallah Seghrouchni, Amal and Gutowski, Nicolas}, + booktitle={Extended Abstracts for the Late-Breaking Demo Session of the 22nd International Society for Music Information Retrieval Conference}, + year={2021}, + url={https://archives.ismir.net/ismir2021/latebreaking/000005.pdf}, +} +``` + +The BibTeX citations of all tokenizations can be found [in the documentation](https://miditok.readthedocs.io/en/latest/citations.html) + + +## Acknowledgments + +Special thanks to all the contributors. +We acknowledge [Aubay](https://blog.aubay.com/index.php/language/en/home/?lang=en), the [LIP6](https://www.lip6.fr/?LANG=en), [LERIA](http://blog.univ-angers.fr/leria/n) and [ESEO](https://eseo.fr/en) for the initial financing and support. + + +%package -n python3-miditok +Summary: A convenient MIDI tokenizer for Deep Learning networks, with multiple encoding strategies +Provides: python-miditok +BuildRequires: python3-devel +BuildRequires: python3-setuptools +BuildRequires: python3-pip +%description -n python3-miditok +# MidiTok + +Python package to tokenize MIDI music files, presented at the ISMIR 2021 LBD. + + + +[](https://pypi.python.org/pypi/miditok/) +[](https://www.python.org/downloads/release/) +[](https://miditok.readthedocs.io/en/latest/?badge=latest) +[](https://github.com/Natooz/MidiTok/actions/workflows/pytest.yml) +[](https://codecov.io/gh/Natooz/MidiTok) +[](https://github.com/Natooz/MidiTok/blob/main/LICENSE) +[](https://pepy.tech/project/MidiTok) +[](https://github.com/psf/black) + +Using Deep Learning with symbolic music ? MidiTok can take care of converting (tokenizing) your MIDI files into tokens, ready to be fed to models such as Transformer, for any generation, transcription or MIR task. 
+MidiTok features most known [MIDI tokenizations](https://miditok.readthedocs.io/en/latest/tokenizations.html) (e.g. [REMI](https://arxiv.org/abs/2002.00212), [Compound Word](https://arxiv.org/abs/2101.02402)...), and is built around the idea that they all share common parameters and methods. It supports [Byte Pair Encoding (BPE)](https://arxiv.org/abs/2301.11975) and data augmentation. + +**Documentation:** [miditok.readthedocs.com](https://miditok.readthedocs.io/en/latest/index.html) + +## Install + +```shell +pip install miditok +``` +MidiTok uses [MIDIToolkit](https://github.com/YatingMusic/miditoolkit), which itself uses [Mido](https://github.com/mido/mido) to read and write MIDI files, and BPE is backed by [Hugging Face 🤗tokenizers](https://github.com/huggingface/tokenizers) for super-fast encoding. + +## Usage example + +The most basic and useful methods are summarized here. And [here](colab-notebooks/Full_Example_HuggingFace_GPT2_Transformer.ipynb) is a simple notebook example showing how to use Hugging Face models to generate music, with MidiTok taking care of tokenizing MIDIs. 
+ +```python +from miditok import REMI +from miditok.utils import get_midi_programs +from miditoolkit import MidiFile +from pathlib import Path + +# Creates the tokenizer and loads a MIDI +tokenizer = REMI() # using the default parameters, read the documentation to customize your tokenizer +midi = MidiFile('path/to/your_midi.mid') + +# Converts MIDI to tokens, and back to a MIDI +tokens = tokenizer(midi) # calling it will automatically detect MIDIs, paths and tokens before the conversion +converted_back_midi = tokenizer(tokens, get_midi_programs(midi)) # PyTorch / Tensorflow / Numpy tensors supported + +# Converts MIDI files to tokens saved as JSON files +midi_paths = list(Path("path", "to", "dataset").glob("**/*.mid")) +data_augmentation_offsets = [2, 1, 1] # data augmentation on 2 pitch octaves, 1 velocity and 1 duration values +tokenizer.tokenize_midi_dataset(midi_paths, Path("path", "to", "tokens_noBPE"), + data_augment_offsets=data_augmentation_offsets) + +# Constructs the vocabulary with BPE, from the tokenized files +tokenizer.learn_bpe( + vocab_size=500, + tokens_paths=list(Path("path", "to", "tokens_noBPE").glob("**/*.json")), + start_from_empty_voc=False, +) + +# Saving our tokenizer, to retrieve it back later with the load_params method +tokenizer.save_params(Path("path", "to", "save", "tokenizer")) + +# Converts the tokenized musics into tokens with BPE +tokenizer.apply_bpe_to_dataset(Path('path', 'to', 'tokens_noBPE'), Path('path', 'to', 'tokens_BPE')) +``` + +## Tokenizations + +MidiTok implements the tokenizations: (links to original papers) +* [REMI](https://dl.acm.org/doi/10.1145/3394171.3413671) +* [REMI+](https://openreview.net/forum?id=NyR8OZFHw6i) +* [MIDI-Like](https://link.springer.com/article/10.1007/s00521-018-3758-9) +* [TSD](https://arxiv.org/abs/2301.11975) +* [Structured](https://arxiv.org/abs/2107.05944) +* [CPWord](https://ojs.aaai.org/index.php/AAAI/article/view/16091) +* [Octuple](https://aclanthology.org/2021.findings-acl.70) +* 
[MuMIDI](https://dl.acm.org/doi/10.1145/3394171.3413721) +* [MMM](https://arxiv.org/abs/2008.06048) + +You can find short presentations in the [documentation](https://miditok.readthedocs.io/en/latest/tokenizations.html). + +## Limitations + +Tokenizations using Bar tokens (REMI, Compound Word and MuMIDI) **only considers a 4/x time signature** for now. This means that each bar is considered covering 4 beats. +REMI+ and Octuple support it. + +## Contributions + +Contributions are gratefully welcomed, feel free to open an issue or send a PR if you want to add a tokenization or speed up the code. You can read the [contribution guide](CONTRIBUTING.md) for details. + +### Todos + +* Extend Time Signature to all tokenizations +* Control Change messages +* Option to represent pitch values as pitch intervals, as [it seems to improve performances](https://ismir2022program.ismir.net/lbd_369.html). +* Speeding up MIDI read / load (Rust / C++ binding) +* Data augmentation on duration values at the MIDI level + +## Citation + +If you use MidiTok for your research, a citation in your manuscript would be gladly appreciated. ❤️ + +[**MidiTok paper**](https://archives.ismir.net/ismir2021/latebreaking/000005.pdf) +```bibtex +@inproceedings{miditok2021, + title={{MidiTok}: A Python package for {MIDI} file tokenization}, + author={Fradet, Nathan and Briot, Jean-Pierre and Chhel, Fabien and El Fallah Seghrouchni, Amal and Gutowski, Nicolas}, + booktitle={Extended Abstracts for the Late-Breaking Demo Session of the 22nd International Society for Music Information Retrieval Conference}, + year={2021}, + url={https://archives.ismir.net/ismir2021/latebreaking/000005.pdf}, +} +``` + +The BibTeX citations of all tokenizations can be found [in the documentation](https://miditok.readthedocs.io/en/latest/citations.html) + + +## Acknowledgments + +Special thanks to all the contributors. 
+We acknowledge [Aubay](https://blog.aubay.com/index.php/language/en/home/?lang=en), the [LIP6](https://www.lip6.fr/?LANG=en), [LERIA](http://blog.univ-angers.fr/leria/n) and [ESEO](https://eseo.fr/en) for the initial financing and support. + + +%package help +Summary: Development documents and examples for miditok +Provides: python3-miditok-doc +%description help +# MidiTok + +Python package to tokenize MIDI music files, presented at the ISMIR 2021 LBD. + + + +[](https://pypi.python.org/pypi/miditok/) +[](https://www.python.org/downloads/release/) +[](https://miditok.readthedocs.io/en/latest/?badge=latest) +[](https://github.com/Natooz/MidiTok/actions/workflows/pytest.yml) +[](https://codecov.io/gh/Natooz/MidiTok) +[](https://github.com/Natooz/MidiTok/blob/main/LICENSE) +[](https://pepy.tech/project/MidiTok) +[](https://github.com/psf/black) + +Using Deep Learning with symbolic music ? MidiTok can take care of converting (tokenizing) your MIDI files into tokens, ready to be fed to models such as Transformer, for any generation, transcription or MIR task. +MidiTok features most known [MIDI tokenizations](https://miditok.readthedocs.io/en/latest/tokenizations.html) (e.g. [REMI](https://arxiv.org/abs/2002.00212), [Compound Word](https://arxiv.org/abs/2101.02402)...), and is built around the idea that they all share common parameters and methods. It supports [Byte Pair Encoding (BPE)](https://arxiv.org/abs/2301.11975) and data augmentation. + +**Documentation:** [miditok.readthedocs.com](https://miditok.readthedocs.io/en/latest/index.html) + +## Install + +```shell +pip install miditok +``` +MidiTok uses [MIDIToolkit](https://github.com/YatingMusic/miditoolkit), which itself uses [Mido](https://github.com/mido/mido) to read and write MIDI files, and BPE is backed by [Hugging Face 🤗tokenizers](https://github.com/huggingface/tokenizers) for super-fast encoding. + +## Usage example + +The most basic and useful methods are summarized here. 
And [here](colab-notebooks/Full_Example_HuggingFace_GPT2_Transformer.ipynb) is a simple notebook example showing how to use Hugging Face models to generate music, with MidiTok taking care of tokenizing MIDIs. + +```python +from miditok import REMI +from miditok.utils import get_midi_programs +from miditoolkit import MidiFile +from pathlib import Path + +# Creates the tokenizer and loads a MIDI +tokenizer = REMI() # using the default parameters, read the documentation to customize your tokenizer +midi = MidiFile('path/to/your_midi.mid') + +# Converts MIDI to tokens, and back to a MIDI +tokens = tokenizer(midi) # calling it will automatically detect MIDIs, paths and tokens before the conversion +converted_back_midi = tokenizer(tokens, get_midi_programs(midi)) # PyTorch / Tensorflow / Numpy tensors supported + +# Converts MIDI files to tokens saved as JSON files +midi_paths = list(Path("path", "to", "dataset").glob("**/*.mid")) +data_augmentation_offsets = [2, 1, 1] # data augmentation on 2 pitch octaves, 1 velocity and 1 duration values +tokenizer.tokenize_midi_dataset(midi_paths, Path("path", "to", "tokens_noBPE"), + data_augment_offsets=data_augmentation_offsets) + +# Constructs the vocabulary with BPE, from the tokenized files +tokenizer.learn_bpe( + vocab_size=500, + tokens_paths=list(Path("path", "to", "tokens_noBPE").glob("**/*.json")), + start_from_empty_voc=False, +) + +# Saving our tokenizer, to retrieve it back later with the load_params method +tokenizer.save_params(Path("path", "to", "save", "tokenizer")) + +# Converts the tokenized musics into tokens with BPE +tokenizer.apply_bpe_to_dataset(Path('path', 'to', 'tokens_noBPE'), Path('path', 'to', 'tokens_BPE')) +``` + +## Tokenizations + +MidiTok implements the tokenizations: (links to original papers) +* [REMI](https://dl.acm.org/doi/10.1145/3394171.3413671) +* [REMI+](https://openreview.net/forum?id=NyR8OZFHw6i) +* [MIDI-Like](https://link.springer.com/article/10.1007/s00521-018-3758-9) +* 
[TSD](https://arxiv.org/abs/2301.11975) +* [Structured](https://arxiv.org/abs/2107.05944) +* [CPWord](https://ojs.aaai.org/index.php/AAAI/article/view/16091) +* [Octuple](https://aclanthology.org/2021.findings-acl.70) +* [MuMIDI](https://dl.acm.org/doi/10.1145/3394171.3413721) +* [MMM](https://arxiv.org/abs/2008.06048) + +You can find short presentations in the [documentation](https://miditok.readthedocs.io/en/latest/tokenizations.html). + +## Limitations + +Tokenizations using Bar tokens (REMI, Compound Word and MuMIDI) **only considers a 4/x time signature** for now. This means that each bar is considered covering 4 beats. +REMI+ and Octuple support it. + +## Contributions + +Contributions are gratefully welcomed, feel free to open an issue or send a PR if you want to add a tokenization or speed up the code. You can read the [contribution guide](CONTRIBUTING.md) for details. + +### Todos + +* Extend Time Signature to all tokenizations +* Control Change messages +* Option to represent pitch values as pitch intervals, as [it seems to improve performances](https://ismir2022program.ismir.net/lbd_369.html). +* Speeding up MIDI read / load (Rust / C++ binding) +* Data augmentation on duration values at the MIDI level + +## Citation + +If you use MidiTok for your research, a citation in your manuscript would be gladly appreciated. 
❤️ + +[**MidiTok paper**](https://archives.ismir.net/ismir2021/latebreaking/000005.pdf) +```bibtex +@inproceedings{miditok2021, + title={{MidiTok}: A Python package for {MIDI} file tokenization}, + author={Fradet, Nathan and Briot, Jean-Pierre and Chhel, Fabien and El Fallah Seghrouchni, Amal and Gutowski, Nicolas}, + booktitle={Extended Abstracts for the Late-Breaking Demo Session of the 22nd International Society for Music Information Retrieval Conference}, + year={2021}, + url={https://archives.ismir.net/ismir2021/latebreaking/000005.pdf}, +} +``` + +The BibTeX citations of all tokenizations can be found [in the documentation](https://miditok.readthedocs.io/en/latest/citations.html) + + +## Acknowledgments + +Special thanks to all the contributors. +We acknowledge [Aubay](https://blog.aubay.com/index.php/language/en/home/?lang=en), the [LIP6](https://www.lip6.fr/?LANG=en), [LERIA](http://blog.univ-angers.fr/leria/n) and [ESEO](https://eseo.fr/en) for the initial financing and support. + + +%prep +%autosetup -n miditok-2.0.6 + +%build +%py3_build + +%install +%py3_install +install -d -m755 %{buildroot}/%{_pkgdocdir} +if [ -d doc ]; then cp -arf doc %{buildroot}/%{_pkgdocdir}; fi +if [ -d docs ]; then cp -arf docs %{buildroot}/%{_pkgdocdir}; fi +if [ -d example ]; then cp -arf example %{buildroot}/%{_pkgdocdir}; fi +if [ -d examples ]; then cp -arf examples %{buildroot}/%{_pkgdocdir}; fi +pushd %{buildroot} +if [ -d usr/lib ]; then + find usr/lib -type f -printf "\"/%h/%f\"\n" >> filelist.lst +fi +if [ -d usr/lib64 ]; then + find usr/lib64 -type f -printf "\"/%h/%f\"\n" >> filelist.lst +fi +if [ -d usr/bin ]; then + find usr/bin -type f -printf "\"/%h/%f\"\n" >> filelist.lst +fi +if [ -d usr/sbin ]; then + find usr/sbin -type f -printf "\"/%h/%f\"\n" >> filelist.lst +fi +touch doclist.lst +if [ -d usr/share/man ]; then + find usr/share/man -type f -printf "\"/%h/%f.gz\"\n" >> doclist.lst +fi +popd +mv %{buildroot}/filelist.lst . +mv %{buildroot}/doclist.lst . 
+
+%files -n python3-miditok -f filelist.lst
+%dir %{python3_sitelib}/*
+
+%files help -f doclist.lst
+%{_docdir}/*
+
+%changelog
+* Fri Jun 09 2023 Python_Bot <Python_Bot@openeuler.org> - 2.0.6-1
+- Package Spec generated
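The `%install` section above builds `filelist.lst` and `doclist.lst` with a series of `find ... -printf` calls, and the `%files` sections then consume them via `-f`. As a rough illustration of what those shell loops appear to compute, here is a minimal Python sketch. The function name `build_file_lists` is hypothetical and not part of the spec. The sketch sorts its results for determinism, whereas `find` gives no ordering guarantee. The `.gz` suffix on man pages presumably anticipates RPM compressing them after `%install`, which is an assumption here rather than something the diff states.

```python
from pathlib import Path

# Directories scanned by the spec's %install section for packaged files.
PAYLOAD_DIRS = ("usr/lib", "usr/lib64", "usr/bin", "usr/sbin")
# Directory scanned for man pages, which are listed with a ".gz" suffix.
MAN_DIR = "usr/share/man"


def _regular_files(root: Path):
    """Yield regular files under root, skipping symlinks like `find -type f`."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and not path.is_symlink():
            yield path


def build_file_lists(buildroot: Path):
    """Hypothetical helper: return (filelist, doclist) entries for a buildroot.

    Each entry is a quoted absolute path, matching the
    `-printf "\"/%h/%f\"\n"` format used in the spec.
    """
    filelist = []
    for sub in PAYLOAD_DIRS:
        root = buildroot / sub
        if root.is_dir():
            for path in _regular_files(root):
                rel = path.relative_to(buildroot).as_posix()
                filelist.append(f'"/{rel}"')

    doclist = []
    man_root = buildroot / MAN_DIR
    if man_root.is_dir():
        for path in _regular_files(man_root):
            rel = path.relative_to(buildroot).as_posix()
            doclist.append(f'"/{rel}.gz"')

    return filelist, doclist
```

Writing each returned list to a file, one entry per line, would give manifests in the same shape as the ones the spec passes to `%files -f`.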