Welcome to MidiTok’s documentation!
MidiTok is a Python package for MIDI file tokenization, presented at the ISMIR 2021 LBDs (paper). It converts MIDI files into sequences of tokens ready to be fed to sequential deep learning models such as Transformers.
MidiTok features most of the known MIDI tokenizations and is built around the idea that they all share common methods. It properly pre-processes MIDI files and supports Byte Pair Encoding (BPE). Github repository
Installation
pip install miditok
MidiTok uses symusic to read and write MIDI files, and BPE is backed by Hugging Face 🤗tokenizers for super fast encoding.
Citation
If you use MidiTok for your research, a citation in your manuscript would be greatly appreciated. ❤️
You can also find BibTeX citations for the tokenizations.
@inproceedings{miditok2021,
    title={{MidiTok}: A Python package for {MIDI} file tokenization},
    author={Fradet, Nathan and Briot, Jean-Pierre and Chhel, Fabien and El Fallah Seghrouchni, Amal and Gutowski, Nicolas},
    booktitle={Extended Abstracts for the Late-Breaking Demo Session of the 22nd International Society for Music Information Retrieval Conference},
    year={2021},
    url={https://archives.ismir.net/ismir2021/latebreaking/000005.pdf},
}
Contents
- Bases
- Tokens and vocabulary
- Vocabulary
- TokSequence
- MIDI Tokenizer
- MIDITokenizer
- MIDITokenizer.add_to_vocab()
- MIDITokenizer.apply_bpe()
- MIDITokenizer.complete_sequence()
- MIDITokenizer.decode_bpe()
- MIDITokenizer.has_midi_time_signatures_not_in_vocab()
- MIDITokenizer.io_format
- MIDITokenizer.is_multi_voc
- MIDITokenizer.learn_bpe()
- MIDITokenizer.len
- MIDITokenizer.load_tokens()
- MIDITokenizer.midi_to_tokens()
- MIDITokenizer.preprocess_midi()
- MIDITokenizer.save_params()
- MIDITokenizer.save_pretrained()
- MIDITokenizer.save_tokens()
- MIDITokenizer.special_tokens
- MIDITokenizer.special_tokens_ids
- MIDITokenizer.token_id_type()
- MIDITokenizer.token_ids_of_type()
- MIDITokenizer.tokenize_midi_dataset()
- MIDITokenizer.tokens_errors()
- MIDITokenizer.tokens_to_midi()
- MIDITokenizer.vocab
- MIDITokenizer.vocab_bpe
- Tokenizer config
- Additional tokens
- Special tokens
- Tokens & TokSequence input / output format
- Magic methods
- Save / Load tokenizer
- Examples
- Tokenizations
- Byte Pair Encoding (BPE)
- Hugging Face hub
- PyTorch Training
- Data augmentation
- Utils methods
- compute_ticks_per_bar()
- compute_ticks_per_beat()
- concat_midis()
- convert_ids_tensors_to_list()
- detect_chords()
- extract_chunk_from_midi()
- fix_offsets_overlapping_notes()
- get_bars_ticks()
- get_beats_ticks()
- get_midi_programs()
- get_midi_ticks_per_beat()
- get_num_notes_per_bar()
- merge_midis()
- merge_same_program_tracks()
- merge_tracks()
- merge_tracks_per_class()
- num_bar_pos()
- remove_duplicated_notes()
- split_midi_per_beats()
- split_midi_per_ticks()
- split_midi_per_tracks()
- Citations