Improved symbolic drum style classification with grammar-based hierarchical representations
Léo Géré (CNAM Paris, CEDRIC - VERTIGO), Philippe Rigaux (CEDRIC - VERTIGO, CNAM Paris), Nicolas Audebert (CEDRIC - VERTIGO, CNAM, IGN, LaSTIG)

TL;DR
This paper introduces a grammar-based hierarchical representation of MIDI data that improves deep learning models' ability to classify musical styles, particularly drum styles, by encoding high-level rhythmic information explicitly.
Contribution
It proposes a novel tree-based MIDI representation using a context-free grammar, improving style classification accuracy and efficiency over traditional tokenization methods.
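To make the idea of a tree-based rhythmic representation concrete, here is a minimal sketch. It is not the authors' grammar: the binary subdivision rule, the function names `rhythm_tree` and `linearize`, and the token names `hit`, `rest` and `splitN` are all assumptions chosen for illustration. It recursively splits one bar until every leaf holds at most one onset, then flattens the tree into a token sequence that a sequence model could consume.

```python
from fractions import Fraction


def rhythm_tree(onsets, start=Fraction(0), end=Fraction(1), arity=2, max_depth=4):
    """Build a nested-list rhythm tree over the interval [start, end).

    Hypothetical illustration: onsets are positions within one bar,
    expressed as fractions of the bar length.
    """
    inside = [t for t in onsets if start <= t < end]
    if not inside:
        return "rest"  # no event in this interval
    if inside == [start] or max_depth == 0:
        # A single onset at the interval start, or the depth limit was
        # reached (remaining onsets are quantized to one event).
        return "hit"
    step = (end - start) / arity
    return [
        rhythm_tree(inside, start + i * step, start + (i + 1) * step,
                    arity, max_depth - 1)
        for i in range(arity)
    ]


def linearize(tree):
    """Flatten the tree depth-first into a list of tokens."""
    if isinstance(tree, str):
        return [tree]
    tokens = [f"split{len(tree)}"]
    for child in tree:
        tokens.extend(linearize(child))
    return tokens


# One bar with hits at the start, the half, and the three-quarter mark.
bar = [Fraction(0), Fraction(1, 2), Fraction(3, 4)]
tree = rhythm_tree(bar)
print(tree)             # ['hit', ['hit', 'hit']]
print(linearize(tree))  # ['split2', 'hit', 'split2', 'hit', 'hit']
```

In this sketch the metrical position of each hit is carried by the tree structure itself rather than by explicit time-shift or duration tokens, which is the kind of rhythmic organization a flat event list leaves implicit.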
Findings
Grammar-based representations outperform generic tokenization strategies.
Explicit rhythmic encoding improves classification accuracy.
The resulting model architecture is more compact and parameter-efficient.
Abstract
Deep learning models have become a critical tool for analysis and classification of musical data. These models operate either on the audio signal, e.g. waveform or spectrogram, or on a symbolic representation, such as MIDI. In the latter, musical information is often reduced to basic features, i.e. durations, pitches and velocities. Most existing works then rely on generic tokenization strategies from classical natural language processing, or matrix representations, e.g. piano roll. In this work, we evaluate how enriched representations of symbolic data can impact deep models, i.e. Transformers and RNNs, for music style classification. In particular, we examine representations that explicitly incorporate musical information implicitly present in MIDI-like encodings, such as rhythmic organization, and show that they outperform generic tokenization strategies. We introduce a new tree-based…
Taxonomy
Topics: Music and Audio Processing
