TL;DR
This paper introduces the GigaMIDI dataset, the largest collection of MIDI files for research, and proposes heuristics to distinguish expressive performances, enabling the creation of a substantial expressive MIDI subset.
Contribution
The paper presents novel heuristics for detecting expressive music performances in MIDI files and creates the largest expressive MIDI dataset to date.
Findings
Heuristics effectively differentiate expressive from non-expressive MIDI tracks.
The curated expressive MIDI dataset contains over 1.6 million tracks.
GigaMIDI is the largest symbolic music dataset available for research.
Abstract
The Musical Instrument Digital Interface (MIDI), introduced in 1983, revolutionized music production by allowing computers and instruments to communicate efficiently. MIDI files encode musical instructions compactly, facilitating convenient music sharing. They benefit Music Information Retrieval (MIR), aiding in research on music understanding, computational musicology, and generative music. The GigaMIDI dataset contains over 1.4 million unique MIDI files, encompassing 1.8 billion MIDI note events and over 5.3 million MIDI tracks. GigaMIDI is currently the largest collection of symbolic music in MIDI format available for research purposes under fair dealing. Distinguishing between non-expressive and expressive MIDI tracks is challenging, as MIDI files do not inherently make this distinction. To address this issue, we introduce a set of innovative heuristics for detecting expressive…
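To make the idea of an expressiveness heuristic concrete, here is a minimal sketch. It is not the paper's method: the abstract above is cut off before it describes the heuristics, so the two features below, the function names `distinct_velocity_ratio` and `off_grid_ratio`, and the 16th-note grid are all illustrative assumptions. The intuition is that a non-expressive track (for example, one exported from notation software) tends to reuse a few velocity values and place onsets exactly on a metric grid, while a performed track varies in both.

```python
from typing import List, Tuple

# Hypothetical note representation: (onset in MIDI ticks, velocity 1..127).
Note = Tuple[int, int]


def distinct_velocity_ratio(notes: List[Note]) -> float:
    """Share of the 127 possible note-on velocities that occur in the track."""
    if not notes:
        return 0.0
    return len({velocity for _, velocity in notes}) / 127


def off_grid_ratio(notes: List[Note], ticks_per_beat: int, grid_per_beat: int = 4) -> float:
    """Fraction of notes whose onset does not fall on the metric grid.

    grid_per_beat=4 means a 16th-note grid (an assumption for this sketch).
    An onset lies on the grid when onset * grid_per_beat is a multiple of
    ticks_per_beat, which avoids floating-point grid steps.
    """
    if not notes:
        return 0.0
    off_grid = sum(1 for onset, _ in notes if (onset * grid_per_beat) % ticks_per_beat != 0)
    return off_grid / len(notes)


# Toy data, invented for illustration (480 ticks per beat).
quantized = [(0, 80), (120, 80), (240, 80), (360, 80)]
performed = [(3, 64), (118, 71), (245, 58), (360, 90)]

print(off_grid_ratio(quantized, 480), distinct_velocity_ratio(quantized))
print(off_grid_ratio(performed, 480), distinct_velocity_ratio(performed))
```

On the toy data, the quantized track has no off-grid onsets and a single velocity value, while three of the four performed notes fall off the grid and every note has its own velocity. Any threshold separating the two classes would have to be chosen empirically; none is implied here.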
Taxonomy
Methods: Sparse Evolutionary Training
