Do Music Generation Models Encode Music Theory?
Megan Wei, Michael Freeman, Chris Donahue, Chen Sun

TL;DR
This paper asks whether music foundation models encode fundamental Western music theory concepts. It introduces SynTheory, a synthetic dataset, and uses it to probe models such as Jukebox and MusicGen, finding that these concepts are detectable in their internal representations.
Contribution
The paper introduces SynTheory, a synthetic MIDI and audio dataset of music theory concepts, and presents a framework for probing how music foundation models encode these concepts internally.
Findings
Music theory concepts are detectable within foundation models.
Model size and layer influence the degree of encoding.
Encoding strength varies across different music theory elements.
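To make the idea of "detectable within internal representations" concrete, here is a minimal sketch of a linear probe, written in NumPy. It is not the authors' code: the features are random synthetic stand-ins for frozen model activations, and the names `train_linear_probe`, `make_fake_layer`, `weak_layer`, and `strong_layer` are hypothetical. The sketch only illustrates the general technique: a small classifier is trained on frozen features, and its held-out accuracy is read as a measure of how linearly decodable a concept (here, a 12-way label such as pitch class) is from a given layer.

```python
import numpy as np

def train_linear_probe(X, y, num_classes, lr=0.1, epochs=300):
    """Fit a softmax linear classifier on frozen features X of shape (n, d)."""
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]  # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

def probe_accuracy(X, y, W, b):
    """Fraction of examples whose predicted class matches the label."""
    return float(((X @ W + b).argmax(axis=1) == y).mean())

def make_fake_layer(rng, y, num_classes, dim, signal):
    """Hypothetical stand-in for one layer's activations: class signal plus noise."""
    centers = rng.normal(size=(num_classes, dim))
    return signal * centers[y] + rng.normal(size=(len(y), dim))

rng = np.random.default_rng(0)
num_classes = 12  # assumption: a 12-way concept such as pitch class
y = rng.integers(0, num_classes, size=1200)
train, test = slice(0, 900), slice(900, 1200)

results = {}
for name, signal in [("weak_layer", 0.0), ("strong_layer", 3.0)]:
    X = make_fake_layer(rng, y, num_classes, dim=32, signal=signal)
    # Standardize with training-set statistics only.
    mean, std = X[train].mean(axis=0), X[train].std(axis=0)
    X = (X - mean) / std
    W, b = train_linear_probe(X[train], y[train], num_classes)
    results[name] = probe_accuracy(X[test], y[test], W, b)

print(results)
```

In this toy setup the layer that carries no class signal should probe near chance (1 in 12), while the layer with a strong signal should probe far above it. Comparing such accuracies across layers and model sizes is one way the findings above could be measured in practice.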
Abstract
Music foundation models possess impressive music generation capabilities. When people compose music, they may infuse their understanding of music into their work, by using notes and intervals to craft melodies, chords to build progressions, and tempo to create a rhythmic feel. To what extent is this true of music generation models? More specifically, are fundamental Western music theory concepts observable within the "inner workings" of these models? Recent work proposed leveraging latent audio representations from music generation models towards music information retrieval tasks (e.g. genre classification, emotion recognition), which suggests that high-level musical characteristics are encoded within these models. However, probing individual music theory concepts (e.g. tempo, pitch class, chord quality) remains under-explored. Thus, we introduce SynTheory, a synthetic MIDI and audio…
Taxonomy
Topics: Music and Audio Processing
