Large Language Models' Internal Perception of Symbolic Music
Andrew Shin, Kunitake Kaneko

TL;DR
This study explores how large language models implicitly understand and generate symbolic music by analyzing their internal representations and evaluating their performance on music recognition and generation tasks.
Contribution
It introduces a novel approach to assess LLMs' musical understanding by generating MIDI data from text prompts and training neural networks on this data, revealing their capabilities and limitations.
Findings
LLMs can infer basic musical structures from text.
Neural networks trained on LLM-generated MIDI perform well in classification.
LLMs show potential but lack an explicit understanding of musical context.
Abstract
Large language models (LLMs) excel at modeling relationships between strings in natural language and have shown promise in extending to other symbolic domains like coding or mathematics. However, the extent to which they implicitly model symbolic music remains underexplored. This paper investigates how LLMs represent musical concepts by generating symbolic music data from textual prompts describing combinations of genres and styles, and evaluating their utility through recognition and generation tasks. We produce a dataset of LLM-generated MIDI files without relying on explicit musical training. We then train neural networks entirely on this LLM-generated MIDI dataset and perform genre and style classification as well as melody completion, benchmarking their performance against established models. Our results demonstrate that LLMs can infer rudimentary musical structures and temporal…
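The first stage described in the abstract, prompting an LLM with genre and style combinations and turning its text reply into note data, might look roughly like the sketch below. Everything specific here is an assumption made for illustration and is not taken from the paper: the label lists, the prompt wording, the `pitch start duration` reply format, and the helper names `build_prompts` and `parse_notes`. The LLM call itself is replaced by a hard-coded sample reply.

```python
from itertools import product

# Hypothetical label sets; the paper's actual genres and styles are not listed here.
GENRES = ["jazz", "rock", "classical"]
STYLES = ["upbeat", "melancholic"]


def build_prompts(genres, styles):
    """Return one (genre, style, prompt) triple per genre/style combination."""
    return [
        (g, s, f"Write a short {s} {g} melody as lines of 'pitch start duration'.")
        for g, s in product(genres, styles)
    ]


def parse_notes(text):
    """Parse 'pitch start duration' lines into tuples, skipping malformed lines."""
    notes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # not a three-field note line
        try:
            pitch, start, dur = int(parts[0]), float(parts[1]), float(parts[2])
        except ValueError:
            continue  # fields are not numeric
        # MIDI note numbers range from 0 to 127; times must be non-negative.
        if 0 <= pitch <= 127 and start >= 0 and dur > 0:
            notes.append((pitch, start, dur))
    return notes


prompts = build_prompts(GENRES, STYLES)

# Stand-in for an LLM reply (assumed format), including two lines that should be dropped.
sample_reply = "60 0.0 0.5\n62 0.5 0.5\nnot a note\n200 1.0 0.5\n64 1.0 1.0"
notes = parse_notes(sample_reply)
```

The parsed tuples, labeled with the genre and style of the prompt that produced them, could then be written out as MIDI and used as training examples for the classification and melody completion tasks. Filtering malformed lines matters in such a setup because free-form LLM output is not guaranteed to follow the requested format.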
Taxonomy
Topics: Music and Audio Processing · Computational and Text Analysis Methods
