
TL;DR
This paper introduces a zero-shot symbolic music editing method using large language models and a novel notation, enabling complex drum groove modifications based on natural language instructions.
Contribution
It presents a new text-based notation and a benchmark for zero-shot symbolic music editing, along with an automated evaluation framework for musical constraints.
Findings
The top-performing model achieved a 68% success rate on the automated tests.
Automated tests correlate highly with professional musicians' judgments.
The approach enables scalable, data-efficient music editing with LLMs.
Abstract
While recent advancements in AI music generation have predominantly focused on direct audio synthesis, these systems suffer from inherent rigidity, limiting their utility for professional music producers who require granular, highly malleable creative control. Symbolic music (e.g., MIDI) resolves this constraint by providing editable note-level parameters, yet the natural progression to instruction-driven symbolic music editing remains critically under-explored due to a severe scarcity of paired instruction-MIDI datasets. In this paper, we bypass this data bottleneck by formalizing zero-shot symbolic music editing as a structured reasoning task. We introduce a novel text-based "drumroll" notation that translates musical mechanics into a spatial, syntax-driven grid, empowering off-the-shelf Large Language Models (LLMs) to logically deduce and apply complex edits to drum grooves using…
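To make the idea of a text grid with automated constraint checks concrete, here is a minimal sketch. The paper's actual "drumroll" syntax is not shown in this summary, so everything below is an assumption made for illustration: the layout (one row per instrument, 16 sixteenth-note steps per bar, `x` for a hit and `.` for a rest), the labels `HH`/`SD`/`BD`, and the helper names `parse_grid`, `render_grid`, `add_hit`, and `snare_on_backbeat`.

```python
# Hypothetical grid notation (NOT the paper's syntax): one row per instrument,
# 16 steps per bar, 'x' = hit, '.' = rest.
GROOVE = """\
HH |x.x.x.x.x.x.x.x.|
SD |....x.......x...|
BD |x.......x.x.....|"""


def parse_grid(text):
    """Map each instrument label to a list of booleans (True = hit)."""
    grid = {}
    for line in text.splitlines():
        label, cells = line.split("|")[:2]
        grid[label.strip()] = [c == "x" for c in cells]
    return grid


def render_grid(grid):
    """Serialize the grid back to the same text layout."""
    return "\n".join(
        f"{label:<3}|{''.join('x' if hit else '.' for hit in hits)}|"
        for label, hits in grid.items()
    )


def add_hit(grid, label, step):
    """Return a copy of the grid with one extra hit; the input is left unchanged."""
    edited = {name: list(hits) for name, hits in grid.items()}
    edited[label][step] = True
    return edited


def snare_on_backbeat(grid):
    """Example constraint: snare hits on steps 4 and 12 (beats 2 and 4 in 4/4)."""
    snare = grid["SD"]
    return snare[4] and snare[12]


grid = parse_grid(GROOVE)
edited = add_hit(grid, "BD", 14)  # stands in for an edit such as "add a kick on the last beat's 'and'"
print(render_grid(edited))
print("backbeat preserved:", snare_on_backbeat(edited))
```

In a setup like this, a model would receive the grid text plus an instruction and return an edited grid, and checks such as `snare_on_backbeat` would decide automatically whether the edit respected the requested musical constraint.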
