MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models
Dingyao Yu, Kaitao Song, Peiling Lu, Tianyu He, Xu Tan, Wei Ye, Shikun Zhang, Jiang Bian

TL;DR
MusicAgent is an AI system that leverages large language models to organize, automate, and simplify diverse music processing tasks, enabling users to focus on creativity without managing complex tools.
Contribution
The paper introduces MusicAgent, a novel system that integrates multiple music tools and uses LLMs to automate task decomposition and tool invocation for music understanding and generation.
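As a rough illustration of what "task decomposition and tool invocation" can look like, the sketch below splits a request into tasks and dispatches each one to a registered tool. It is not the paper's implementation. The names `Task`, `TOOLS`, `plan`, and `run` are hypothetical, the file names are placeholders, the tools are stubs that return strings, and a keyword rule stands in for the LLM planner so the example runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    name: str  # key into the tool registry
    arg: str   # input handed to the tool (placeholder file name here)


# Hypothetical tool registry. Real entries would wrap actual music models;
# these stubs only return a descriptive string.
TOOLS: Dict[str, Callable[[str], str]] = {
    "music-classification": lambda audio: f"genre label for {audio}",
    "timbre-synthesis": lambda score: f"audio rendered from {score}",
}


def plan(request: str) -> List[Task]:
    """Stand-in for the LLM planner: keyword rules instead of a model call."""
    text = request.lower()
    tasks: List[Task] = []
    if "classify" in text:
        tasks.append(Task("music-classification", "input.wav"))
    if "synthesize" in text:
        tasks.append(Task("timbre-synthesis", "input.mid"))
    return tasks


def run(request: str) -> List[str]:
    """Decompose the request, then invoke the matching tool for each task."""
    return [TOOLS[task.name](task.arg) for task in plan(request)]


if __name__ == "__main__":
    for result in run("Please classify this clip and synthesize the score"):
        print(result)
```

Keeping the planner separate from the registry means the keyword rule could be swapped for an LLM call without touching the dispatch logic, which is the separation the contribution statement describes.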
Findings
Successfully integrates tools from multiple sources.
Automates task decomposition and tool invocation.
Enhances user experience by simplifying AI-music interactions.
Abstract
AI-empowered music processing is a diverse field that encompasses dozens of tasks, ranging from generation tasks (e.g., timbre synthesis) to comprehension tasks (e.g., music classification). For developers and amateurs, it is very difficult to grasp all of these tasks to satisfy their requirements in music processing, especially considering the huge differences in the representations of music data and the model applicability across platforms among various tasks. Consequently, it is necessary to build a system to organize and integrate these tasks, and thus help practitioners to automatically analyze their demand and call suitable tools as solutions to fulfill their requirements. Inspired by the recent success of large language models (LLMs) in task automation, we develop a system, named MusicAgent, which integrates numerous music-related tools and an autonomous workflow to address user…
Taxonomy
Topics: Music and Audio Processing · Topic Modeling · Speech Recognition and Synthesis
