MixAssist: An Audio-Language Dataset for Co-Creative AI Assistance in Music Mixing

Michael Clemens; Ana Marasović

arXiv:2507.06329·cs.SD·July 10, 2025

TL;DR

MixAssist introduces a novel audio-language dataset capturing real-world collaborative music mixing dialogues, enabling AI models to better understand and assist in co-creative music production processes.

Contribution

The paper presents MixAssist, the first dataset of multi-turn audio-language dialogues in music mixing, facilitating the development of AI assistants that support collaborative and instructional music production.

Findings

01

Qwen-Audio fine-tuned on MixAssist outperforms the other tested models at generating relevant mixing advice.

02

Automated and human evaluations show promising results for AI assistance in music mixing.

03

MixAssist enables training models to understand complex, real-world music production dialogues.

Abstract

While AI presents significant potential for enhancing music mixing and mastering workflows, current research predominantly emphasizes end-to-end automation or generation, often overlooking the collaborative and instructional dimensions vital for co-creative processes. This gap leaves artists, particularly amateurs seeking to develop expertise, underserved. To bridge this, we introduce MixAssist, a novel audio-language dataset capturing the situated, multi-turn dialogue between expert and amateur music producers during collaborative mixing sessions. Comprising 431 audio-grounded conversational turns derived from 7 in-depth sessions involving 12 producers, MixAssist provides a unique resource for training and evaluating audio-language models that can comprehend and respond to the complexities of real-world music production dialogues. Our evaluations, including automated LLM-as-a-judge…

Taxonomy

Topics: Music Technology and Sound Studies · Music and Audio Processing · AI in Service Interactions