Automatic Music Mixing using a Generative Model of Effect Embeddings
Eloi Moliner, Marco A. Martínez-Ramírez, Junghyun Koo, Wei-Hsiang Liao, Kin Wai Cheuk, Joan Serrà, Vesa Välimäki, Yuki Mitsufuji

TL;DR
This paper introduces MEGAMI, a generative model for automatic music mixing that captures the multiple valid mixes possible for the same input tracks, improving on existing deterministic systems and approaching human-level quality in listening tests.
Contribution
The paper presents MEGAMI, a generative framework that models the distribution of professional mixes conditioned on the unprocessed tracks. It accepts arbitrary unlabeled tracks and accounts for the multiple valid solutions to the same mixing task.
Findings
Outperforms existing methods on distributional metrics.
Achieves near human-level quality in listening tests.
Handles diverse musical genres effectively.
Abstract
Music mixing involves combining individual tracks into a cohesive mixture, a task characterized by subjectivity where multiple valid solutions exist for the same input. Existing automatic mixing systems treat this task as a deterministic regression problem, thus ignoring this multiplicity of solutions. Here we introduce MEGAMI (Multitrack Embedding Generative Auto MIxing), a generative framework that models the conditional distribution of professional mixes given unprocessed tracks. MEGAMI uses a track-agnostic effects processor conditioned on per-track generated embeddings, handles arbitrary unlabeled tracks through a permutation-equivariant architecture, and enables training on both dry and wet recordings via domain adaptation. Our objective evaluation using distributional metrics shows consistent improvements over existing methods, while listening tests indicate performances…
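To make the term "permutation-equivariant" concrete, here is a minimal sketch of the property itself, not of MEGAMI's architecture. It assumes a toy Deep Sets-style layer in which each track's output combines its own features with the mean over all tracks; the function names, weights, and feature sizes are hypothetical and chosen only for illustration. The check confirms that reordering the input tracks reorders the outputs in the same way, which is what lets a model process tracks without instrument labels or a fixed ordering.

```python
import random


def equivariant_layer(tracks, w_self=0.7, w_mean=0.3):
    """Toy Deep Sets-style layer (illustrative assumption, not the paper's model).

    Each output row depends on its own track features and on the mean over
    all tracks, so the layer has no notion of track order.
    """
    n = len(tracks)
    dim = len(tracks[0])
    mean = [sum(t[d] for t in tracks) / n for d in range(dim)]
    return [[w_self * t[d] + w_mean * mean[d] for d in range(dim)] for t in tracks]


def permute(items, order):
    """Reorder a list of tracks according to the given index order."""
    return [items[i] for i in order]


random.seed(0)
# Three hypothetical tracks, each described by four features.
tracks = [[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3)]
order = [2, 0, 1]

out_then_permute = permute(equivariant_layer(tracks), order)
permute_then_out = equivariant_layer(permute(tracks, order))

# Equivariance: both paths agree up to floating-point rounding.
is_equivariant = all(
    abs(a - b) < 1e-9
    for row_a, row_b in zip(out_then_permute, permute_then_out)
    for a, b in zip(row_a, row_b)
)
print(is_equivariant)
```

The mean is used as the shared summary because it is invariant to track order; any other symmetric pooling (sum, max) would preserve the same property.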
Taxonomy
Topics: Music Technology and Sound Studies · Music and Audio Processing · Generative Adversarial Networks and Image Synthesis
