Diff-MST: Differentiable Mixing Style Transfer
Soumya Sai Vanka, Christian Steinmetz, Jean-Baptiste Rolland, Joshua Reiss, George Fazekas

TL;DR
Diff-MST introduces a differentiable, controllable, and scalable framework for mixing style transfer that produces high-quality audio mixes from raw tracks and a reference, overcoming limitations of previous methods.
Contribution
It presents a novel differentiable mixing console with a transformer controller and style loss, enabling interpretability, controllability, and arbitrary track handling in style transfer.
Findings
Produces high-quality mixes comparable to baselines
Supports arbitrary number of input tracks without source labels
Enables post-hoc adjustments and interpretability
Abstract
Mixing style transfer automates the generation of a multitrack mix for a given set of tracks by inferring production attributes from a reference song. However, existing systems for mixing style transfer are limited in that they often operate only on a fixed number of tracks, introduce artifacts, and produce mixes in an end-to-end fashion, without grounding in traditional audio effects, prohibiting interpretability and controllability. To overcome these challenges, we introduce Diff-MST, a framework comprising a differentiable mixing console, a transformer controller, and an audio production style loss function. By inputting raw tracks and a reference song, our model estimates control parameters for audio effects within a differentiable mixing console, producing high-quality mixes and enabling post-hoc adjustments. Moreover, our architecture supports an arbitrary number of input tracks…
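As a rough illustration of the idea in the abstract (a controller predicts per-track effect parameters, and a console built from conventional audio operations renders the mix), here is a minimal sketch of a console forward pass. It is not the authors' implementation: the function name `mix_tracks`, the choice of only gain and constant-power pan, and the parameter ranges are assumptions made for this example. It is written in plain Python and covers only the forward pass; a trainable version would express the same operations in an automatic differentiation framework so gradients can flow back to the controller.

```python
import math


def mix_tracks(tracks, gains_db, pans):
    """Render a stereo mix from mono tracks (hypothetical helper).

    tracks   : list of equal-length lists of float samples, one per track
    gains_db : per-track gain in decibels
    pans     : per-track pan position in [0, 1], 0 = hard left, 1 = hard right
    Returns (left, right) as two lists of floats.
    """
    if not (len(tracks) == len(gains_db) == len(pans)):
        raise ValueError("need one gain and one pan value per track")

    n = len(tracks[0]) if tracks else 0
    left = [0.0] * n
    right = [0.0] * n

    # The loop runs over however many tracks are given, so the number
    # of input tracks is not fixed in advance.
    for track, gain_db, pan in zip(tracks, gains_db, pans):
        gain = 10.0 ** (gain_db / 20.0)   # decibels to linear amplitude
        theta = pan * math.pi / 2.0       # constant-power pan law
        gain_left = gain * math.cos(theta)
        gain_right = gain * math.sin(theta)
        for i, sample in enumerate(track):
            left[i] += gain_left * sample
            right[i] += gain_right * sample

    return left, right


# Two toy tracks: the parameters stand in for what a controller would predict.
tracks = [[1.0, 0.5], [0.2, -0.2]]
left, right = mix_tracks(tracks, gains_db=[0.0, -6.0], pans=[0.0, 1.0])

# Post-hoc adjustment: a user edits one predicted parameter and re-renders.
left_adj, right_adj = mix_tracks(tracks, gains_db=[0.0, 0.0], pans=[0.0, 1.0])
```

Because the mix is fully determined by a small set of named parameters, each value can be inspected or overridden after prediction, which is the sense in which a parameter-based console allows interpretability and post-hoc adjustment.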
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Music and Audio Processing
Methods: Sparse Evolutionary Training
