Diff-MST: Differentiable Mixing Style Transfer

Soumya Sai Vanka; Christian Steinmetz; Jean-Baptiste Rolland; Joshua Reiss; George Fazekas

arXiv:2407.08889 · eess.AS · July 15, 2024 · 1 citation

Open Access

TL;DR

Diff-MST introduces a differentiable, controllable, and scalable framework for mixing style transfer that produces high-quality audio mixes from raw tracks and a reference, overcoming limitations of previous methods.

Contribution

It presents a novel differentiable mixing console with a transformer controller and style loss, enabling interpretability, controllability, and arbitrary track handling in style transfer.

Findings

01. Produces high-quality mixes comparable to baselines

02. Supports an arbitrary number of input tracks without source labels

03. Enables post-hoc adjustments and interpretability

Abstract

Mixing style transfer automates the generation of a multitrack mix for a given set of tracks by inferring production attributes from a reference song. However, existing systems for mixing style transfer are limited in that they often operate only on a fixed number of tracks, introduce artifacts, and produce mixes in an end-to-end fashion, without grounding in traditional audio effects, prohibiting interpretability and controllability. To overcome these challenges, we introduce Diff-MST, a framework comprising a differentiable mixing console, a transformer controller, and an audio production style loss function. By inputting raw tracks and a reference song, our model estimates control parameters for audio effects within a differentiable mixing console, producing high-quality mixes and enabling post-hoc adjustments. Moreover, our architecture supports an arbitrary number of input tracks…
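
To make the idea of a mixing console driven by control parameters more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the effect set (per-track gain and constant-power panning), the parameter names `gain_db` and `pan`, and the helper `mix_tracks` are all assumptions chosen for illustration. In Diff-MST the parameters would be predicted by the transformer controller; here they are set by hand. Every operation is a smooth function of its parameters, which is the property that lets gradients flow through such a console when it is written in an autodiff framework.

```python
import math

def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)

def constant_power_pan(pan):
    """Return (left, right) gains for pan in [0, 1]; 0 is hard left, 1 is hard right."""
    theta = pan * math.pi / 2.0
    return math.cos(theta), math.sin(theta)

def mix_tracks(tracks, params):
    """Mix any number of mono tracks to stereo (hypothetical helper).

    tracks: list of lists of float samples (lengths may differ).
    params: one dict per track with assumed keys "gain_db" and "pan".
    """
    if len(tracks) != len(params):
        raise ValueError("need exactly one parameter set per track")
    n = max((len(t) for t in tracks), default=0)
    left = [0.0] * n
    right = [0.0] * n
    for track, p in zip(tracks, params):
        g = db_to_linear(p["gain_db"])
        pan_l, pan_r = constant_power_pan(p["pan"])
        for i, x in enumerate(track):
            left[i] += g * pan_l * x
            right[i] += g * pan_r * x
    return left, right

# Two toy tracks; the parameter values are hand-picked, not model outputs.
tracks = [[1.0, 0.5, -0.5], [0.2, 0.2, 0.2]]
params = [{"gain_db": 0.0, "pan": 0.0}, {"gain_db": -6.0, "pan": 1.0}]
left, right = mix_tracks(tracks, params)

# Post-hoc adjustment: edit one parameter and render again.
params[1]["gain_db"] = 0.0
left_adj, right_adj = mix_tracks(tracks, params)
```

Because the mix is fully described by a small set of named parameters, a user can inspect them and change any value after the fact, which is the kind of interpretability and post-hoc control the abstract describes. The loop over `tracks` also places no limit on how many tracks are mixed.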

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Music and Audio Processing

Methods: Sparse Evolutionary Training