Subtractive Training for Music Stem Insertion using Latent Diffusion Models
Ivan Villa-Renteria, Mason L. Wang, Zachary Shah, Zhe Li, Soohyun Kim, Neelesh Ramachandran, Mert Pilanci

TL;DR
This paper introduces Subtractive Training, a novel method using latent diffusion models to synthesize individual musical instrument stems conditioned on existing tracks and textual instructions, enabling style control and extension to MIDI formats.
Contribution
It presents a new training approach that combines dataset manipulation and language guidance to generate and control specific instrument stems in music.
Findings
The method generates realistic drum stems that blend with the existing tracks.
Text instructions control the style of the inserted instrument.
The technique extends to the MIDI format for a variety of instruments.
Abstract
We present Subtractive Training, a simple and novel method for synthesizing individual musical instrument stems given other instruments as context. This method pairs a dataset of complete music mixes with 1) a variant of the dataset lacking a specific stem, and 2) LLM-generated instructions describing how the missing stem should be reintroduced. We then fine-tune a pretrained text-to-audio diffusion model to generate the missing instrument stem, guided by both the existing stems and the text instruction. Our results demonstrate Subtractive Training's efficacy in creating authentic drum stems that seamlessly blend with the existing tracks. We also show that we can use the text instruction to control the generation of the inserted stem in terms of rhythm, dynamics, and genre, allowing us to modify the style of a single instrument in a full song while keeping the remaining instruments the…
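The data pairing described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes each song is available as separate, equal-length stems and that a mix is the sample-wise sum of its stems, which the abstract does not state. The function name `make_subtractive_example`, the stem names, and the instruction string are hypothetical placeholders; in the paper the instructions are LLM-generated.

```python
import numpy as np

def make_subtractive_example(stems, target, instruction):
    """Build one training triplet: (mix without target stem, instruction, full mix).

    Assumption: a mix is the sample-wise sum of equal-length stems.
    """
    if target not in stems:
        raise KeyError(f"unknown stem: {target}")
    # Complete mix: every stem summed together.
    full_mix = np.sum([audio for audio in stems.values()], axis=0)
    # Subtracted variant: the same song with the target stem left out.
    context = np.sum(
        [audio for name, audio in stems.items() if name != target], axis=0
    )
    return {"context": context, "instruction": instruction, "full_mix": full_mix}

# Toy stems (4 samples each) standing in for real audio.
stems = {
    "drums": np.array([1.0, 0.0, 1.0, 0.0]),
    "bass": np.array([0.0, 2.0, 0.0, 2.0]),
    "guitar": np.array([3.0, 3.0, 3.0, 3.0]),
}
example = make_subtractive_example(
    stems, target="drums", instruction="Add energetic rock drums."  # placeholder text
)
```

In a fine-tuning loop under these assumptions, `context` and `instruction` would serve as conditioning and `full_mix` as the generation target; how the conditioning enters the diffusion model is not specified in this summary.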
Taxonomy
Topics: Music and Audio Processing
Methods: Diffusion
