DisMix: Disentangling Mixtures of Musical Instruments for Source-level Pitch and Timbre Manipulation
Yin-Jyun Luo, Kin Wai Cheuk, Woosung Choi, Toshimitsu Uesaka, Keisuke Toyama, Koichi Saito, Chieh-Hsin Lai, Yuhta Takida, Wei-Hsiang Liao, Simon Dixon, Yuki Mitsufuji

TL;DR
DisMix is a novel generative framework that disentangles pitch and timbre in multi-instrument music mixtures, enabling manipulation and synthesis of new instrument combinations and musical attributes.
Contribution
DisMix introduces a modular approach to source-level pitch and timbre disentanglement in multi-instrument music, filling a gap left by existing methods that focus on single-instrument audio.
Findings
Successfully disentangles pitch and timbre in multi-instrument mixtures
Enables manipulation of instrument attributes and creation of novel instrument combinations
Effective on both simple chords and complex chorale datasets
Abstract
Existing work on pitch and timbre disentanglement has been mostly focused on single-instrument music audio, excluding the cases where multiple instruments are presented. To fill the gap, we propose DisMix, a generative framework in which the pitch and timbre representations act as modular building blocks for constructing the melody and instrument of a source, and the collection of which forms a set of per-instrument latent representations underlying the observed mixture. By manipulating the representations, our model samples mixtures with novel combinations of pitch and timbre of the constituent instruments. We can jointly learn the disentangled pitch-timbre representations and a latent diffusion transformer that reconstructs the mixture conditioned on the set of source-level representations. We evaluate the model using both a simple dataset of isolated chords and a realistic four-part…
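To illustrate what "modular building blocks" can mean in practice, here is a minimal sketch that assumes each source in a mixture is described by a separate pitch latent and timbre latent. The names `SourceLatent` and `swap_timbre` and the toy vectors are hypothetical and do not come from the paper. The sketch covers only the manipulation step (recombining pitch and timbre across sources); the encoders and the latent diffusion transformer that would decode the edited set into audio are not modeled.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class SourceLatent:
    """Hypothetical per-source representation: one pitch latent, one timbre latent."""
    pitch: Tuple[float, ...]
    timbre: Tuple[float, ...]


def swap_timbre(sources: List[SourceLatent], i: int, j: int) -> List[SourceLatent]:
    """Return a new set of source latents with the timbres of sources i and j exchanged.

    Pitch latents are left untouched, so each source keeps its melody
    while taking on the other source's instrument identity.
    """
    edited = list(sources)
    edited[i] = replace(sources[i], timbre=sources[j].timbre)
    edited[j] = replace(sources[j], timbre=sources[i].timbre)
    return edited


# Toy mixture of two sources (values are arbitrary placeholders).
mixture = [
    SourceLatent(pitch=(0.1, 0.9), timbre=(1.0, 0.0, 0.0)),
    SourceLatent(pitch=(0.7, 0.3), timbre=(0.0, 1.0, 0.0)),
]

# The edited set is what a conditional decoder would receive in place of the original.
edited = swap_timbre(mixture, 0, 1)
```

Keeping the two latents as separate fields makes the edit a pure recombination: no value is changed, only its assignment to a source.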
Taxonomy
Topics: Music Technology and Sound Studies · Music and Audio Processing · Computer Graphics and Visualization Techniques
Methods: Sparse Evolutionary Training · Diffusion
