UNMIXX: Untangling Highly Correlated Singing Voices Mixtures

Jihoo Jung; Ji-Hoon Kim; Doyeop Kwak; Junwon Lee; Juhan Nam; Joon Son Chung

arXiv:2601.12802 · cs.SD · January 21, 2026

TL;DR

UNMIXX is a framework for separating highly correlated singing voices. It addresses data scarcity with a musically informed mixing strategy that simulates realistic training mixtures, and it addresses source correlation with cross-source attention and a magnitude penalty loss, yielding SDRi gains of more than 2.2 dB.

Contribution

The paper introduces a musically informed mixing strategy, cross-source attention, and a magnitude penalty loss to improve the separation of singing voices in highly correlated mixtures.

Findings

01. Achieves an SDRi gain of more than 2.2 dB over previous methods.

02. Handles highly correlated singing voice mixtures effectively.

03. Addresses data scarcity by simulating realistic training data.
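
The headline number above is an SDR improvement (SDRi): the signal-to-distortion ratio of the separated estimate minus the SDR of the unprocessed mixture, both measured against the same reference voice. The sketch below illustrates that arithmetic with NumPy. It assumes a plain SDR definition with no scale or permutation handling; the paper's exact SDR variant and evaluation toolkit are not stated here, and the function names `sdr_db` and `sdr_improvement_db` are hypothetical.

```python
import numpy as np


def sdr_db(reference, estimate, eps=1e-12):
    """Plain signal-to-distortion ratio in dB.

    Assumption: no scale invariance and no permutation search; the
    paper may use a different SDR variant.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    signal_power = np.sum(reference ** 2)
    error_power = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10((signal_power + eps) / (error_power + eps))


def sdr_improvement_db(reference, estimate, mixture):
    """SDRi: SDR of the estimate minus SDR of the raw mixture."""
    return sdr_db(reference, estimate) - sdr_db(reference, mixture)


# Toy example: two synthetic "voices" as sine tones (illustrative data only).
t = np.arange(16000) / 16000.0
voice_a = np.sin(2 * np.pi * 220.0 * t)
voice_b = np.sin(2 * np.pi * 330.0 * t)
mixture = voice_a + voice_b
# A pretend separator output that keeps 10% of the interfering voice.
estimate_a = voice_a + 0.1 * voice_b
sdri = sdr_improvement_db(voice_a, estimate_a, mixture)
```

In this toy case the residual interference is attenuated by a factor of 10 in amplitude, so the improvement is 20 dB regardless of the tones chosen.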

Abstract

We introduce UNMIXX, a novel framework for multiple singing voices separation (MSVS). While related to speech separation, MSVS faces unique challenges: data scarcity and the highly correlated nature of singing voice mixtures. To address these issues, we propose UNMIXX with three key components: (1) a musically informed mixing strategy that constructs highly correlated, music-like mixtures, (2) cross-source attention that drives the representations of two singers apart via reverse attention, and (3) a magnitude penalty loss that penalizes erroneously assigned interfering energy. UNMIXX not only addresses data scarcity by simulating realistic training data, but also excels at separating highly correlated mixtures through cross-source interactions at both the architectural and loss levels. Our extensive experiments demonstrate that UNMIXX greatly enhances performance, with SDRi gains exceeding 2.2 dB…
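
The abstract describes the magnitude penalty loss only as "penalizing erroneously assigned interfering energy" and gives no formula. As a rough illustration of that idea, and explicitly not the paper's formulation, the sketch below penalizes estimated magnitude that exceeds the target in time-frequency bins where the other singer is louder. The function name `magnitude_penalty`, the dominance mask, and the averaging are all assumptions made for this example.

```python
import numpy as np


def magnitude_penalty(est_mag, target_mag, interferer_mag):
    """Hypothetical penalty on leaked interferer energy.

    All inputs are magnitude spectrograms of the same shape.
    Assumption: leakage is measured only in bins where the interferer
    is louder than the target; this is not taken from the paper.
    """
    est_mag = np.asarray(est_mag, dtype=np.float64)
    target_mag = np.asarray(target_mag, dtype=np.float64)
    interferer_mag = np.asarray(interferer_mag, dtype=np.float64)

    # Bins dominated by the other singer.
    interferer_dominant = interferer_mag > target_mag
    # Energy the estimate carries beyond what the target has.
    excess = np.maximum(est_mag - target_mag, 0.0)
    # Average the excess over the interferer-dominated bins.
    n_bins = max(int(interferer_dominant.sum()), 1)
    return float(np.sum(excess * interferer_dominant) / n_bins)


# Toy 2x2 "spectrograms": each singer occupies different bins.
target = np.array([[1.0, 0.0], [0.0, 1.0]])
interferer = np.array([[0.0, 1.0], [1.0, 0.0]])
clean_estimate = target.copy()
leaky_estimate = target + 0.5 * interferer

penalty_clean = magnitude_penalty(clean_estimate, target, interferer)
penalty_leaky = magnitude_penalty(leaky_estimate, target, interferer)
```

A perfect estimate incurs no penalty, while an estimate that leaks half of the interfering voice is penalized in proportion to the leaked magnitude. In training, such a term would typically be added to a main reconstruction loss with a weighting factor, which is likewise an assumption here.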

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music Technology and Sound Studies