Musical Source Separation Bake-Off: Comparing Objective Metrics with Human Perception

Noah Jaffe; John Ashley Burgoyne

arXiv:2507.06917·eess.AS·October 1, 2025·WASPAA

Open Access

TL;DR

This study evaluates how well various objective metrics predict human perception of music source separation quality, revealing that no single metric is universally reliable and emphasizing the importance of stem-specific evaluation methods.

Contribution

The study provides a large-scale listener dataset and compares multiple objective metrics, highlighting their strengths and limitations in predicting human perception across different music stems.

Findings

01. SDR best predicts vocal quality

02. SI-SAR better correlates with perception for drums and bass

03. FAD with CLAP-LAION-music performs well for drums and bass
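
For readers unfamiliar with FAD, the following is a minimal sketch of the Fréchet distance between two sets of embedding vectors, each summarized by a Gaussian (mean and covariance). It is not the paper's implementation: the function name `frechet_distance` is hypothetical, and the random vectors stand in for embeddings that would come from a model such as CLAP-LAION-music.

```python
import numpy as np


def frechet_distance(emb_a, emb_b):
    """Frechet distance between Gaussians fitted to two embedding sets.

    emb_a, emb_b: arrays of shape (n_samples, dim).
    Formula: ||mu_a - mu_b||^2 + Tr(C_a) + Tr(C_b) - 2 Tr((C_a C_b)^(1/2)).
    """
    emb_a = np.asarray(emb_a, dtype=np.float64)
    emb_b = np.asarray(emb_b, dtype=np.float64)
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(emb_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(emb_b, rowvar=False))
    # The trace of the matrix square root equals the sum of the square
    # roots of the eigenvalues of cov_a @ cov_b, which are real and
    # non-negative (up to numerical noise) for covariance matrices.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.standard_normal((500, 8))       # placeholder "reference" embeddings
    estimate = reference + 0.5                      # placeholder "estimate" embeddings
    print(frechet_distance(reference, estimate))
```

A lower distance means the two embedding distributions are closer. How embeddings are extracted and pooled per stem is a separate design decision that this sketch does not cover.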

Abstract

Music source separation aims to extract individual sound sources (e.g., vocals, drums, guitar) from a mixed music recording. However, evaluating the quality of separated audio remains challenging, as commonly used metrics like the source-to-distortion ratio (SDR) do not always align with human perception. In this study, we conducted a large-scale listener evaluation on the MUSDB18 test set, collecting approximately 30 ratings per track from seven distinct listener groups. We compared several objective metrics: energy-ratio measures, including legacy ones (BSSEval v4, SI-SDR variants), and embedding-based alternatives (Fréchet Audio Distance using CLAP-LAION-music, EnCodec, VGGish, wav2vec 2.0, and HuBERT). While SDR remains the best-performing metric for vocal estimates, our results show that the scale-invariant signal-to-artifacts ratio (SI-SAR) better predicts listener ratings for drums and…
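
To make the energy-ratio family concrete, here is a minimal sketch of a plain SDR and of SI-SDR for a single-channel reference and estimate. These are simplified textbook definitions, not the BSSEval v4 procedure used in the study (which additionally allows for a distortion filter), and the function names `sdr` and `si_sdr` are illustrative.

```python
import numpy as np


def sdr(reference, estimate, eps=1e-12):
    """Plain SDR in dB: reference energy over the energy of the error."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    error = reference - estimate
    return 10.0 * np.log10((np.sum(reference**2) + eps) / (np.sum(error**2) + eps))


def si_sdr(reference, estimate, eps=1e-12):
    """Scale-invariant SDR in dB.

    The estimate is split into a scaled copy of the reference (the target)
    and a residual; the ratio of their energies ignores overall gain.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target
    return 10.0 * np.log10((np.sum(target**2) + eps) / (np.sum(residual**2) + eps))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.standard_normal(44100)                    # toy 1-second "stem"
    estimate = source + 0.1 * rng.standard_normal(44100)   # toy separation output
    # Halving the gain changes plain SDR but leaves SI-SDR unchanged.
    print(sdr(source, estimate), sdr(source, 0.5 * estimate))
    print(si_sdr(source, estimate), si_sdr(source, 0.5 * estimate))
```

The gain experiment in the usage lines illustrates why scale-invariant variants exist: a separator that outputs the right signal at the wrong level is penalized by plain SDR but not by SI-SDR.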

Taxonomy

Topics: Music Technology and Sound Studies · Speech and Audio Processing · Music and Audio Processing