Source Separation of Small Classical Ensembles: Challenges and Opportunities

Gerardo Roa-Dabike; Trevor J. Cox; Jon P. Barker; Michael A. Akeroyd; Scott Bannister; Bruno Fazenda; Jennifer Firth; Simone Graetzer; Alinka Greasley; Rebecca R. Vos; William M. Whitmer

arXiv:2505.17823·eess.AS·May 26, 2025

TL;DR

This paper explores the challenges of musical source separation in classical ensembles, comparing causal and non-causal deep learning approaches, and introduces a new synthesized dataset to improve separation performance.

Contribution

The paper presents a new database of synthesized woodwind ensembles and evaluates ConvTasNet models for classical music source separation, highlighting the importance of data realism.

Findings

01

Causal and non-causal approaches perform similarly on small real datasets.

02

A significant mismatch between synthesized and real recordings affects separation quality.

03

Future work should focus on collecting more real data or enhancing synthesis realism.

Abstract

Musical source separation (MSS) of Western popular music using non-causal deep learning can be very effective. In contrast, MSS for classical music is an unsolved problem. Classical ensembles are harder to separate than popular music because of the inherently greater variation in the music, the sparsity of recordings with ground truth for supervised training, and the greater ambiguity between instruments. The Cadenza project has been exploring MSS for classical music so that music can be remixed to improve listening experiences for people with hearing loss. To enable the work, a new database of synthesized woodwind ensembles was created to overcome instrumental imbalances in the EnsembleSet. For the MSS, a set of ConvTasNet models was used, with each model trained to extract a string or woodwind instrument. ConvTasNet was chosen because it enabled both…
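
The causal versus non-causal distinction in the abstract can be illustrated with a toy 1-D convolution. The sketch below is not the paper's implementation or ConvTasNet itself. It only assumes the usual convention that a causal layer pads entirely on the left, so each output sample depends on current and past input only, while a non-causal layer pads on both sides and also uses future samples. The function names `conv1d` and `first_affected_index` are hypothetical and introduced here for illustration.

```python
from typing import List


def conv1d(signal: List[float], kernel: List[float], causal: bool) -> List[float]:
    """Same-length 1-D convolution (cross-correlation form) with zero padding.

    causal=True:  all padding goes on the left, so output[t] depends only on
                  signal[t-k+1 .. t] (no look-ahead).
    causal=False: padding is split between both sides, so output[t] also
                  depends on future samples.
    """
    k = len(kernel)
    left = k - 1 if causal else (k - 1) // 2
    right = (k - 1) - left
    padded = [0.0] * left + list(signal) + [0.0] * right
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(signal))
    ]


def first_affected_index(kernel: List[float], causal: bool,
                         length: int = 9, impulse_at: int = 4) -> int:
    """Index of the earliest output sample that changes when one input sample changes."""
    silent = [0.0] * length
    bumped = list(silent)
    bumped[impulse_at] = 1.0
    before = conv1d(silent, kernel, causal)
    after = conv1d(bumped, kernel, causal)
    return next(i for i, (x, y) in enumerate(zip(before, after)) if x != y)


kernel = [1.0, 1.0, 1.0]
causal_first = first_affected_index(kernel, causal=True)      # no earlier than the impulse
noncausal_first = first_affected_index(kernel, causal=False)  # one sample before the impulse
```

In the causal case a change at input sample 4 cannot alter any output before sample 4, which is the property needed for low-latency processing. In the non-causal case the output at sample 3 already changes, because the filter looks one sample ahead.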

Taxonomy

Methods: Convolutional time-domain audio separation network · Sparse Evolutionary Training