Improving Universal Sound Separation Using Sound Classification
Efthymios Tzinis, Scott Wisdom, John R. Hershey, Aren Jansen, Daniel P. W. Ellis

TL;DR
This paper enhances universal sound separation by leveraging semantic embeddings from sound classifiers, significantly improving separation quality and establishing new state-of-the-art results through iterative modeling.
Contribution
It introduces the use of classifier-derived semantic embeddings to condition separation networks, improving performance in open-domain sound separation tasks.
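As a rough illustration of what "conditioning" can mean here, the sketch below appends one clip-level embedding to every frame of a mixture representation before a small mask-predicting layer. Everything in it is an assumption made for illustration: the function names `condition_on_embedding` and `mask_layer`, the concatenation scheme, the toy dimensions, and the random weights are hypothetical and are not taken from the paper, which may inject the embedding differently.

```python
import math
import random


def condition_on_embedding(mixture_frames, embedding):
    """Append the same clip-level embedding to every mixture frame (assumed scheme)."""
    return [frame + embedding for frame in mixture_frames]


def mask_layer(frames, weights, bias):
    """Apply one linear layer followed by a sigmoid, giving mask values in (0, 1)."""
    masks = []
    for frame in frames:
        row = []
        for w_row, b in zip(weights, bias):
            z = sum(w * x for w, x in zip(w_row, frame)) + b
            row.append(1.0 / (1.0 + math.exp(-z)))
        masks.append(row)
    return masks


random.seed(0)
num_frames, num_bins, embed_dim = 4, 3, 2

# Toy mixture features: one list of magnitudes per time frame.
mixture_frames = [[random.random() for _ in range(num_bins)] for _ in range(num_frames)]

# Hypothetical stand-in for a semantic embedding produced by a sound classifier.
embedding = [0.5, -0.25]

conditioned = condition_on_embedding(mixture_frames, embedding)

# Untrained random weights: the layer sees mixture bins plus embedding dimensions.
weights = [[random.uniform(-1, 1) for _ in range(num_bins + embed_dim)]
           for _ in range(num_bins)]
bias = [0.0] * num_bins

masks = mask_layer(conditioned, weights, bias)

# A masked estimate of one source: mask times the original mixture frame.
estimate = [[m * x for m, x in zip(mask_row, frame)]
            for mask_row, frame in zip(masks, mixture_frames)]
```

The point of the sketch is only that the separation layer receives extra inputs describing what kinds of sounds are present; a trained system would learn how to use them.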
Findings
Classifier embeddings yield nearly 1 dB SNR improvement.
Iterative models approach oracle performance.
The approach sets a new state of the art in universal sound separation.
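To make a figure such as "nearly 1 dB" concrete, the following minimal sketch computes an SNR improvement: the SNR of a separated estimate minus the SNR of the unprocessed mixture, both measured against the reference source. It assumes plain SNR in decibels; this summary does not say which SNR variant the paper reports, and the helper names `snr_db` and `snr_improvement_db` and the toy signals are hypothetical.

```python
import math


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB, treating (reference - estimate) as the noise."""
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_power / noise_power)


def snr_improvement_db(reference, estimate, mixture):
    """How much closer the estimate is to the reference than the raw mixture was."""
    return snr_db(reference, estimate) - snr_db(reference, mixture)


# Toy signals: the estimate halves the error amplitude of the mixture.
reference = [1.0, -1.0, 1.0, -1.0]
mixture = [1.5, -0.5, 1.5, -0.5]
estimate = [1.25, -0.75, 1.25, -0.75]

improvement = snr_improvement_db(reference, estimate, mixture)
```

In this toy case, halving the error amplitude cuts the error power by a factor of four, so the improvement is 10 log10(4), about 6.02 dB. On the same scale, a gain of 1 dB corresponds to reducing the error power by roughly 21%.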
Abstract
Deep learning approaches have recently achieved impressive performance on both audio source separation and sound classification. Most audio source separation approaches focus only on separating sources belonging to a restricted domain of source classes, such as speech and music. However, recent work has demonstrated the possibility of "universal sound separation", which aims to separate acoustic sources from an open domain, regardless of their class. In this paper, we utilize the semantic information learned by sound classifier networks trained on a vast amount of diverse sounds to improve universal sound separation. In particular, we show that semantic embeddings extracted from a sound classifier can be used to condition a separation network, providing it with useful additional information. This approach is especially useful in an iterative setup, where source estimates from an initial…
