Deep Clustering and Conventional Networks for Music Separation: Stronger Together

Yi Luo; Zhuo Chen; John R. Hershey; Jonathan Le Roux; Nima Mesgarani

arXiv:1611.06265·stat.ML·November 30, 2017

TL;DR

This paper demonstrates that combining deep clustering with conventional neural networks significantly improves music source separation performance, leveraging their complementary strengths in a hybrid approach.

Contribution

It introduces a hybrid network that integrates deep clustering with a conventional network, achieving better music source separation than either approach alone.

Findings

1. Deep clustering outperforms conventional networks on a singing voice separation task, in both matched and mismatched conditions.

2. The hybrid approach significantly outperforms either method used alone.

3. Combining the two methods leverages their complementary strengths.

Abstract

Deep clustering is the first method to handle general audio separation scenarios with multiple sources of the same type and an arbitrary number of sources, performing impressively in speaker-independent speech separation tasks. However, little is known about its effectiveness in other challenging situations such as music source separation. Contrary to conventional networks that directly estimate the source signals, deep clustering generates an embedding for each time-frequency bin, and separates sources by clustering the bins in the embedding space. We show that deep clustering outperforms conventional networks on a singing voice separation task, in both matched and mismatched conditions, even though conventional networks have the advantage of end-to-end training for best signal approximation, presumably because its more flexible objective engenders better regularization. Since the…
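The separation step described in the abstract (one embedding per time-frequency bin, then clustering the bins in embedding space) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the clustering algorithm, so plain k-means is assumed here, and the function names (`kmeans`, `masks_from_embeddings`), the NumPy dependency, and the toy embeddings are all hypothetical. In a real system the embeddings would come from a trained network.

```python
import numpy as np

def kmeans(points, k, n_iter=20):
    """Plain k-means (assumed clustering step); returns one label per row."""
    points = np.asarray(points, dtype=float)
    # Deterministic farthest-point initialisation, chosen for reproducibility.
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.stack(centers)
    for _ in range(n_iter):
        # Squared distance from every point to every center: shape (N, k).
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def masks_from_embeddings(embeddings, n_sources):
    """Cluster per-bin embeddings of shape (frames, freqs, dim) into
    binary masks of shape (n_sources, frames, freqs)."""
    frames, freqs, dim = embeddings.shape
    labels = kmeans(embeddings.reshape(-1, dim), n_sources)
    labels = labels.reshape(frames, freqs)
    return np.stack([(labels == j).astype(float) for j in range(n_sources)])

# Toy stand-in for network output: each bin's embedding sits near the
# prototype vector of the source that dominates that bin.
rng = np.random.default_rng(0)
true_assign = np.arange(20).reshape(4, 5) % 2          # hypothetical ground truth
prototypes = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
toy_embeddings = prototypes[true_assign] + 0.01 * rng.standard_normal((4, 5, 3))

toy_masks = masks_from_embeddings(toy_embeddings, n_sources=2)
```

Each mask would then be multiplied element-wise with the mixture spectrogram to estimate one source. Because the cluster indices carry no inherent meaning, the order of the masks is arbitrary, which is what lets this formulation handle sources of the same type.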
