Improving Source Separation via Multi-Speaker Representations

Jeroen Zegers; Hugo Van hamme

arXiv:1708.08740·cs.SD·August 30, 2017

TL;DR

This paper uses multi-speaker representations (i-vectors) to improve neural-network-based source separation. It shows that blind speaker adaptation improves separation performance, whereas the network on its own does not adequately retrieve this speaker information.

Contribution

It introduces a multi-speaker adaptation method for neural source separation that handles unknown speaker identities by alternately estimating the source signals and the speakers' i-vectors.

Findings

01

Blind multi-speaker adaptation improves separation results.

02

In the authors' setup, the separation network cannot adequately retrieve speaker information on its own, without adaptation.

03

Alternately estimating the source signals and the speakers' i-vectors is an effective way around the chicken-and-egg problem of unknown speaker identities.

Abstract

Lately there have been novel developments in deep learning towards solving the cocktail party problem. Initial results are very promising and allow for more research in the domain. One technique that has not yet been explored in the neural network approach to this task is speaker adaptation. Intuitively, information on the speakers that we are trying to separate seems fundamentally important for the speaker separation task. However, retrieving this speaker information is challenging since the speaker identities are not known a priori and multiple speakers are simultaneously active. There is thus a chicken-and-egg problem. To tackle this, source signals and i-vectors are estimated alternately. We show that blind multi-speaker adaptation improves the results of the network and that (in our case) the network is not capable of adequately retrieving this useful speaker information…
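
The abstract names the strategy (alternately estimating source signals and the speakers' i-vectors) without spelling out the loop. Below is a toy sketch of that alternation under loud assumptions; it is not the authors' system. A Wiener-style masking step, driven by nonnegative matrix factorization (NMF) activations, stands in for the separation network, and a unit-norm mean spectrum of each separated estimate stands in for the i-vector extractor. The names `separate`, `estimate_templates`, and `alternate` are hypothetical.

```python
import numpy as np


def separate(mixture, templates, n_iter=50, eps=1e-9):
    """Toy stand-in for the separation network (hypothetical helper).

    mixture: nonnegative (T, F) magnitude spectrogram; templates: (K, F), one
    spectral template per speaker. Returns K estimates of shape (K, T, F).
    """
    acts = np.ones((mixture.shape[0], templates.shape[0]))  # (T, K)
    for _ in range(n_iter):
        # NMF multiplicative update for the activations, templates held fixed.
        approx = acts @ templates + eps
        acts *= (mixture @ templates.T) / (approx @ templates.T + eps)
    # Per-speaker model spectrograms, shape (K, T, F).
    models = acts.T[:, :, None] * templates[:, None, :]
    # Wiener-style masks: nonnegative, summing to (almost exactly) one per bin.
    masks = models / (models.sum(axis=0, keepdims=True) + eps)
    return masks * mixture[None, :, :]


def estimate_templates(estimates, eps=1e-9):
    """Toy stand-in for the i-vector extractor: unit-norm mean spectrum per estimate."""
    means = estimates.mean(axis=1)  # average over time -> (K, F)
    return means / (np.linalg.norm(means, axis=1, keepdims=True) + eps)


def alternate(mixture, n_speakers=2, n_rounds=10, seed=0):
    """Alternate between separating with the current speaker features and
    re-estimating those features from the separated signals."""
    if n_rounds < 1:
        raise ValueError("n_rounds must be at least 1")
    rng = np.random.default_rng(seed)
    # Blind start: no speaker information is available, so the initial
    # templates are random perturbations of the mixture's mean spectrum.
    base = mixture.mean(axis=0)
    templates = base[None, :] * rng.uniform(0.5, 1.5, size=(n_speakers, base.size))
    templates /= np.linalg.norm(templates, axis=1, keepdims=True)
    for _ in range(n_rounds):
        estimates = separate(mixture, templates)   # sources given speaker features
        templates = estimate_templates(estimates)  # speaker features given sources
    return estimates, templates


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    true_templates = rng.uniform(0.1, 1.0, size=(2, 16))
    true_acts = rng.uniform(0.0, 1.0, size=(40, 2))
    mixture = true_acts @ true_templates  # toy additive mixture, shape (40, 16)
    estimates, templates = alternate(mixture)
    print(estimates.shape)  # (2, 40, 16)
```

The point the sketch illustrates is the blind start: no speaker information exists at first, so each round's speaker features are computed from the previous round's separated signals. Nothing in this toy guarantees convergence to the true speakers.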