Improving Source Separation via Multi-Speaker Representations
Jeroen Zegers, Hugo Van hamme

TL;DR
This paper explores multi-speaker representations for neural network-based source separation. It shows that blind speaker adaptation improves separation performance, and that the network in the authors' setup does not adequately extract this speaker information on its own.
Contribution
It introduces a novel multi-speaker adaptation method for neural source separation, addressing the challenge of unknown speaker identities.
Findings
Blind multi-speaker adaptation improves separation results.
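In the authors' setup, the separation network does not adequately retrieve speaker information on its own.
Alternating between estimating the source signals and estimating the speaker features (i-vectors) addresses the chicken-and-egg problem of unknown speaker identities.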
Abstract
Lately there have been novel developments in deep learning towards solving the cocktail party problem. Initial results are very promising and allow for more research in the domain. One technique that has not yet been explored in the neural network approach to this task is speaker adaptation. Intuitively, information on the speakers that we are trying to separate seems fundamentally important for the speaker separation task. However, retrieving this speaker information is challenging since the speaker identities are not known a priori and multiple speakers are simultaneously active. There is thus some sort of chicken and egg problem. To tackle this, source signals and i-vectors are estimated alternately. We show that blind multi-speaker adaptation improves the results of the network and that (in our case) the network is not capable of adequately retrieving this useful speaker information…
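The alternating scheme mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`alternate_estimation`, `toy_separate`, `toy_extract`), the zero-vector initialisation, the vector dimension, and the number of iterations are all assumptions made for illustration. The two toy functions are hypothetical placeholders for a separation network and an i-vector extractor.

```python
import numpy as np


def alternate_estimation(mixture, separate, extract_speaker_vector,
                         num_speakers=2, vec_dim=4, num_iters=3):
    """Alternate between estimating sources and estimating speaker vectors."""
    # Assumption: with no speaker information available yet, start from zeros.
    speaker_vecs = np.zeros((num_speakers, vec_dim))
    sources = None
    for _ in range(num_iters):
        # Step 1: estimate the sources given the current speaker vectors.
        sources = separate(mixture, speaker_vecs)
        # Step 2: re-estimate one speaker vector per estimated source.
        speaker_vecs = np.stack([extract_speaker_vector(s) for s in sources])
    return sources, speaker_vecs


def toy_separate(mixture, speaker_vecs):
    # Hypothetical placeholder for a separation network: splits the mixture
    # using softmax weights computed from the first vector component.
    logits = speaker_vecs[:, 0]
    weights = np.exp(logits) / np.exp(logits).sum()
    return weights[:, None] * mixture[None, :]


def toy_extract(source):
    # Hypothetical placeholder for an i-vector extractor: basic statistics.
    return np.array([source.mean(), source.std(), source.min(), source.max()])


rng = np.random.default_rng(0)
mixture = rng.standard_normal(16000)  # stand-in for one second of 16 kHz audio
sources, speaker_vecs = alternate_estimation(mixture, toy_separate, toy_extract)
```

In a real system the two placeholders would be replaced by a trained separation network that accepts speaker vectors as auxiliary input and by an actual i-vector extractor. The loop structure is the only part meant to reflect the idea described in the abstract.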
