Joint Sound Source Separation and Speaker Recognition

Jeroen Zegers; Hugo Van hamme

arXiv:1604.08852·cs.SD·May 2, 2016

TL;DR

This paper presents a novel multichannel NMF-based method that jointly performs sound source separation and speaker recognition for simultaneous speech, outperforming sequential approaches on the CHiME corpus.

Contribution

It introduces an integrated NMF framework that combines source separation and speaker recognition, extending state-of-the-art multichannel NMF techniques.

Findings

01 Outperforms the sequential approach of source separation followed by i-vector speaker recognition

02 Handles simultaneous speech in noisy multichannel recordings

03 Reports these gains in experiments on the CHiME corpus

Abstract

Non-negative Matrix Factorization (NMF) has already been applied to learn speaker characterizations from single or non-simultaneous speech for speaker recognition applications. It is also known for its good performance in (blind) source separation for simultaneous speech. This paper explains how NMF can be used to jointly solve the two problems in a multichannel speaker recognizer for simultaneous speech. It is shown how state-of-the-art multichannel NMF for blind source separation can be easily extended to incorporate speaker recognition. Experiments on the CHiME corpus show that this method outperforms the sequential approach of first applying source separation, followed by speaker recognition that uses state-of-the-art i-vector techniques.
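The idea of using one NMF model for both separation and recognition can be illustrated with a minimal single-channel sketch. This is not the authors' multichannel method: it assumes per-speaker dictionaries learned in advance, KL-divergence multiplicative updates, and toy magnitude spectrograms in which each speaker occupies its own frequency band. The names `nmf_kl`, `utterance`, and `true_W` are hypothetical and exist only for this illustration.

```python
import numpy as np

EPS = 1e-12

def nmf_kl(V, rank=None, W=None, n_iter=200, seed=0):
    """Approximate V by W @ H using KL-divergence multiplicative updates.
    If W is supplied it is kept fixed and only the activations H are fitted."""
    rng = np.random.default_rng(seed)
    learn_W = W is None
    W = rng.random((V.shape[0], rank)) + EPS if learn_W else W.copy()
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + EPS))) / (W.T @ ones + EPS)
        if learn_W:
            W *= ((V / (W @ H + EPS)) @ H.T) / (ones @ H.T + EPS)
    return W, H

# Toy setup (assumption): 3 speakers, each confined to its own 8-bin band.
rng = np.random.default_rng(0)
n_speakers, band, atoms = 3, 8, 4
n_freq = n_speakers * band
true_W = []
for s in range(n_speakers):
    Ws = np.zeros((n_freq, atoms))
    Ws[s * band:(s + 1) * band] = rng.random((band, atoms))
    true_W.append(Ws)

def utterance(s, n_frames):
    """Synthetic non-negative 'spectrogram' for speaker s."""
    return true_W[s] @ rng.random((atoms, n_frames))

# Enrollment: learn one dictionary per speaker from clean single-speaker data.
dicts = [nmf_kl(utterance(s, 100), rank=atoms)[0] for s in range(n_speakers)]
W_all = np.hstack(dicts)

# Test: speakers 0 and 2 talk simultaneously; fit activations on the mixture.
V_mix = utterance(0, 50) + utterance(2, 50)
_, H = nmf_kl(V_mix, W=W_all)

# Recognition: rank speakers by the energy their dictionary explains.
parts = [dicts[s] @ H[s * atoms:(s + 1) * atoms] for s in range(n_speakers)]
scores = np.array([p.sum() for p in parts])
active = sorted(np.argsort(scores)[-2:].tolist())

# Separation: soft (Wiener-like) masks built from the same factorization.
total = sum(parts) + EPS
separated = {s: V_mix * parts[s] / total for s in active}
print(active)
```

The same activation matrix `H` drives both the speaker ranking and the separation masks, which is the sense in which the two tasks are solved jointly here. Real speech overlaps heavily in frequency, so the disjoint-band assumption makes this toy case far easier than the CHiME setting described in the abstract.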
