Joint Sound Source Separation and Speaker Recognition
Jeroen Zegers, Hugo Van hamme

TL;DR
This paper presents a novel multichannel NMF-based method that jointly performs sound source separation and speaker recognition for simultaneous speech, outperforming sequential approaches on the CHiME corpus.
Contribution
It introduces an integrated NMF framework that combines source separation and speaker recognition, extending state-of-the-art multichannel NMF techniques.
Findings
Outperforms the sequential approach of blind source separation followed by i-vector speaker recognition
Recognizes speakers in simultaneous (overlapping) multichannel speech
Reports these results on the CHiME corpus
Abstract
Non-negative Matrix Factorization (NMF) has already been applied to learn speaker characterizations from single or non-simultaneous speech for speaker recognition applications. It is also known for its good performance in (blind) source separation for simultaneous speech. This paper explains how NMF can be used to jointly solve the two problems in a multichannel speaker recognizer for simultaneous speech. It is shown how state-of-the-art multichannel NMF for blind source separation can be easily extended to incorporate speaker recognition. Experiments on the CHiME corpus show that this method outperforms the sequential approach of first applying source separation, followed by speaker recognition that uses state-of-the-art i-vector techniques.
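To make the idea concrete, here is a minimal single-channel sketch of NMF with multiplicative updates, plus a simple way to score speakers from the activations of speaker-specific dictionaries. It is not the multichannel model described in the paper: the Euclidean cost, the function names `nmf` and `speaker_scores`, the `dictionaries` argument, and the activation-sum scoring rule are all assumptions chosen for illustration.

```python
import numpy as np


def nmf(V, rank=None, W=None, n_iter=200, seed=0, eps=1e-12):
    """Approximate a non-negative matrix V (freq x frames) as W @ H.

    Uses multiplicative updates for the Euclidean cost.
    If W is supplied it is held fixed and only H is estimated.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    fixed_W = W is not None
    if not fixed_W:
        W = rng.random((n_freq, rank)) + eps
    H = rng.random((W.shape[1], n_frames)) + eps
    for _ in range(n_iter):
        # Update activations; the ratio keeps H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        if not fixed_W:
            # Update the dictionary only when it is being learned.
            W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def speaker_scores(V, dictionaries):
    """Score each speaker by the total activation of its dictionary atoms.

    `dictionaries` maps a speaker name to a (freq x atoms) matrix
    (hypothetical pre-trained speaker dictionaries).
    """
    names = list(dictionaries)
    W = np.hstack([dictionaries[name] for name in names])
    _, H = nmf(V, W=W)
    scores, start = {}, 0
    for name in names:
        n_atoms = dictionaries[name].shape[1]
        scores[name] = float(H[start:start + n_atoms].sum())
        start += n_atoms
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    dicts = {"A": rng.random((20, 3)), "B": rng.random((20, 3))}
    # Toy magnitude spectrogram generated from speaker A's atoms only.
    V = dicts["A"] @ rng.random((3, 50))
    print(speaker_scores(V, dicts))
```

In this toy setup the dictionary of the speaker who produced the mixture should receive most of the activation. A joint method would estimate the separation and the speaker identities together instead of scoring a fixed dictionary after the fact.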
