Speaker diarization with session-level speaker embedding refinement using graph neural networks
Jixuan Wang, Xiong Xiao, Jian Wu, Ranjani Ramamurthy, Frank Rudzicz, Michael Brudno

TL;DR
This paper introduces a graph neural network approach that refines speaker embeddings locally within each session, significantly improving diarization accuracy and achieving state-of-the-art results on the NIST SRE 2000 CALLHOME benchmark.
Contribution
First application of GNNs for session-level speaker embedding refinement, enhancing speaker separation in diarization systems.
Findings
Refined embeddings outperform original embeddings in clustering accuracy.
System achieves state-of-the-art results on NIST SRE 2000 CALLHOME.
Significant improvements on both simulated and real meeting data.
Abstract
Deep speaker embedding models have been commonly used as a building block for speaker diarization systems; however, the speaker embedding model is usually trained according to a global loss defined on the training data, which could be sub-optimal for distinguishing speakers locally in a specific meeting session. In this work we present the first use of graph neural networks (GNNs) for the speaker diarization problem, utilizing a GNN to refine speaker embeddings locally using the structural information between speech segments inside each session. The speaker embeddings extracted by a pre-trained model are remapped into a new embedding space, in which the different speakers within a single session are better separated. The model is trained for linkage prediction in a supervised manner by minimizing the difference between the affinity matrix constructed by the refined embeddings and the…
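The refinement idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' model: the single graph-convolution-style layer, the cosine affinity, the `threshold` value, the `tanh` nonlinearity, and all function names are assumptions chosen for illustration. The abstract text is cut off before it names the training target, so the label-derived target matrix in `linkage_loss` is also an assumption.

```python
import numpy as np


def cosine_affinity(emb):
    """Pairwise cosine similarity between the rows of `emb`."""
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.maximum(norms, 1e-12)  # guard against zero vectors
    return unit @ unit.T


def refine_embeddings(emb, weight, threshold=0.5):
    """One graph-convolution-style step over the segments of a session.

    Assumption: the session graph links segments whose cosine
    similarity is at least `threshold` (hypothetical value).
    """
    aff = cosine_affinity(emb)
    # Keep only strong links; the diagonal is ~1.0, so self-loops remain.
    adj = np.where(aff >= threshold, aff, 0.0)
    # Row-normalize so each segment takes a weighted average of its neighbours.
    adj = adj / adj.sum(axis=1, keepdims=True)
    # Aggregate neighbours, then project into the new embedding space.
    return np.tanh(adj @ emb @ weight)


def linkage_loss(refined, labels):
    """Mean squared difference between the refined affinity matrix and a
    same-speaker indicator matrix (assumed target, see lead-in)."""
    pred = cosine_affinity(refined)
    target = (labels[:, None] == labels[None, :]).astype(float)
    return float(np.mean((pred - target) ** 2))


# Toy session: six segments from two speakers, 8-dimensional embeddings.
rng = np.random.default_rng(0)
labels = np.array([0, 0, 0, 1, 1, 1])
centers = rng.normal(size=(2, 8))
emb = centers[labels] + 0.3 * rng.normal(size=(6, 8))

weight = np.eye(8)  # untrained projection; training would update this matrix
refined = refine_embeddings(emb, weight)
loss = linkage_loss(refined, labels)
```

In a trained system, `weight` (and any further layers) would be fitted by minimizing a loss of this kind over many sessions, and the refined affinity matrix would then be passed to a clustering step such as spectral clustering.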
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques
Methods: Spectral Clustering
