Supervised Hierarchical Clustering using Graph Neural Networks for Speaker Diarization
Prachi Singh, Amrit Kaul, Sriram Ganapathy

TL;DR
This paper introduces E2E-SHARC, a supervised hierarchical clustering method using GNNs for speaker diarization, enabling a single-step, end-to-end process that significantly improves clustering accuracy over traditional methods.
Contribution
The paper presents a novel supervised hierarchical clustering algorithm with GNNs for speaker diarization, allowing end-to-end training and improved performance.
Findings
Achieves a 53% relative improvement over the baseline system on the AMI dataset.
Achieves a 44% relative improvement over the baseline system on the Voxconverse dataset.
Enables single-step, end-to-end speaker diarization.
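The relative improvement figures above can be read as a fractional reduction in an error measure. The sketch below shows that arithmetic, assuming a lower-is-better error rate (the summary does not name the metric). The function name and the numbers in the usage lines are hypothetical and are not results from the paper.

```python
def relative_improvement(baseline_error, system_error):
    """Fractional reduction of an error rate relative to a baseline.

    Assumes a lower-is-better metric; both arguments use the same unit.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - system_error) / baseline_error


# Hypothetical values chosen only to illustrate the formula.
example = relative_improvement(baseline_error=10.0, system_error=4.7)
print(f"{example:.0%} relative improvement")
```

Under this reading, a 53% relative improvement means the error drops to roughly 47% of the baseline value, not that it falls by 53 absolute points.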
Abstract
Conventional methods for speaker diarization involve windowing an audio file into short segments to extract speaker embeddings, followed by an unsupervised clustering of the embeddings. This multi-step approach generates speaker assignments for each segment. In this paper, we propose a novel Supervised HierArchical gRaph Clustering algorithm (SHARC) for speaker diarization where we introduce a hierarchical structure using Graph Neural Network (GNN) to perform supervised clustering. The supervision allows the model to update the representations and directly improve the clustering performance, thus enabling a single-step approach for diarization. In the proposed work, the input segment embeddings are treated as nodes of a graph with the edge weights corresponding to the similarity scores between the nodes. We also propose an approach to jointly update the embedding extractor and the GNN…
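The graph construction described in the abstract (segment embeddings as nodes, similarity scores as edge weights) might be sketched as follows. This is a minimal illustration and not the authors' implementation: cosine similarity as the score, the dense adjacency matrix, and the function name `similarity_graph` are all assumptions, since the abstract does not specify them.

```python
import numpy as np


def similarity_graph(embeddings):
    """Build a dense weighted adjacency matrix from segment embeddings.

    Each row of `embeddings` is one node. Edge weights are cosine
    similarities (an assumed choice of similarity score).
    """
    x = np.asarray(embeddings, dtype=float)
    # Length-normalize each embedding so the dot product is a cosine score.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    adjacency = x @ x.T
    # Remove self-loops: a node is not its own neighbour.
    np.fill_diagonal(adjacency, 0.0)
    return adjacency


# Three toy 2-D "segment embeddings": the first two point the same way.
toy_embeddings = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]])
print(similarity_graph(toy_embeddings).round(3))
```

In this toy case the edge between the first two nodes carries a much larger weight than either edge to the third node, which is the kind of structure a clustering step would use to group segments by speaker.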
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
Methods: Graph Neural Network
