Deep Self-Supervised Hierarchical Clustering for Speaker Diarization

Prachi Singh; Sriram Ganapathy

arXiv:2008.03960 · eess.AS · April 7, 2021 · Interspeech

TL;DR

This paper introduces a self-supervised hierarchical clustering method that jointly learns speaker representations and clusters, yielding a 29% relative improvement over AHC with cosine similarity and a 10% relative improvement over a state-of-the-art PLDA-based system.

Contribution

It presents a joint clustering and representation learning algorithm for speaker diarization in which the clustering solution itself supplies the self-supervision used to train the speaker embeddings.

Findings

01. 29% relative improvement over AHC with cosine similarity

02. 10% relative improvement over state-of-the-art PLDA-based system

03. Effective integration of clustering with representation learning
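
The percentages above are relative, not absolute, improvements. As a minimal sketch of how such a figure is computed, assuming the underlying metric is an error rate where lower is better (for example a diarization error rate; the listing above does not name the metric), the helper below divides the reduction by the baseline. The function name and the numbers in the usage line are hypothetical and are not results from the paper.

```python
def relative_improvement(baseline_error: float, proposed_error: float) -> float:
    """Fraction by which the proposed system reduces the baseline error rate."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - proposed_error) / baseline_error


# Illustrative values only (NOT taken from the paper): an error rate falling
# from 10.0 to 7.1 is a 2.9-point absolute drop and a 29% relative improvement.
example = relative_improvement(10.0, 7.1)
print(f"{example:.0%}")
```

A relative figure depends on the baseline it is measured against, which is why the two findings report different percentages for the same proposed system.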

Abstract

State-of-the-art speaker diarization systems use agglomerative hierarchical clustering (AHC), which clusters previously learned neural embeddings. While the clustering approach attempts to identify speaker clusters, the AHC algorithm does not involve any further learning. In this paper, we propose a novel algorithm for hierarchical clustering that combines speaker clustering with a representation learning framework. The proposed approach is based on principles of self-supervised learning, where the self-supervision is derived from the clustering algorithm. The representation learning network is trained with a regularized triplet loss using the clustering solution at the current step, while the clustering algorithm uses the deep embeddings from the representation learning step. By combining the self-supervision based representation learning along with the…
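
The loop described in the abstract, where cluster labels supervise a triplet loss and the resulting embeddings feed the next clustering step, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses NumPy instead of the repository's PyTorch code, average-linkage merging, a plain hinge triplet loss on cosine similarity without the paper's regularization term, and synthetic embeddings. All function names and the margin value are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T


def ahc_cosine(embeddings, num_clusters):
    """Average-linkage AHC on cosine similarity (assumed linkage); returns labels."""
    sim = cosine_similarity_matrix(embeddings)
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > num_clusters:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = sim[np.ix_(clusters[a], clusters[b])].mean()
                if score > best:
                    best, pair = score, (a, b)
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]  # merge the most similar pair
        del clusters[b]
    labels = np.empty(len(embeddings), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


def sample_triplets(labels, rng):
    """One (anchor, positive, negative) index triplet per segment, from cluster labels."""
    indices = np.arange(len(labels))
    triplets = []
    for anchor in indices:
        pos = indices[(labels == labels[anchor]) & (indices != anchor)]
        neg = indices[labels != labels[anchor]]
        if len(pos) and len(neg):
            triplets.append((anchor, rng.choice(pos), rng.choice(neg)))
    return np.array(triplets)


def triplet_loss(embeddings, triplets, margin=0.2):
    """Hinge triplet loss on cosine similarity (no regularizer; margin is assumed)."""
    sim = cosine_similarity_matrix(embeddings)
    a, p, n = triplets.T
    return np.maximum(0.0, sim[a, n] - sim[a, p] + margin).mean()


# Synthetic stand-in for segment embeddings from two speakers.
rng = np.random.default_rng(0)
speaker_a = rng.normal(loc=[1.0, 0.0, 0.0], scale=0.05, size=(5, 3))
speaker_b = rng.normal(loc=[0.0, 1.0, 0.0], scale=0.05, size=(5, 3))
segments = np.vstack([speaker_a, speaker_b])

labels = ahc_cosine(segments, num_clusters=2)  # clustering step
triplets = sample_triplets(labels, rng)        # self-supervision from the clusters
loss = triplet_loss(segments, triplets)        # signal a network would minimize
print(labels, loss)
```

In a full system the loss would be backpropagated through an embedding network and the clustering repeated on the updated embeddings; here the embeddings are fixed, so the sketch only shows how the two steps exchange labels and representations.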

Code & Models

Repositories

iiscleap/self_supervised_AHC
PyTorch · Official
