Absolute decision corrupts absolutely: conservative online speaker diarisation

Youngki Kwon; Hee-Soo Heo; Bong-Jin Lee; You Jin Kim; Jee-weon Jung

arXiv:2211.04768·eess.AS·November 10, 2022

Open Access

TL;DR

This paper presents a conservative online speaker diarisation framework that improves robustness by increasing the estimated number of speakers cautiously. It uses dual buffers, checkpoints, centroids and clustering-based label matching, and reports state-of-the-art results on DIHARD 2 and 3 and competitive results on AMI and VoxConverse.

Contribution

The proposed framework takes a conservative approach to online speaker diarisation: it increases the estimated speaker count cautiously, decreases it by one when an earlier increase is judged faulty, and uses clustering-based label matching because a single speaker can produce more than one centroid.

Findings

01. Achieves state-of-the-art results on DIHARD 2 and 3 datasets.

02. Demonstrates competitive performance on AMI and VoxConverse datasets.

03. Lightweight system with effective real-time speaker diarisation.

Abstract

Our focus lies in developing an online speaker diarisation framework which demonstrates robust performance across diverse domains. In online speaker diarisation, outputs generated in real-time are irreversible, and a few misjudgements in the early phase of an input session can lead to catastrophic results. We hypothesise that cautiously increasing the number of estimated speakers is of paramount importance among many other factors. Thus, our proposed framework includes decreasing the number of speakers by one when the system judges that an increase in the past was faulty. We also adopt dual buffers, checkpoints and centroids, where checkpoints are combined with silhouette coefficients to estimate the number of speakers and centroids represent speakers. Again, we believe that more than one centroid can be generated from one speaker. Thus we design a clustering-based label matching…
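The abstract mentions silhouette coefficients for estimating the number of speakers and a step that lowers the count by one when an earlier increase looks faulty. The sketch below illustrates that general idea only; it is not the authors' implementation. The function names, the Euclidean distance, the threshold of 0.5 and the exact update rule are all assumptions made for illustration.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance (standard definition)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return 0.0  # assumption: treat the single-cluster case as score 0
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in idx) / len(idx)
            for other, idx in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def conservative_speaker_count(current, score_current, score_next, threshold=0.5):
    """Hypothetical rule: add a speaker only on strong evidence, undo by one if weak.

    score_current: silhouette of the clustering with `current` speakers.
    score_next:    silhouette of the clustering with `current + 1` speakers.
    """
    if score_next >= threshold and score_next > score_current:
        return current + 1  # cautious increase
    if current > 1 and score_current < threshold:
        return current - 1  # an earlier increase looks faulty, step back by one
    return current


# Toy embeddings: two well-separated groups of 2-D points.
embeddings = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette(embeddings, [0, 0, 1, 1])
bad = mean_silhouette(embeddings, [0, 1, 0, 1])
print(good, bad)
print(conservative_speaker_count(1, 0.0, good))
```

In this toy setup the well-separated labelling scores much higher than the mixed one, so the rule moves from one speaker to two. A real system would evaluate speaker embeddings at each checkpoint rather than hand-written points.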

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Test