Low-Latency Online Speaker Diarization with Graph-Based Label Generation
Yucong Zhang, Qinjian Lin, Weiqing Wang, Lin Yang, Xuyang Wang, Junjie Wang, Ming Li

TL;DR
This paper presents a low-latency online speaker diarization system that combines a novel online clustering algorithm, label matching, and graph-based re-clustering, achieving performance comparable to offline systems on standard datasets.
Contribution
It introduces chkpt-AHC for online clustering and a label matching algorithm to improve real-time speaker diarization performance.
Findings
Outperforms baseline online systems in experiments.
Achieves comparable performance to offline systems.
Reduces time cost significantly with chkpt-AHC.
Abstract
This paper introduces an online speaker diarization system that can handle long audio recordings with low latency. We enable Agglomerative Hierarchical Clustering (AHC) to work in an online fashion by introducing a label matching algorithm. This algorithm resolves the inconsistency between the output labels and the hidden labels that are generated each turn. To ensure low latency in the online setting, we introduce a variant of AHC, namely chkpt-AHC, to cluster the speakers. In addition, we propose a speaker embedding graph to exploit a graph-based re-clustering method, further improving the performance. In our experiments, we evaluate our systems on both the DIHARD3 and VoxConverse datasets. The experimental results show that our proposed online systems perform better than our baseline online system and comparably to our offline systems. We find that the framework combining…
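To illustrate the label matching idea from the abstract, here is a minimal sketch of how hidden cluster IDs from one clustering turn could be mapped onto labels that have already been output. It is not the paper's algorithm: the function name `match_labels`, the greedy overlap-count assignment, and the rule for assigning new speaker IDs are all assumptions chosen for illustration.

```python
from collections import Counter


def match_labels(prev_output, hidden):
    """Map this turn's hidden cluster IDs onto previously output labels.

    prev_output: labels already output for the first len(prev_output) segments.
    hidden: cluster IDs from re-clustering every segment seen so far,
            with len(hidden) >= len(prev_output).
    Returns one output label per segment in `hidden`.
    (Hypothetical helper; the greedy matching rule is an assumption.)
    """
    # Count how often each (hidden ID, output ID) pair co-occurs
    # on the segments that already have an output label.
    overlap = Counter(zip(hidden[:len(prev_output)], prev_output))

    mapping = {}
    used_outputs = set()
    # Greedy one-to-one assignment: largest overlaps are matched first,
    # ties are broken by the (hidden ID, output ID) pair itself.
    for (h, o), _ in sorted(overlap.items(), key=lambda kv: (-kv[1], kv[0])):
        if h not in mapping and o not in used_outputs:
            mapping[h] = o
            used_outputs.add(o)

    # Hidden clusters without a matched output label are treated
    # as new speakers and receive fresh IDs.
    next_id = max(prev_output, default=-1) + 1
    for h in hidden:
        if h not in mapping:
            mapping[h] = next_id
            next_id += 1

    return [mapping[h] for h in hidden]


# Clustering swapped the arbitrary IDs 0 and 1 and found a third speaker.
print(match_labels([0, 0, 1, 1], [1, 1, 0, 0, 2]))  # [0, 0, 1, 1, 2]
```

In this sketch, labels that were already output stay stable across turns even though the clustering step is free to renumber its clusters. An optimal assignment (for example, the Hungarian algorithm) could replace the greedy loop if ties or heavy relabeling are a concern.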
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing
