Systematic Evaluation of Online Speaker Diarization Systems Regarding their Latency

Roman Aperdannier; Sigurd Schacht; Alexander Piazza

arXiv:2407.04293·cs.CL·July 8, 2024

Open Access

TL;DR

This paper systematically evaluates the latency of various online speaker diarization systems on identical hardware and data, highlighting the performance of DIART and FS-EEND systems.

Contribution

It provides what the authors describe as the first comparative analysis of online diarization systems with respect to latency, run on the same hardware and the same test data.

Findings

01 The DIART pipeline with the embedding model pyannote/embedding and the segmentation model pyannote/segmentation achieves the lowest latency

02 The FS-EEND system shows similarly low latency

03 No published research so far compares several online diarization systems in terms of latency

Abstract

In this paper, different online speaker diarization systems are evaluated on the same hardware with the same test data with regard to their latency. The latency is the time span from audio input to the output of the corresponding speaker label. As part of the evaluation, various model combinations within the DIART framework, a diarization system based on the online clustering algorithm UIS-RNN-SML, and the end-to-end online diarization system FS-EEND are compared. The lowest latency is achieved by the DIART pipeline with the embedding model pyannote/embedding and the segmentation model pyannote/segmentation. The FS-EEND system shows similarly low latency. There is currently no published research that compares several online diarization systems in terms of their latency, which makes this work all the more relevant.
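The latency definition in the abstract (time from audio input to the output of the corresponding speaker label) can be illustrated with a minimal timing sketch. This is not the authors' measurement protocol, which this page does not describe. `measure_latency`, `dummy_diarizer`, and the chunk layout are hypothetical names introduced only for illustration, and the stub stands in for a real online diarization system. The sketch also captures only per-chunk processing time; any buffering or look-ahead delay inside a real system would add to it.

```python
import time
from statistics import mean
from typing import Callable, Iterable, List, Sequence

Chunk = Sequence[float]  # one chunk of audio samples (assumed layout)


def measure_latency(
    process_chunk: Callable[[Chunk], object],
    chunks: Iterable[Chunk],
    clock: Callable[[], float] = time.perf_counter,
) -> List[float]:
    """Return the elapsed time in seconds between handing each chunk to
    the system and receiving its output."""
    latencies = []
    for chunk in chunks:
        start = clock()           # audio chunk is handed to the system
        process_chunk(chunk)      # system returns its speaker label(s)
        latencies.append(clock() - start)
    return latencies


def dummy_diarizer(chunk: Chunk) -> str:
    """Hypothetical stand-in for an online diarization system.
    It does no real diarization; it only returns a placeholder label."""
    return "speaker_0" if sum(chunk) >= 0 else "speaker_1"


if __name__ == "__main__":
    # Five chunks of silence, 8000 samples each (arbitrary assumption).
    chunks = [[0.0] * 8000 for _ in range(5)]
    lat = measure_latency(dummy_diarizer, chunks)
    print(f"mean latency: {mean(lat) * 1000:.3f} ms over {len(lat)} chunks")
```

Running every system through the same harness, on the same machine and with the same chunks, mirrors the comparison setup the abstract describes. The injectable `clock` argument is a design choice that makes the harness testable with a deterministic clock.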

Taxonomy

Topics: Speech Recognition and Synthesis