A Review of Common Online Speaker Diarization Methods

Roman Aperdannier; Sigurd Schacht; Alexander Piazza

arXiv:2406.14464 · cs.SD · June 21, 2024

TL;DR

This paper reviews online speaker diarization, discussing its history, taxonomy, datasets, methods, and challenges, emphasizing the need for low-latency speaker labeling in real-time audio processing.

Contribution

The paper provides a comprehensive overview of online speaker diarization methods, datasets, and challenges, highlighting areas for future research and development.

Findings

01. Summarizes the evolution of online speaker diarization techniques.

02. Identifies key datasets used for training and evaluation.

03. Outlines unresolved challenges in achieving low-latency diarization.

Abstract

Speaker diarization answers the question "who spoke when?" for an audio file. This information can be used to complete audio transcripts with speaker labels for further processing steps. Most speaker diarization systems assume that the entire audio file is available in advance. However, in some scenarios the speaker labels are needed immediately after an audio segment arrives. Speaker diarization with correspondingly low latency is referred to as online speaker diarization. This paper provides an overview of the field. First, the history of online speaker diarization is briefly presented. Next, a taxonomy and the datasets used for training and evaluation are given. The following sections discuss online diarization methods and systems in detail. The paper concludes by presenting challenges that still need to be solved by future research in online speaker diarization.

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing