From Modular to End-to-End Speaker Diarization

Federico Landini

arXiv:2407.08752 · eess.AS · July 15, 2024

Open Access

TL;DR

This thesis reviews the evolution of speaker diarization from modular to end-to-end systems and introduces new models and data generation techniques that improve the handling of overlapped speech and of recordings with multiple speakers.

Contribution

It presents a new EEND-based model called DiaPer, compares it with VBx, and introduces a synthetic data generation method for training neural diarization models.

Findings

01 DiaPer outperforms EEND-EDA on recordings with many speakers and on overlapped speech.

02 Synthetic data improves the training of neural diarization models.

03 VBx, a clustering-based approach, remains effective.

Abstract

Speaker diarization is usually described as the task of determining "who spoke when" in a recording. Until a few years ago, all competitive approaches were modular. Systems based on this framework reached state-of-the-art performance in most scenarios but had major difficulties dealing with overlapped speech. More recently, end-to-end models, which handle all aspects of speaker diarization with a single model and perform better on overlapped speech, have attracted considerable attention. This thesis was written during a period in which these two trends coexisted. We describe a system based on a Bayesian hidden Markov model used to cluster x-vectors (speaker embeddings obtained with a neural network), known as VBx, which has shown remarkable performance on different datasets and challenges. We comment on its advantages and limitations and evaluate…

Taxonomy

Topics: Speech Recognition and Synthesis

Methods: End-to-End Neural Diarization