Dissecting the Segmentation Model of End-to-End Diarization with Vector Clustering

Alexis Plaquet; Naohiro Tawara; Marc Delcroix; Shota Horiguchi; Atsushi Ando; Shoko Araki; Hervé Bredin

arXiv:2506.11605 · cs.SD · June 16, 2025

TL;DR

This paper analyzes how architecture choices in the segmentation model affect the performance of end-to-end neural speaker diarization with vector clustering (EEND-VC), identifying the best-performing configurations and achieving state-of-the-art results.

Contribution

It evaluates encoder, decoder, loss, and chunk-size choices for the segmentation model across nine datasets, measuring how each choice affects the full diarization pipeline and deriving best practices from the results.
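Chunk size controls how much audio the segmentation model sees in one window. As a rough illustration of what varying it means, the sketch below computes overlapping sliding-window chunk boundaries for a recording. The helper name `chunk_boundaries` and the default step of half a chunk are hypothetical choices for illustration, not the paper's configuration.

```python
from typing import List, Optional, Tuple


def chunk_boundaries(
    duration: float, chunk: float, step: Optional[float] = None
) -> List[Tuple[float, float]]:
    """Return (start, end) times, in seconds, of sliding-window chunks.

    Hypothetical helper for illustration. `step` defaults to half the
    chunk size (an assumption, not the paper's setting). The final chunk
    is shifted so that it ends exactly at `duration`.
    """
    if duration <= 0 or chunk <= 0:
        raise ValueError("duration and chunk must be positive")
    if step is None:
        step = chunk / 2
    if step <= 0:
        raise ValueError("step must be positive")
    if duration <= chunk:
        # Short recording: a single (shorter) chunk covers everything.
        return [(0.0, duration)]
    bounds = []
    i = 0
    # Index-based starts avoid accumulating floating-point error.
    while i * step + chunk < duration:
        start = i * step
        bounds.append((start, start + chunk))
        i += 1
    # Last chunk is aligned to the end of the recording.
    bounds.append((duration - chunk, duration))
    return bounds


# A 12 s recording split into 5 s chunks with a 2.5 s step:
print(chunk_boundaries(12.0, 5.0, 2.5))
# [(0.0, 5.0), (2.5, 7.5), (5.0, 10.0), (7.0, 12.0)]
```

Changing `chunk` here only changes the windowing; the paper studies how that choice interacts with the encoder, decoder, and loss empirically.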

Findings

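1. Finetuned WavLM encoders outperform SincNet and pretrained WavLM encoders by a wide margin.
2. Conformer decoders yield the best performance, and both Conformer and Mamba decoders outclass the LSTM decoder.
3. The multiclass powerset loss generally improves accuracy over the multilabel loss.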
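The multiclass powerset loss replaces per-speaker binary targets with a single class per frame, one class for each subset of simultaneously active speakers, so the segmentation model can be trained with ordinary cross-entropy. The sketch below shows only the label mapping, under an assumed limit of 3 speakers per chunk and at most 2 overlapping speakers; these limits and all function names are illustrative, not taken from the paper.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def powerset_classes(num_speakers: int, max_overlap: int) -> List[Tuple[int, ...]]:
    """Enumerate every set of at most `max_overlap` active speakers."""
    classes: List[Tuple[int, ...]] = []
    for size in range(max_overlap + 1):
        classes.extend(combinations(range(num_speakers), size))
    return classes


def multilabel_to_powerset(
    frames: Sequence[Sequence[int]], classes: List[Tuple[int, ...]]
) -> List[int]:
    """Map each frame's binary speaker-activity vector to one class index."""
    index: Dict[Tuple[int, ...], int] = {c: i for i, c in enumerate(classes)}
    labels = []
    for frame in frames:
        active = tuple(s for s, on in enumerate(frame) if on)
        if active not in index:
            raise ValueError(f"too many overlapping speakers: {active}")
        labels.append(index[active])
    return labels


def powerset_to_multilabel(
    labels: Sequence[int], classes: List[Tuple[int, ...]], num_speakers: int
) -> List[List[int]]:
    """Invert the mapping: class index back to a binary activity vector."""
    return [[1 if s in classes[k] else 0 for s in range(num_speakers)] for k in labels]


classes = powerset_classes(num_speakers=3, max_overlap=2)
print(len(classes))  # 7: silence, 3 single speakers, 3 speaker pairs
frames = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 1]]
labels = multilabel_to_powerset(frames, classes)
print(labels)  # [0, 1, 4, 6]
```

Because each frame gets exactly one class, the model's output becomes a softmax over these subsets instead of independent per-speaker sigmoids.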

Abstract

End-to-End Neural Diarization with Vector Clustering is a powerful and practical approach to perform speaker diarization. Multiple enhancements have been proposed for the segmentation model of these pipelines, but their synergy had not been thoroughly evaluated. In this work, we provide an in-depth analysis of the impact of major architecture choices on the performance of the pipeline. We investigate different encoders (SincNet, pretrained and finetuned WavLM), different decoders (LSTM, Mamba, and Conformer), different losses (multilabel and multiclass powerset), and different chunk sizes. Through in-depth experiments covering nine datasets, we found that the finetuned WavLM-based encoder always results in the best systems by a wide margin. The LSTM decoder is outclassed by Mamba- and Conformer-based decoders, and while we found Mamba more robust to other architecture choices, it is…
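In this family of pipelines, the segmentation model only labels speakers locally within each chunk; vector clustering then groups per-chunk speaker embeddings into global speaker identities. As a hedged illustration of that stitching step, the sketch below uses average-linkage agglomerative clustering on cosine similarity, with a cannot-link rule that keeps two local speakers from the same chunk apart. The algorithm, the hypothetical `cluster_local_speakers` helper, the 0.5 threshold, and the toy 2-D embeddings are all assumptions for illustration; the paper's pipeline may use a different clustering method and embedding extractor.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cluster_local_speakers(
    embeddings: List[Sequence[float]], chunk_ids: List[int], threshold: float = 0.5
) -> List[int]:
    """Assign a global speaker label to each (chunk, local speaker) embedding.

    Hypothetical helper: average-linkage agglomerative clustering that stops
    when no mergeable pair exceeds `threshold` cosine similarity.
    """
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1: List[int], c2: List[int]) -> float:
        # Average cosine similarity over all cross-cluster member pairs.
        sims = [cosine(embeddings[i], embeddings[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    def can_merge(c1: List[int], c2: List[int]) -> bool:
        # Cannot-link: local speakers from the same chunk are distinct people.
        return not ({chunk_ids[i] for i in c1} & {chunk_ids[j] for j in c2})

    while True:
        best, pair = threshold, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if can_merge(clusters[a], clusters[b]):
                    s = linkage(clusters[a], clusters[b])
                    if s > best:
                        best, pair = s, (a, b)
        if pair is None:
            break
        a, b = pair
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a stays valid

    labels = [0] * len(embeddings)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


# Two chunks, two local speakers each (toy 2-D embeddings).
emb = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
print(cluster_local_speakers(emb, chunk_ids=[0, 0, 1, 1]))  # [0, 1, 0, 1]
```

The cannot-link check is what stops the two speakers of chunk 0 from collapsing into one identity even if their embeddings happened to be similar.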
