VBx for End-to-End Neural and Clustering-based Diarization
Petr Pálka, Jiangyu Han, Marc Delcroix, Naohiro Tawara, Lukáš Burget

TL;DR
This paper improves speaker diarization by strengthening the clustering stage of the two-stage EEND-VC framework. It integrates VBx clustering to make the system robust across diverse domains without dataset-specific tuning.
Contribution
It filters out unreliable speaker embeddings extracted from short segments and reassigns them after clustering. It also integrates VBx clustering into the EEND-VC framework. Together, these changes improve generalization and diarization accuracy.
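The filtering and reassignment idea can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes three things:

- each embedding carries the duration of speech it was extracted from;
- "unreliable" simply means shorter than a threshold (`min_dur`, a hypothetical value);
- reassignment picks the cluster centroid with the highest cosine similarity.

The function name `filter_and_reassign` and the pluggable `cluster_fn` are hypothetical. The paper's system uses VBx where `cluster_fn` appears here.

```python
import numpy as np


def filter_and_reassign(embeddings, durations, cluster_fn, min_dur=2.0):
    """Cluster reliable embeddings, then attach unreliable ones to the nearest centroid.

    embeddings: (N, D) speaker embeddings, one per (window, local speaker).
    durations:  (N,) seconds of active speech behind each embedding.
    cluster_fn: hypothetical stand-in for any clusterer mapping (M, D) -> (M,)
                non-negative integer labels (the paper uses VBx here).
    min_dur:    hypothetical reliability threshold in seconds.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    durations = np.asarray(durations, dtype=float)
    # L2-normalise so that a dot product equals cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

    reliable = durations >= min_dur
    if not reliable.any():
        raise ValueError("no embedding passes the duration filter")

    # Only reliable embeddings take part in clustering; the rest stay at -1 for now.
    labels = np.full(len(unit), -1, dtype=int)
    labels[reliable] = cluster_fn(unit[reliable])

    # One unit-length centroid per cluster found among the reliable embeddings.
    cluster_ids = np.unique(labels[reliable])
    centroids = np.stack([unit[labels == c].mean(axis=0) for c in cluster_ids])
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)

    # Reassign each filtered-out embedding to its most similar centroid.
    unreliable = ~reliable
    if unreliable.any():
        sims = unit[unreliable] @ centroids.T
        labels[unreliable] = cluster_ids[np.argmax(sims, axis=1)]
    return labels


# Toy usage: a sign-based "clusterer" stands in for a real one.
emb = np.array([[1.0, 0.1], [0.9, -0.1], [-1.0, 0.2],
                [-0.8, -0.1], [0.7, 0.3], [-0.5, 0.4]])
dur = np.array([5.0, 4.0, 6.0, 3.0, 0.5, 0.8])  # last two are "short"
print(filter_and_reassign(emb, dur, lambda x: (x[:, 0] < 0).astype(int)))
# → [0 0 1 1 0 1]
```

Because short-segment embeddings are kept out of the clustering step, they cannot create spurious clusters or pull centroids off. They still receive a speaker label afterwards.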
Findings
Achieves state-of-the-art diarization accuracy across multiple domains.
Demonstrates robustness with limited speaker durations and large speaker counts.
Requires no per-dataset fine-tuning of the EEND model and no per-dataset tuning of clustering parameters.
Abstract
We present improvements to speaker diarization in the two-stage end-to-end neural diarization with vector clustering (EEND-VC) framework. The first stage employs a Conformer-based EEND model with WavLM features to infer frame-level speaker activity within short windows. The identities and counts of global speakers are then derived in the second stage by clustering speaker embeddings across windows. The focus of this work is to improve the second stage; we filter unreliable embeddings from short segments and reassign them after clustering. We also integrate the VBx clustering to improve robustness when the number of speakers is large and individual speaking durations are limited. Evaluation on a compound benchmark spanning multiple domains is conducted without fine-tuning the EEND model or tuning clustering parameters per dataset. Despite this, the system generalizes well and matches or…
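A simplified VBx-style clustering step is sketched below in Python with NumPy, to give a feel for how VBx assigns embeddings to speakers. It is a hedged illustration, not the authors' implementation. It makes these assumptions:

- The embeddings are already PLDA-transformed, so that within-speaker covariance is the identity and across-speaker covariance is `diag(Phi)`.
- The HMM speaker-transition model of full VBx is dropped. As we understand it, this corresponds to a loop probability of zero.
- Convergence is checked on the change in responsibilities rather than on the ELBO.

The function name `vb_cluster` is hypothetical. The defaults `Fa=0.3` and `Fb=17.0` are illustrative placeholders, not values taken from the paper.

```python
import numpy as np


def vb_cluster(X, Phi, gamma, Fa=0.3, Fb=17.0, max_iters=20, tol=1e-6):
    """VBx-style variational Bayes clustering WITHOUT the HMM (loop probability 0).

    X:     (T, D) embeddings, assumed PLDA-transformed (within-speaker covariance I,
           across-speaker covariance diag(Phi)).
    Phi:   (D,) across-speaker variances.
    gamma: (T, S) initial responsibilities, e.g. one-hot labels from AHC.
    Fa, Fb: acoustic and speaker-regularisation scaling factors (placeholders).
    Returns the final responsibilities (T, S) and speaker priors (S,).
    """
    X = np.asarray(X, dtype=float)
    Phi = np.asarray(Phi, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    T, D = X.shape
    # Per-embedding constant of the Gaussian log-likelihood, shape (T, 1).
    G = -0.5 * (np.sum(X**2, axis=1, keepdims=True) + D * np.log(2 * np.pi))
    # Projection onto the speaker subspace V = diag(sqrt(Phi)), shape (T, D).
    rho = X * np.sqrt(Phi)
    pi = gamma.sum(axis=0) / T  # speaker priors
    for _ in range(max_iters):
        # q(y_s) = N(alpha_s, diag(invL_s)): posterior of each speaker's latent vector.
        counts = gamma.sum(axis=0, keepdims=True).T          # (S, 1) soft counts
        invL = 1.0 / (1.0 + Fa / Fb * counts * Phi)          # (S, D)
        alpha = Fa / Fb * invL * (gamma.T @ rho)             # (S, D)
        # Expected log-likelihood of every embedding under every speaker, scaled by Fa.
        log_p = Fa * (rho @ alpha.T - 0.5 * (invL + alpha**2) @ Phi + G)  # (T, S)
        # With no transitions, responsibilities are a softmax over log prior + log-lik.
        logits = log_p + np.log(np.maximum(pi, 1e-300))
        logits -= logits.max(axis=1, keepdims=True)          # numerical stability
        new_gamma = np.exp(logits)
        new_gamma /= new_gamma.sum(axis=1, keepdims=True)
        pi = new_gamma.sum(axis=0) / T
        converged = np.abs(new_gamma - gamma).max() < tol
        gamma = new_gamma
        if converged:
            break
    return gamma, pi
```

Hard labels follow from `gamma.argmax(axis=1)`. In the full VBx recipe, clustering typically starts from an AHC over-clustering, and speakers whose prior `pi` collapses towards zero are dropped. That pruning is how the number of speakers is estimated, and it is not modelled in this sketch.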
Taxonomy
Topics: Speech Recognition and Synthesis · Emotion and Mood Recognition · Speech and dialogue systems
