Discriminative Training of VBx Diarization
Dominik Klement, Mireia Diez, Federico Landini, Lukáš Burget, Anna Silnova, Marc Delcroix, Naohiro Tawara

TL;DR
This paper introduces a discriminative training framework for VBx diarization that optimizes the model's parameters directly against a loss chosen to track diarization error rate, and demonstrates improved performance on three datasets (AMI, CALLHOME, and DIHARD II).
Contribution
The paper proposes a discriminative training method for VBx, together with a new loss function that correlates better with diarization error rate than binary cross-entropy, and shows that fine-tuning the PLDA improves results.
Findings
Discriminative training finds hyperparameters that perform comparably to those found by extensive grid search.
The new loss function correlates better with diarization error rate than binary cross-entropy does.
Fine-tuning PLDA improves diarization accuracy.
Abstract
Bayesian HMM clustering of x-vector sequences (VBx) has become a widely adopted diarization baseline in publications and challenges. It uses an HMM to model speaker turns, a generatively trained probabilistic linear discriminant analysis (PLDA) model for speaker distributions, and Bayesian inference to estimate the assignment of x-vectors to speakers. This paper presents a new framework for updating the VBx parameters using discriminative training, which directly optimizes a predefined loss. We also propose a new loss that correlates better with the diarization error rate than binary cross-entropy, the default choice for end-to-end diarization systems. Proof-of-concept results across three datasets (AMI, CALLHOME, and DIHARD II) demonstrate the method's capability of automatically finding hyperparameters, achieving comparable performance to those found by…
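The abstract's point that binary cross-entropy can be a poor proxy for diarization error can be illustrated with a small toy sketch. It is not the loss proposed in the paper, which this summary does not describe. The function names, the 0.5 decision threshold, and the two made-up posterior matrices are all assumptions for illustration, and `frame_error_rate` is only a simplified stand-in for the real diarization error rate (no speaker mapping, collars, or durations).

```python
import numpy as np


def binary_cross_entropy(posteriors, labels, eps=1e-12):
    """Mean BCE between per-frame speaker posteriors and 0/1 labels."""
    p = np.clip(posteriors, eps, 1.0 - eps)
    return float(-np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p)))


def frame_error_rate(posteriors, labels, threshold=0.5):
    """Toy DER stand-in: fraction of wrong (frame, speaker) hard decisions.

    Assumption: a fixed threshold and no speaker mapping or collar,
    unlike a real DER computation.
    """
    decisions = (posteriors > threshold).astype(float)
    return float(np.mean(decisions != labels))


# Toy reference: 4 frames (rows), 2 speakers (columns).
labels = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

# Two hypothetical systems with identical hard decisions
# (both get the third frame wrong) but different confidence.
confident = np.array([[0.99, 0.01], [0.99, 0.01], [0.99, 0.01], [0.01, 0.99]])
hesitant = np.array([[0.60, 0.40], [0.60, 0.40], [0.60, 0.40], [0.40, 0.60]])

for name, post in [("confident", confident), ("hesitant", hesitant)]:
    err = frame_error_rate(post, labels)
    bce = binary_cross_entropy(post, labels)
    print(f"{name}: error={err:.3f} bce={bce:.3f}")
```

Both systems make the same hard decisions and therefore have the same toy error rate, yet their cross-entropy values differ because BCE also penalizes how confident each wrong decision is. This is one simple way the two quantities can diverge.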
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing
Methods: Discriminative Fine-Tuning
