TL;DR
This paper studies the calibration of powerset neural speaker diarization models and shows that top-label confidence can predict high-error regions and guide the selection of training and validation data from unannotated audio.
Contribution
It introduces a calibration analysis for powerset diarization models, both in-domain and out-of-domain, and shows how confidence scores can be used to select training and validation subsets instead of sampling them at random.
Findings
Top-label confidence predicts high-error regions reliably.
Training on low-confidence regions improves model calibration.
Validation on low-confidence regions enhances annotation efficiency.
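To make "top-label confidence" and "calibration" concrete, here is a minimal sketch, not the authors' code. It assumes the model outputs one softmax distribution over powerset classes per frame, and it measures calibration with the standard expected calibration error (ECE) over equal-width bins. The function names, the array layout, and the choice of 10 bins are illustrative assumptions.

```python
import numpy as np


def top_label_confidence(probs):
    """Return the highest class probability for each frame.

    probs: array of shape (num_frames, num_classes), rows summing to 1
    (assumed layout, one row per frame, one column per powerset class).
    """
    return np.asarray(probs, dtype=float).max(axis=1)


def expected_calibration_error(probs, labels, num_bins=10):
    """ECE: weighted average gap between accuracy and confidence per bin."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)

    # Interior edges of equal-width bins on [0, 1]; digitize maps each
    # confidence to a bin index in 0 .. num_bins - 1.
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    bin_ids = np.digitize(conf, edges[1:-1])

    ece = 0.0
    for b in range(num_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - conf[mask].mean())
            ece += mask.mean() * gap  # mask.mean() is the bin's share of frames
    return float(ece)


# Toy example: four frames, two classes.
probs = [[0.95, 0.05], [0.65, 0.35], [0.15, 0.85], [0.55, 0.45]]
labels = [0, 1, 1, 0]
ece = expected_calibration_error(probs, labels)
```

A well-calibrated model has a small ECE: among frames predicted with confidence around 0.8, roughly 80% are correct.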
Abstract
End-to-end neural diarization models have usually relied on a multilabel-classification formulation of the speaker diarization problem. Recently, we proposed a powerset multiclass formulation that has beaten the state-of-the-art on multiple datasets. In this paper, we propose to study the calibration of a powerset speaker diarization model, and explore some of its uses. We study the calibration in-domain, as well as out-of-domain, and explore the data in low-confidence regions. The reliability of model confidence is then tested in practice: we use the confidence of the pretrained model to selectively create training and validation subsets out of unannotated data, and compare this to random selection. We find that top-label confidence can be used to reliably predict high-error regions. Moreover, training on low-confidence regions provides a better calibrated model, and validating on…
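The selection experiment described in the abstract can be illustrated with a short sketch. This is a hypothetical simplification, not the paper's procedure: it assumes frame-level top-label confidences are already available, cuts the recording into fixed-length non-overlapping windows, and picks the windows with the lowest mean confidence, alongside a random baseline. The window length, the ranking by mean confidence, and both function names are assumptions.

```python
import numpy as np


def select_low_confidence_windows(confidence, window, k):
    """Start indices of the k non-overlapping windows with lowest mean confidence.

    confidence: 1-D sequence of per-frame top-label confidences (assumed input).
    window: window length in frames; trailing frames that do not fill a
    whole window are ignored.
    """
    confidence = np.asarray(confidence, dtype=float)
    num_windows = len(confidence) // window
    means = confidence[: num_windows * window].reshape(num_windows, window).mean(axis=1)
    lowest = np.argsort(means, kind="stable")[:k]
    return sorted(int(i) * window for i in lowest)


def select_random_windows(num_frames, window, k, seed=0):
    """Random baseline: start indices of k windows drawn without replacement."""
    rng = np.random.default_rng(seed)
    num_windows = num_frames // window
    chosen = rng.choice(num_windows, size=min(k, num_windows), replace=False)
    return sorted(int(i) * window for i in chosen)


# Toy example: 8 frames, windows of 2 frames, keep 2 windows.
confidence = [0.9, 0.9, 0.3, 0.4, 0.8, 0.7, 0.5, 0.2]
low = select_low_confidence_windows(confidence, window=2, k=2)
rand = select_random_windows(len(confidence), window=2, k=2)
```

The selected windows would then be sent for annotation and used as a training or validation subset, and compared against the randomly selected subset of the same size.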
