Investigating Confidence Estimation Measures for Speaker Diarization
Anurag Chowdhury, Abhinav Misra, Mark C. Fuhs, Monika Woszczyna

TL;DR
This paper evaluates various methods for generating confidence scores in speaker diarization to identify and reduce errors, thereby improving downstream speech processing tasks.
Contribution
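It systematically compares multiple confidence estimation methods, including scores derived from the diarization system itself and scores from an external model, across several datasets and diarization systems, and identifies the most effective approaches for error detection.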
Findings
The most competitive confidence scores isolate ~30% of diarization errors within the ~10% of segments with the lowest confidence.
Different confidence estimation methods vary in effectiveness across datasets.
Segment-level confidence scores give downstream systems, such as speaker-adapted speech recognition, a way to flag or discount likely diarization errors.
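The headline finding can be read as a ranking metric: sort segments by confidence, take the lowest-confidence slice, and measure what share of all errors lands in it. The sketch below computes such an "error capture rate". It is a minimal illustration under stated assumptions, not the paper's evaluation code: the `Segment` type, the `error_capture_rate` name, measuring errors as misattributed seconds, and selecting the slice by segment count (rounded to the nearest integer) are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical record: a higher confidence means the speaker label is more likely correct.
    confidence: float
    # Hypothetical error measure: seconds of this segment attributed to the wrong speaker.
    error_seconds: float


def error_capture_rate(segments, fraction=0.10):
    """Share of total error time falling in the lowest-confidence `fraction` of segments.

    Assumption: the slice is chosen by segment count, rounded to the nearest
    integer (at least one segment), not by duration.
    """
    if not segments:
        raise ValueError("need at least one segment")
    total_error = sum(s.error_seconds for s in segments)
    if total_error == 0:
        return 0.0  # nothing to capture
    # Least confident segments first; sorted() is stable, so ties keep input order.
    ranked = sorted(segments, key=lambda s: s.confidence)
    k = max(1, round(fraction * len(ranked)))
    captured = sum(s.error_seconds for s in ranked[:k])
    return captured / total_error


# Usage: ten segments; the least confident one holds 3 of the 10 error seconds.
segs = [Segment(confidence=c / 10, error_seconds=0.0) for c in range(1, 11)]
segs[0].error_seconds = 3.0   # confidence 0.1
segs[4].error_seconds = 2.0   # confidence 0.5
segs[8].error_seconds = 5.0   # confidence 0.9
print(error_capture_rate(segs, fraction=0.10))  # 0.3
```

Ranking by count keeps the metric easy to compare across systems; a duration-weighted slice would be an equally plausible variant.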
Abstract
Speaker diarization systems segment a conversation recording based on the speakers' identity. Such systems can misclassify the speaker of a portion of audio due to a variety of factors, such as speech pattern variation, background noise, and overlapping speech. These errors propagate to, and can adversely affect, downstream systems that rely on the speaker's identity, such as speaker-adapted speech recognition. One of the ways to mitigate these errors is to provide segment-level diarization confidence scores to downstream systems. In this work, we investigate multiple methods for generating diarization confidence scores, including those derived from the original diarization system and those derived from an external model. Our experiments across multiple datasets and diarization systems demonstrate that the most competitive confidence score methods can isolate ~30% of the diarization…
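To make "confidence derived from the original diarization system" concrete, one generic signal for embedding-and-clustering diarizers is the margin between a segment's similarity to its assigned speaker's centroid and its similarity to the closest competing speaker. The sketch below is a hypothetical illustration of that idea only; it is not claimed to be one of the methods evaluated in the paper, and `cosine`, `margin_confidence`, and the centroid dictionary layout are assumed names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        raise ValueError("zero-norm vector has no direction")
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def margin_confidence(embedding, centroids, assigned):
    """Hypothetical segment confidence: similarity to the assigned speaker's
    centroid minus similarity to the most similar other speaker.

    Near 0 means the segment sits between two speakers (ambiguous);
    negative means another speaker is actually closer.
    """
    own = cosine(embedding, centroids[assigned])
    others = [cosine(embedding, c) for spk, c in centroids.items() if spk != assigned]
    if not others:
        return own  # single-speaker recording: no competitor to compare against
    return own - max(others)


# Usage with two toy 2-D speaker centroids.
centroids = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
clear = margin_confidence([0.9, 0.1], centroids, "A")      # close to A
ambiguous = margin_confidence([0.5, 0.5], centroids, "A")  # equidistant
print(clear > ambiguous, ambiguous)
```

A margin-based score like this tends to drop for segments with overlapping speech or atypical speech patterns, which are among the error sources the abstract lists.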
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing
