Investigating Confidence Estimation Measures for Speaker Diarization

Anurag Chowdhury; Abhinav Misra; Mark C. Fuhs; Monika Woszczyna

arXiv:2406.17124 · cs.SD · June 26, 2024

TL;DR

This paper evaluates various methods for generating confidence scores in speaker diarization to identify and reduce errors, thereby improving downstream speech processing tasks.

Contribution

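The paper compares confidence estimation methods, both those derived from the diarization system itself and those derived from an external model, across multiple datasets and diarization systems, and identifies which are most effective for error detection.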

Findings

01. The most competitive confidence score methods isolate ~30% of diarization errors within the ~10% of segments that have the lowest confidence.

02. The effectiveness of each confidence estimation method varies across datasets.

03. Segment-level confidence scores give downstream systems, such as speaker-adapted speech recognition, a way to mitigate diarization errors.
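The first finding describes a measurable quantity: the share of all diarization errors that land in the lowest-confidence slice of segments. The sketch below shows one way such a figure could be computed. It is an illustration under stated assumptions, not the paper's evaluation protocol: the function name `error_capture_rate`, the per-segment boolean error labels, and the choice to count segments equally (rather than weighting by duration) are all hypothetical.

```python
from typing import Sequence


def error_capture_rate(
    confidences: Sequence[float],
    is_error: Sequence[bool],
    fraction: float = 0.10,
) -> float:
    """Share of all errors found in the lowest-confidence `fraction` of segments.

    Assumption: every segment counts equally; a duration-weighted variant
    would need segment lengths as an extra input.
    """
    if len(confidences) != len(is_error):
        raise ValueError("confidences and is_error must have the same length")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    total_errors = sum(1 for e in is_error if e)
    if total_errors == 0:
        return 0.0  # nothing to capture

    # Rank segment indices from least to most confident.
    order = sorted(range(len(confidences)), key=lambda i: confidences[i])

    # Size of the low-confidence bucket (at least one segment).
    k = max(1, int(len(order) * fraction))

    captured = sum(1 for i in order[:k] if is_error[i])
    return captured / total_errors


# Toy data (made up for illustration): 10 segments, 3 of them misattributed.
toy_confidences = [0.05, 0.95, 0.90, 0.40, 0.85, 0.92, 0.60, 0.88, 0.97, 0.91]
toy_is_error = [True, False, False, True, False, False, True, False, False, False]

rate_at_10_percent = error_capture_rate(toy_confidences, toy_is_error, fraction=0.10)
print(f"Errors captured in lowest 10% of segments: {rate_at_10_percent:.2f}")
```

On this toy data the single least confident segment is one of the three errors, so the lowest 10% of segments captures one third of them. A useful confidence score is one for which this rate is well above the fraction of segments inspected.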

Abstract

Speaker diarization systems segment a conversation recording based on the speakers' identity. Such systems can misclassify the speaker of a portion of audio due to a variety of factors, such as speech pattern variation, background noise, and overlapping speech. These errors propagate to, and can adversely affect, downstream systems that rely on the speaker's identity, such as speaker-adapted speech recognition. One of the ways to mitigate these errors is to provide segment-level diarization confidence scores to downstream systems. In this work, we investigate multiple methods for generating diarization confidence scores, including those derived from the original diarization system and those derived from an external model. Our experiments across multiple datasets and diarization systems demonstrate that the most competitive confidence score methods can isolate ~30% of the diarization…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing