Probabilistic Fusion and Calibration of Neural Speaker Diarization Models
Juan Ignacio Alvarez-Trejos, Sergio A. Balanya, Daniel Ramos, Alicia Lozano-Diez

TL;DR
This paper introduces a framework for calibrating and fusing neural speaker diarization models at the probability level, improving both Diarization Error Rate (DER) and the reliability of confidence scores over existing methods.
Contribution
It presents the first detailed approach for probabilistic fusion and calibration of EEND models, demonstrating substantial DER improvements and better confidence estimates.
Findings
Proper calibration reduces DER by up to 19%.
Joint calibration in powerset space outperforms independent calibration.
Fusion with calibrated models surpasses DOVER-Lap in DER.
Abstract
End-to-End Neural Diarization (EEND) systems produce frame-level probabilistic speaker activity estimates, yet since evaluation focuses primarily on Diarization Error Rate (DER), the reliability and calibration of these confidence scores have been largely neglected. When fusing multiple diarization systems, DOVER-Lap remains the only established approach, operating at the segment level with hard decisions. We propose working with continuous probability outputs, which enables more sophisticated fusion and calibration techniques that can leverage model uncertainty and complementary strengths across different architectures. This paper presents the first comprehensive framework for calibrating and fusing EEND models at the probability level. We investigate two output formulations (multilabel and powerset representations) and their impact on calibration and fusion effectiveness. Through…
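To make the two output formulations concrete, the following is a minimal sketch, not the authors' implementation. It maps multilabel (per-speaker) frame probabilities to a powerset distribution over speaker subsets, rescales that distribution with a single temperature, and fuses several models by weighted averaging of their posteriors. The independence assumption in the conversion, the use of temperature scaling as the calibrator, the averaging rule, and all function names are illustrative assumptions; the truncated abstract does not specify which calibration or fusion methods the paper uses.

```python
import itertools
import numpy as np


def powerset_classes(num_speakers, max_overlap):
    """Enumerate speaker subsets with at most `max_overlap` active speakers."""
    classes = []
    for k in range(max_overlap + 1):
        classes.extend(itertools.combinations(range(num_speakers), k))
    return classes


def multilabel_to_powerset(probs, classes):
    """Convert per-speaker probabilities of shape (frames, speakers) into a
    distribution over subsets. Assumption: speakers are treated as
    independent, and mass on excluded subsets is removed by renormalizing."""
    frames, num_speakers = probs.shape
    out = np.empty((frames, len(classes)))
    for j, subset in enumerate(classes):
        active = np.zeros(num_speakers, dtype=bool)
        for s in subset:
            active[s] = True
        # Probability that exactly the speakers in `subset` are active.
        out[:, j] = np.prod(np.where(active, probs, 1.0 - probs), axis=1)
    return out / out.sum(axis=1, keepdims=True)


def temperature_scale(posteriors, temperature, eps=1e-12):
    """Apply temperature scaling to log-posteriors (an assumed calibrator).
    A temperature above 1 flattens the distribution; below 1 sharpens it."""
    logits = np.log(posteriors + eps) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    scaled = np.exp(logits)
    return scaled / scaled.sum(axis=1, keepdims=True)


def fuse(posteriors_list, weights=None):
    """Fuse models by a weighted average of their frame-level posteriors
    (an assumed fusion rule; uniform weights by default)."""
    stacked = np.stack(posteriors_list)  # (models, frames, classes)
    if weights is None:
        weights = np.full(len(posteriors_list), 1.0 / len(posteriors_list))
    weights = np.asarray(weights, dtype=float)
    fused = np.tensordot(weights, stacked, axes=1)  # (frames, classes)
    return fused / fused.sum(axis=1, keepdims=True)
```

In this sketch the temperature would be fitted on held-out data, for example by minimizing negative log-likelihood against reference labels; that step is omitted here. Hard decisions for DER scoring would then be taken as the arg-max subset of the fused posterior in each frame.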
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Topic Modeling
