Probabilistic Fusion and Calibration of Neural Speaker Diarization Models

Juan Ignacio Alvarez-Trejos; Sergio A. Balanya; Daniel Ramos; Alicia Lozano-Diez

arXiv:2511.22696 · cs.SD · December 4, 2025

TL;DR

This paper introduces a comprehensive framework for calibrating and fusing neural speaker diarization models at the probability level, improving both diarization accuracy and the reliability of confidence estimates over existing methods.

Contribution

It presents the first detailed approach for probabilistic fusion and calibration of EEND models, demonstrating substantial DER improvements and better confidence estimates.
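One common way to check whether confidence estimates are reliable is the expected calibration error (ECE). This page does not say which calibration metrics the paper reports, so the sketch below is an illustrative assumption rather than the authors' evaluation code: each frame contributes its top confidence (for example, its largest powerset posterior) and a flag saying whether that decision matched the reference. The function name is hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE: bin-size-weighted mean of |accuracy - confidence|.

    confidences: per-frame confidence of the chosen decision, in [0, 1].
    correct: per-frame booleans, True if that decision matched the reference.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one frame")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes to the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece
```

A perfectly calibrated system scores 0; an overconfident one (high confidence, frequent errors) scores higher.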

Findings

01. Proper calibration reduces DER by up to 19%.
02. Joint calibration in powerset space outperforms independent calibration.
03. Fusing calibrated models achieves lower DER than DOVER-Lap.
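Finding 02 concerns calibration in the powerset space, where each frame gets a single softmax over sets of active speakers (for two speakers: none, speaker 0, speaker 1, both). The summary does not name the calibration method, so the sketch below assumes plain temperature scaling as a stand-in: one scalar T divides all powerset logits jointly and is chosen to minimize negative log-likelihood on held-out frames. The function names and the grid range are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over one frame's powerset logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(frames, labels, temperature):
    """Mean negative log-likelihood of the reference powerset class per frame."""
    total = 0.0
    for logits, y in zip(frames, labels):
        p = softmax(logits, temperature)
        total -= math.log(max(p[y], 1e-12))
    return total / len(frames)


def fit_temperature(frames, labels, grid=None):
    """Pick the single (joint) temperature minimizing dev-set NLL by grid search."""
    if grid is None:
        grid = [0.25 + 0.05 * i for i in range(96)]  # roughly 0.25 to 5.0
    return min(grid, key=lambda t: nll(frames, labels, t))
```

An independent (multilabel) variant would instead rescale each speaker's sigmoid output on its own, leaving combinations such as overlap to be inferred from the per-speaker marginals; the joint version treats every speaker set as one class of a single distribution.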

Abstract

End-to-End Neural Diarization (EEND) systems produce frame-level probabilistic speaker activity estimates, yet since evaluation focuses primarily on Diarization Error Rate (DER), the reliability and calibration of these confidence scores have been largely neglected. When fusing multiple diarization systems, DOVER-Lap remains the only established approach, operating at the segment level with hard decisions. We propose working with continuous probability outputs, which enables more sophisticated fusion and calibration techniques that can leverage model uncertainty and complementary strengths across different architectures. This paper presents the first comprehensive framework for calibrating and fusing EEND models at the probability level. We investigate two output formulations (multilabel and powerset representations) and their impact on calibration and fusion effectiveness. Through…
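Working with continuous probabilities turns fusion into combining per-frame posteriors instead of voting on hard segments as DOVER-Lap does. The sketch below converts a multilabel (per-speaker sigmoid) output into a powerset distribution under an assumed independence between speakers, fuses systems by a weighted average of their powerset posteriors, and picks the most probable speaker set. Linear averaging, the `max_active` limit, and all function names are illustrative assumptions, not the paper's method. Speaker indices are also assumed to be already aligned across systems; EEND speaker order is arbitrary, so a real pipeline would first need a permutation-matching step.

```python
from itertools import combinations


def powerset_classes(num_speakers, max_active=2):
    """Enumerate speaker subsets with at most `max_active` active speakers."""
    classes = []
    for k in range(max_active + 1):
        classes.extend(frozenset(c) for c in combinations(range(num_speakers), k))
    return classes


def multilabel_to_powerset(frame_probs, classes):
    """Map per-speaker activity probabilities to a powerset distribution.

    Assumes speakers are independent within a frame, then renormalizes over
    the allowed subsets (subsets above max_active are simply dropped).
    """
    scores = []
    for subset in classes:
        s = 1.0
        for spk, p in enumerate(frame_probs):
            s *= p if spk in subset else (1.0 - p)
        scores.append(s)
    total = sum(scores)
    if total == 0.0:
        raise ValueError("no probability mass on the allowed speaker subsets")
    return [s / total for s in scores]


def fuse(distributions, weights=None):
    """Weighted average (linear pooling) of per-system powerset posteriors for one frame."""
    n = len(distributions)
    if weights is None:
        weights = [1.0 / n] * n
    k = len(distributions[0])
    return [sum(w * d[i] for w, d in zip(weights, distributions)) for i in range(k)]


def decode(distribution, classes):
    """Return the most probable set of active speakers."""
    best = max(range(len(classes)), key=lambda i: distribution[i])
    return classes[best]


if __name__ == "__main__":
    classes = powerset_classes(num_speakers=2)  # order: none, {0}, {1}, {0, 1}
    system_a = multilabel_to_powerset([0.9, 0.2], classes)  # multilabel output
    system_b = [0.05, 0.35, 0.05, 0.55]  # powerset output from another model
    fused = fuse([system_a, system_b])
    print(sorted(decode(fused, classes)), round(max(fused), 3))
```

Because the fused distribution keeps its full shape, its maximum (about 0.535 for speaker 0 alone in this toy frame) can also serve as a frame-level confidence score, which hard-decision voting cannot provide.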

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Topic Modeling