When Individually Calibrated Models Become Collectively Miscalibrated

Zhaohui Wang

arXiv:2605.18858 · cs.LG · May 20, 2026

TL;DR

This paper shows that individually calibrated probabilistic models can become collectively miscalibrated when their predictions interact strategically, particularly under Brier-score-based aggregation when the models' beliefs are positively correlated.

Contribution

It shows that the common assumption that individually calibrated models yield a calibrated aggregate fails in multi-agent settings, and it proposes VCG-based aggregation as a robust alternative.

Findings

1. Individually calibrated models can become collectively miscalibrated when their predictions are aggregated.
2. With positively correlated beliefs under Brier-score-based aggregation, each agent's individually optimal report systematically underestimates the positive-class probability.
3. VCG-based aggregation aligns agents' incentives and remains robust on real-world datasets.
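A minimal numerical illustration of the first finding, with a caveat: it uses a simpler, non-strategic mechanism than the paper, namely the classic observation that a plain average of individually calibrated forecasts need not be calibrated. The joint distribution table `Q` and the helpers `forecaster` and `calibration_table` are hypothetical, chosen only so the arithmetic is exact with fractions; this is not the paper's model.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint model (not from the paper): two independent binary
# features X1, X2, each equal to 0 or 1 with probability 1/2, and
# P(Y=1 | X1=x1, X2=x2) given by this table.
Q = {
    (0, 0): Fraction(1, 10),
    (0, 1): Fraction(1, 2),
    (1, 0): Fraction(1, 2),
    (1, 1): Fraction(9, 10),
}
P_X = Fraction(1, 2)  # probability of each feature value


def forecaster(i):
    """Calibrated forecast P(Y=1 | X_i = v) for each value v of feature i."""
    out = {}
    for v in (0, 1):
        # The other feature is independent and uniform, so average over it.
        cells = [Q[x] for x in product((0, 1), repeat=2) if x[i] == v]
        out[v] = sum(cells) / len(cells)
    return out


def calibration_table(forecast_fn):
    """Map each forecast value to the true P(Y=1 | forecast = value)."""
    mass, hits = {}, {}
    for x in product((0, 1), repeat=2):
        f = forecast_fn(x)
        w = P_X * P_X  # probability of the cell (x1, x2)
        mass[f] = mass.get(f, 0) + w
        hits[f] = hits.get(f, 0) + w * Q[x]
    return {f: hits[f] / mass[f] for f in sorted(mass)}


f1, f2 = forecaster(0), forecaster(1)

individual = calibration_table(lambda x: f1[x[0]])
pooled = calibration_table(lambda x: (f1[x[0]] + f2[x[1]]) / 2)

for name, table in (("agent 1", individual), ("average", pooled)):
    for f, freq in table.items():
        print(f"{name}: forecast {f} -> observed P(Y=1) {freq}")
```

Each single-feature forecaster is calibrated (a forecast of 3/10 comes true 3/10 of the time), but the averaged forecast is not: when it says 3/10 the event occurs only 1/10 of the time, and when it says 7/10 it occurs 9/10 of the time. The paper's strategic setting instead produces systematic underestimation of the positive-class probability, which this toy example does not model.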

Abstract

Probabilistic prediction systems often aggregate probability estimates from multiple models into a single decision. A common assumption is that if each model is individually calibrated, the aggregate prediction will also be well calibrated. We show that this assumption fails in multi-agent settings: individually calibrated predictors can become collectively miscalibrated when their predictions interact strategically, in the game-theoretic sense of Brier-optimal local response, even without deliberate coordination. This phenomenon arises naturally when agents are independently trained on overlapping data. We prove that under Brier-score-based aggregation with positively correlated beliefs, each agent's individually optimal report systematically underestimates the positive-class probability, yielding a Price of Anarchy greater than one whenever Cov(b_i, b_j) > 0. In a canonical…
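The abstract ties the effect to positively correlated beliefs, Cov(b_i, b_j) > 0, which arise when agents are trained on overlapping data. The sketch below checks that link in a deliberately simple, hypothetical model: each "agent" is just a sample-frequency estimate of P(Y=1) from Bernoulli draws, and `n_shared` of its draws are shared with the other agent. The function names and parameters are illustrative assumptions, not the paper's training setup.

```python
import random
from statistics import mean


def sample_cov(a, b):
    """Unbiased sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def agent_beliefs(rng, p_true, n_per_agent, n_shared, trials):
    """Hypothetical model: two agents estimate P(Y=1) by sample frequency.

    Each agent sees n_per_agent Bernoulli(p_true) draws; the first n_shared
    draws are identical for both agents (overlapping training data) and the
    rest are drawn independently. Returns both agents' beliefs per trial.
    """
    b1, b2 = [], []
    own = n_per_agent - n_shared
    for _ in range(trials):
        shared = [rng.random() < p_true for _ in range(n_shared)]
        d1 = shared + [rng.random() < p_true for _ in range(own)]
        d2 = shared + [rng.random() < p_true for _ in range(own)]
        b1.append(sum(d1) / n_per_agent)
        b2.append(sum(d2) / n_per_agent)
    return b1, b2


rng = random.Random(0)  # fixed seed for reproducibility
for n_shared in (0, 40):
    b1, b2 = agent_beliefs(rng, p_true=0.3, n_per_agent=50,
                           n_shared=n_shared, trials=2000)
    print(f"shared draws = {n_shared:2d}: Cov(b1, b2) = {sample_cov(b1, b2):+.5f}")
```

With disjoint data the sample covariance should be near zero. With 40 of 50 draws shared it should be close to the analytic value n_shared · p(1 - p) / n², about 0.0034 here, because only the shared draws contribute to Cov(b_1, b_2).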
