When Individually Calibrated Models Become Collectively Miscalibrated
Zhaohui Wang

TL;DR
This paper shows that individually calibrated probabilistic models can become collectively miscalibrated when their predictions interact strategically, notably under Brier-score-based aggregation when the agents' beliefs are positively correlated.
Contribution
The paper shows that the common calibration assumption fails in multi-agent settings and proposes VCG-based (Vickrey–Clarke–Groves) aggregation as a robust alternative.
Findings
Individually calibrated models can become miscalibrated when aggregated.
Under Brier-score-based aggregation, positive correlation among agents' beliefs leads each agent to systematically underestimate the positive-class probability.
VCG-based aggregation aligns incentives and remains robust on real-world datasets.
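The first finding can be checked exactly in a small toy setting. The sketch below is a hypothetical construction, not the paper's model: a binary outcome Y with prior PRIOR, and two signals that each match Y with probability ACC, independently given Y. Each model's forecast P(Y = 1 | own signal) is calibrated by construction, yet their simple average is not. The names PRIOR, ACC, posterior, reliability, individual and pooled are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy setting (not the paper's model): binary outcome Y with
# P(Y = 1) = PRIOR, and two signals S1, S2 that each equal Y with
# probability ACC, independently given Y.
PRIOR = Fraction(1, 2)
ACC = Fraction(4, 5)


def p_signal(s, y):
    """P(S_i = s | Y = y) for a single signal."""
    return ACC if s == y else 1 - ACC


def joint():
    """Yield (y, s1, s2, probability) for every outcome of the joint distribution."""
    for y, s1, s2 in product((0, 1), repeat=3):
        p_y = PRIOR if y == 1 else 1 - PRIOR
        yield y, s1, s2, p_y * p_signal(s1, y) * p_signal(s2, y)


def posterior(s):
    """Individually calibrated forecast P(Y = 1 | S_i = s) via Bayes' rule."""
    num = PRIOR * p_signal(s, 1)
    return num / (num + (1 - PRIOR) * p_signal(s, 0))


def reliability(forecast):
    """Map each forecast value f to the exact frequency P(Y = 1 | forecast = f)."""
    mass, hits = {}, {}
    for y, s1, s2, p in joint():
        f = forecast(s1, s2)
        mass[f] = mass.get(f, 0) + p
        hits[f] = hits.get(f, 0) + p * y
    return {f: hits[f] / mass[f] for f in mass}


def is_calibrated(forecast):
    """A forecast is calibrated if every value it issues matches its hit rate."""
    return all(freq == f for f, freq in reliability(forecast).items())


def individual(s1, s2):
    # Model 1 only uses its own signal.
    return posterior(s1)


def pooled(s1, s2):
    # Equal-weight linear pool (simple average) of the two calibrated forecasts.
    return (posterior(s1) + posterior(s2)) / 2


print(is_calibrated(individual))            # True
print(is_calibrated(pooled))                # False
print(reliability(pooled)[Fraction(4, 5)])  # 16/17
```

The pooled forecast 4/5 is issued only when both signals agree, and in that case Y = 1 with probability 16/17, so the average is underconfident there (symmetrically, a pooled 1/5 corresponds to a true frequency of 1/17). This only shows that calibration is not preserved by averaging; it does not model the strategic reporting, correlation structure, or VCG mechanism that the paper studies.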
Abstract
Probabilistic prediction systems often aggregate probability estimates from multiple models into a single decision. A common assumption is that if each model is individually calibrated, the aggregate prediction will also be well calibrated. We show that this assumption fails in multi-agent settings: individually calibrated predictors can become collectively miscalibrated when their predictions interact strategically, in the game-theoretic sense of Brier-optimal local response, even without deliberate coordination. This phenomenon arises naturally when agents are independently trained on overlapping data. We prove that under Brier-score-based aggregation with positively correlated beliefs, each agent's individually optimal report systematically underestimates the positive-class probability, yielding a Price of Anarchy greater than one whenever Cov(b_i, b_j) > 0. In a canonical…
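The phrase "Brier-optimal local response" rests on a standard property of the Brier score, worked out below for a single agent scored in isolation. Here r is the agent's report and b its belief P(Y = 1); this notation is assumed for illustration, and the aggregation rule under which the property breaks down in the paper is not reproduced. The last line is the textbook definition of the Price of Anarchy, where C is a social cost (for example, the expected Brier loss of the aggregate); the paper's exact choice of C is not stated in this excerpt.

```latex
\begin{aligned}
\mathbb{E}_{Y \sim \mathrm{Bern}(b)}\big[(r - Y)^2\big]
  &= b\,(1 - r)^2 + (1 - b)\,r^2 \\
  &= r^2 - 2br + b \\
  &= (r - b)^2 + b\,(1 - b),
  \qquad\text{so } \arg\min_{r \in [0,1]} \mathbb{E}\big[(r - Y)^2\big] = b, \\[4pt]
\mathrm{PoA} &= \frac{\max_{\sigma \in \mathrm{NE}} C(\sigma)}{\min_{\sigma} C(\sigma)} \;\ge\; 1 .
\end{aligned}
```

Scored alone, a Brier-scored agent has no incentive to shade its belief; a Price of Anarchy above one means that some equilibrium of the reporting game costs strictly more than the best joint reporting profile.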
