How to evaluate the calibration of a disease risk prediction tool
V. Viallon, J. Benichou, F. Clavel-Chapelon, S. Ragusa

TL;DR
This paper compares four methods for evaluating the calibration of disease risk prediction tools, highlighting biases in common approaches and demonstrating their impact through simulations and a real breast cancer model case study.
Contribution
It introduces and compares four methods for calibration assessment, revealing biases in standard techniques and providing guidance for more accurate evaluation.
Findings
Two of the most commonly used methods generally yield biased estimates because of censoring.
A simulation study shows the magnitude of these biases.
Applying an existing breast cancer prediction model to the E3N-EPIC cohort illustrates the practical implications.
Abstract
To evaluate the calibration of a disease risk prediction tool, the quantity E/O, i.e., the ratio of the expected number of events to the observed number of events, is generally computed. However, because of censoring, or more precisely because of individuals who drop out before the end of the study, this quantity is generally unavailable for the complete study population and an alternative estimate has to be computed. In this paper, we present and compare four methods to do this. We show that two of the most commonly used methods generally lead to biased estimates. Our arguments are first based on theoretical considerations. Then, we perform a simulation study to highlight the magnitude of these biases. As a concluding example, we evaluate the calibration of an existing predictive model for breast cancer on the E3N-EPIC cohort.
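As a rough illustration of the quantity the abstract describes, the sketch below computes E/O for a toy cohort in which every individual is followed to the end of the study, so no censoring correction is needed. The function name `expected_observed_ratio` and the toy data are hypothetical and are not taken from the paper; this is not one of the four methods the paper compares, only the uncensored baseline they aim to estimate.

```python
def expected_observed_ratio(predicted_risks, event_indicators):
    """Return E/O for a cohort with complete follow-up (no dropouts).

    predicted_risks: model-predicted probability of the event for each
        individual over the study period (assumed input).
    event_indicators: 1 if the individual had the event, 0 otherwise.
    """
    if len(predicted_risks) != len(event_indicators):
        raise ValueError("inputs must have the same length")
    expected = sum(predicted_risks)    # E: expected number of events
    observed = sum(event_indicators)   # O: observed number of events
    if observed == 0:
        raise ValueError("E/O is undefined when no events are observed")
    return expected / observed


# Toy cohort of four individuals, all followed for the whole study.
risks = [0.25, 0.25, 0.5, 0.5]   # E = 1.5
events = [1, 0, 0, 1]            # O = 2
ratio = expected_observed_ratio(risks, events)
print(ratio)
```

A ratio near 1 suggests good overall calibration, a ratio above 1 suggests the tool over-predicts, and a ratio below 1 suggests it under-predicts. When some individuals drop out early, their event status at the end of the study is unknown, so O cannot be counted directly in this way, which is the problem the paper addresses.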
Taxonomy
Topics: Artificial Intelligence in Healthcare
