# A-calibration: assessment of prediction models for survival data under censoring

**Authors:** Mikkel Runason Simonsen, Rasmus Plenge Waagepetersen

PMC · DOI: 10.1186/s12874-025-02671-6 · BMC Medical Research Methodology · 2025-10-22

## TL;DR

This paper introduces A-calibration, a new method for evaluating how well survival prediction models are calibrated when the data are censored.

## Contribution

A-calibration is proposed as a more powerful and less conservative alternative to D-calibration for assessing model calibration in survival analysis.

## Key findings

- A-calibration showed similar or better power than D-calibration in detecting miscalibration across different censoring scenarios.
- D-calibration was found to be more sensitive to censoring, leading to potential loss of power.
- Theoretical arguments, a simulation study, and a case study all favored A-calibration, and no disadvantages relative to D-calibration were identified.

## Abstract

Evaluating the performance of predictive models for survival is essential before they can be trusted for real-world applications and decision making. While good measures such as the C-index are available for model discrimination, the toolbox for model calibration is much more limited in the time-to-event setting.

D-calibration was therefore an important contribution: it yields a single numeric value for calibration across the available follow-up time. D-calibration consists of performing Pearson’s goodness-of-fit test on transformed survival times. Censored survival times are handled by an imputation approach, which, however, tends to yield a conservative test and a loss of power.
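To make the mechanics concrete, the following is a minimal sketch of a D-calibration-style statistic. It assumes the transformed survival time is the model's predicted survival probability at each subject's observed time, and that a censored subject's unit of mass is spread uniformly over the interval below that probability. The abstract only says "imputation approach", so this specific scheme, the function name `d_calibration_statistic`, and the default of 10 bins are illustrative assumptions rather than the authors' implementation.

```python
import math
import random


def d_calibration_statistic(surv_at_time, event, n_bins=10):
    """Pearson-type statistic on binned predicted survival probabilities.

    surv_at_time[i]: predicted survival probability at subject i's observed time
    event[i]: 1 if the event was observed, 0 if the subject was censored
    Returns (statistic, per-bin counts). Hypothetical helper for illustration.
    """
    n = len(surv_at_time)
    width = 1.0 / n_bins
    counts = [0.0] * n_bins
    for s, d in zip(surv_at_time, event):
        k = min(int(s / width), n_bins - 1)  # index of the bin containing s
        if d or s <= 0.0:
            # Observed event: the whole unit of mass goes to one bin.
            counts[k] += 1.0
        else:
            # Censored (assumed scheme): the unknown value lies in [0, s],
            # so spread one unit of mass uniformly over that interval.
            counts[k] += (s - k * width) / s
            for j in range(k):
                counts[j] += width / s
    expected = n / n_bins  # under calibration the values are uniform on [0, 1]
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, counts


# Usage: a correctly specified exponential model with exponential censoring.
rng = random.Random(0)
surv, ev = [], []
for _ in range(500):
    t, c = rng.expovariate(1.0), rng.expovariate(0.5)
    surv.append(math.exp(-min(t, c)))  # model survival function S(t) = exp(-t)
    ev.append(1 if t <= c else 0)
stat, counts = d_calibration_statistic(surv, ev)
print(round(stat, 2), [round(c, 1) for c in counts])
```

In this sketch the statistic would be compared against a chi-squared distribution with `n_bins - 1` degrees of freedom (critical value about 16.92 at the 5% level for 10 bins). The fractional counts contributed by censored subjects are the part of the procedure that the abstract identifies as making the test conservative.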

In this paper, we introduce A-calibration, based on Akritas’s goodness-of-fit test, which is designed specifically for censored time-to-event data. Through theoretical arguments, simulations, and a case study, we compare A- and D-calibration as measures of calibration. In the simulation study, the power of each test to reject a false null hypothesis was assessed for varying censoring mechanisms (memoryless, uniform and zero censoring), censoring rates, and parameter values of the predictive model considered.
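The censoring mechanisms named in the simulation design can be sketched as a small data generator. Everything specific here is an assumption made for illustration: event times are drawn from an exponential distribution, "memoryless" is read as exponentially distributed censoring times, "zero censoring" is read as the absence of censoring, and the function name `simulate_censored` and its parameters are hypothetical. The paper's actual simulation settings are in the full text.

```python
import random


def simulate_censored(n, event_rate, censoring, censor_param, rng):
    """Generate (observed_time, event_indicator) pairs.

    censoring: "memoryless" -> exponential censoring times with rate censor_param
               "uniform"    -> censoring times uniform on [0, censor_param]
               "none"       -> no censoring (censor_param is ignored)
    Event times are exponential with rate event_rate (an assumption).
    """
    data = []
    for _ in range(n):
        t = rng.expovariate(event_rate)
        if censoring == "memoryless":
            c = rng.expovariate(censor_param)
        elif censoring == "uniform":
            c = rng.uniform(0.0, censor_param)
        elif censoring == "none":
            c = float("inf")
        else:
            raise ValueError(f"unknown censoring mechanism: {censoring}")
        # Observe whichever comes first; the indicator is 1 for an observed event.
        data.append((min(t, c), 1 if t <= c else 0))
    return data


# Usage: compare the realised censoring rate across mechanisms.
rng = random.Random(1)
for mechanism, param in [("memoryless", 1.0), ("uniform", 2.0), ("none", 0.0)]:
    sample = simulate_censored(2000, 1.0, mechanism, param, rng)
    censored_fraction = 1.0 - sum(d for _, d in sample) / len(sample)
    print(mechanism, round(censored_fraction, 3))
```

Power would then be estimated by repeating such a draw many times under a deliberately misspecified model and recording the fraction of replications in which each test rejects at the chosen significance level.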

The simulation study demonstrated that A-calibration had similar or superior power to D-calibration in all considered cases, and that D-calibration, unlike A-calibration, was particularly sensitive to censoring.

Advantages of A-calibration compared to D-calibration have been demonstrated through theoretical considerations, a simulation study, and a case study, while no disadvantages relative to D-calibration were identified.

The online version contains supplementary material available at 10.1186/s12874-025-02671-6.

## Full-text entities

- **Diseases:** died (MESH:D003643), RSF (MESH:D011475), cancer (MESH:D009369), breast cancer (MESH:D001943), IBS (MESH:D000081042)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12542389/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12542389/full.md

## References

4 references — full list in the complete paper: https://tomesphere.com/paper/PMC12542389/full.md

---
Source: https://tomesphere.com/paper/PMC12542389