The C-index Multiverse
Begoña B. Sierra, Colin McLean, Peter S. Hall, Catalina A. Vallejos

TL;DR
This paper investigates how C-index values vary across software implementations and shows how this "C-index multiverse" affects reproducibility and model comparison in survival analysis.
Contribution
It identifies sources of variation in C-index estimation, demonstrates their impact on survival model evaluation, and provides guidelines and tools for more transparent and consistent reporting.
Findings
Different software packages yield different C-index values because they handle ties and adjust for censoring differently.
The C-index multiverse affects model performance comparisons across methods.
Guidelines and code are provided to improve reproducibility and transparency.
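The two variation sources named in the findings, tie handling and censoring adjustment, can be made concrete with a small sketch. The Python below is a minimal illustration under stated assumptions, not the paper's code or any package's API: `c_index` and `km_censoring_survival` are hypothetical helpers. A pair (i, j) is treated as comparable when subject i has an observed event and T_i < T_j (Harrell-style). The switch `tie_risk` chooses whether tied predicted risks count as half-concordant or are dropped, and `ipcw=True` weights each comparable pair by 1/G(T_i-)^2, where G is a Kaplan-Meier estimate of the censoring distribution, in the spirit of Uno's estimator. Ties in event times are simply treated as non-comparable here; real implementations may differ on this and other details.

```python
import numpy as np


def km_censoring_survival(time, event):
    """Kaplan-Meier estimate of the censoring survival function G.

    Censorings (event == 0) are treated as the 'events'. Returns a function
    giving the left limit G(t-), i.e. G just before time t.
    """
    time = np.asarray(time, dtype=float)
    censored = 1 - np.asarray(event, dtype=int)
    steps = []  # (time, value of G just after that time)
    g = 1.0
    for t in np.unique(time):
        at_risk = np.sum(time >= t)
        d = np.sum((time == t) & (censored == 1))
        if d > 0:
            g *= 1.0 - d / at_risk
        steps.append((t, g))

    def g_left(t):
        # Last step value strictly before t (1.0 if there is none).
        value = 1.0
        for s, g_s in steps:
            if s >= t:
                break
            value = g_s
        return value

    return g_left


def c_index(time, event, risk, tie_risk="half", ipcw=False, tau=None):
    """Hypothetical concordance index; higher `risk` should mean earlier events.

    tie_risk: "half" counts tied risks as 0.5 concordant, "exclude" drops them.
    ipcw:     if True, weight pair (i, j) by 1 / G(T_i-)^2 (Uno-style weights).
    tau:      optional truncation time; events at or after tau are skipped.
    """
    if tie_risk not in ("half", "exclude"):
        raise ValueError("tie_risk must be 'half' or 'exclude'")
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    risk = np.asarray(risk, dtype=float)
    g_left = km_censoring_survival(time, event) if ipcw else None

    num = den = 0.0
    for i in range(len(time)):
        if event[i] != 1 or (tau is not None and time[i] >= tau):
            continue
        w = 1.0 / g_left(time[i]) ** 2 if ipcw else 1.0
        for j in range(len(time)):
            if time[j] <= time[i]:
                continue  # comparable only if j is known to outlive i
            if risk[i] > risk[j]:
                num += w
                den += w
            elif risk[i] < risk[j]:
                den += w
            elif tie_risk == "half":
                num += 0.5 * w
                den += w
            # tie_risk == "exclude": the tied pair contributes nothing
    return num / den


# Toy data (made up for illustration) with one tied pair of predicted risks.
time = [2, 3, 4, 5, 6, 8]
event = [1, 0, 1, 1, 0, 1]
risk = [0.9, 0.5, 0.5, 0.5, 0.6, 0.2]

print(round(c_index(time, event, risk, tie_risk="half"), 4))     # 0.75
print(round(c_index(time, event, risk, tie_risk="exclude"), 4))  # 0.7778
print(round(c_index(time, event, risk, ipcw=True), 4))           # 0.6951
```

On this toy data the same predictions give three different values purely from estimator settings, which is the kind of discrepancy the findings describe across software.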
Abstract
Quantifying out-of-sample discrimination performance for time-to-event outcomes is a fundamental step in model evaluation and selection for predictive modelling. The concordance index, or C-index, is a widely used metric for this purpose, particularly with the growing development of machine learning methods. Beyond differences between proposed C-index estimators (e.g. Harrell's, Uno's and Antolini's), we demonstrate the existence of a C-index multiverse among available R and Python software, where seemingly equivalent implementations can yield different results. This can undermine reproducibility and complicate fair comparisons across models and studies. Key sources of variation include tie handling and adjustment for censoring. Additionally, the absence of a standardised approach to summarise risk from survival distributions results in another source of variation dependent on…
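The abstract's point about summarising risk from survival distributions can be illustrated with two predicted survival curves that cross. The sketch below uses plain NumPy; the curves, the time grid and the helpers `risk_at` and `neg_rmst` are invented for illustration and are not taken from the paper, and the summaries shown are examples rather than the specific ones the paper compares. It ranks two patients using three one-number risk summaries: 1 - S(t*) at an early and a late horizon, and minus the restricted mean survival time (RMST). Because the curves cross, the summaries disagree on who is at higher risk, so a rank-based C-index computed from them can change even though the model output is identical.

```python
import numpy as np

# Hypothetical predicted survival curves S(t) for two patients on a shared
# time grid (values are made up for illustration). The curves cross.
grid = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
surv = np.array([
    [1.00, 0.70, 0.60, 0.55, 0.50, 0.45],  # patient A: early drop, then flat
    [1.00, 0.95, 0.80, 0.50, 0.30, 0.20],  # patient B: late, steep drop
])


def risk_at(surv, grid, t_star):
    """Risk summary 1 - S(t*), reading S as a right-continuous step function."""
    k = np.searchsorted(grid, t_star, side="right") - 1
    return 1.0 - surv[:, k]


def neg_rmst(surv, grid):
    """Risk summary: minus the restricted mean survival time up to grid[-1]."""
    widths = np.diff(grid)
    return -np.sum(surv[:, :-1] * widths, axis=1)


for name, r in [
    ("1 - S(1)", risk_at(surv, grid, 1.0)),
    ("1 - S(4)", risk_at(surv, grid, 4.0)),
    ("-RMST(5)", neg_rmst(surv, grid)),
]:
    riskier = "A" if r[0] > r[1] else "B"
    print(f"{name}: A={r[0]:.2f}, B={r[1]:.2f} -> higher risk: {riskier}")
```

Here patient A is ranked riskier by the early horizon and by RMST, while patient B is ranked riskier by the late horizon, so the choice of summary alone changes which pairs count as concordant.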
Taxonomy
Topics: Neural Networks and Applications
