Correcting heterogeneous diagnostic bias when developing clinical prediction models using causal hidden Markov models

Jose Benitez-Aurioles; Ricardo Silva; Brian McMillan; Matthew Sperrin

arXiv:2605.06059 · stat.AP · May 8, 2026

TL;DR

This paper introduces a causal hidden Markov model approach to correct for diagnostic bias in clinical prediction models caused by heterogeneous testing rates across populations.

Contribution

It proposes a novel method combining causal inference and hidden Markov models to adjust for differential diagnostic delays in prediction models.

Findings

01

Reduces prediction bias and improves calibration in simulated data.

02

Corrects the Observed:Expected ratio from 1.34 to 1.02 in simulations.

03

Improves the Observed:Expected ratio from 1.55 to 1.01 in a clinical case study.
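
For readers unfamiliar with the metric, the Observed:Expected (O:E) ratio is the number of observed events divided by the sum of the model's predicted probabilities. A value above 1 means the model under-predicts risk on average, and a value near 1 indicates good calibration-in-the-large. The sketch below is only an illustration: the function name and the toy data are assumptions and are not taken from the paper.

```python
def observed_expected_ratio(outcomes, predicted_probs):
    """Return observed event count divided by the sum of predicted risks."""
    if len(outcomes) != len(predicted_probs):
        raise ValueError("outcomes and predicted_probs must have equal length")
    expected = sum(predicted_probs)
    if expected == 0:
        raise ValueError("sum of predicted probabilities is zero")
    return sum(outcomes) / expected


# Hypothetical toy data: 1 = diagnosed, 0 = not diagnosed.
toy_outcomes = [1, 0, 1, 0]
toy_predictions = [0.5, 0.25, 0.75, 0.5]
toy_oe = observed_expected_ratio(toy_outcomes, toy_predictions)
```

In this toy example the two observed events match the two expected events, so the ratio is 1.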

Abstract

In routine care, individuals identified a priori as high-risk are usually tested for conditions more frequently. Protected attributes, such as sex or ethnicity, may also determine testing frequency. Such heterogeneous detection rates across a population induce label error. This causes systematic model error for specific groups and biases performance metrics during validation. This paper proposes a method to correct for such bias in prediction models due to differential diagnostic delay. We use a causal inference framework to define our target estimand: an individual's diagnosis probability in a counterfactual scenario where their diagnosis rate matches that of a reference group. We model the longitudinal process as a hidden Markov model, in which confirmatory test results are emissions from a latent progressive disease stage. We validate our approach in simulated data and apply it to a…
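
The hidden Markov model idea in the abstract can be illustrated with a generic forward-filtering sketch. This is not the authors' model: the two-stage structure, every parameter value, and the function name `forward_filter` are assumptions chosen for illustration. The sketch assumes that a visit without a test carries no information about the latent stage, and that the disease is progressive, so the transition matrix has no path back to an earlier stage.

```python
def forward_filter(initial, transition, emission, observations):
    """Return P(latent stage | observations so far) after the last time step.

    observations holds 1 (positive test), 0 (negative test), or None (no test).
    """
    n = len(initial)
    belief = list(initial)
    for t, obs in enumerate(observations):
        if t > 0:
            # Propagate the belief one step through the latent stage dynamics.
            belief = [
                sum(belief[i] * transition[i][j] for i in range(n))
                for j in range(n)
            ]
        if obs is not None:
            # Update on the test result; untested visits leave the belief as is.
            belief = [belief[j] * emission[j][obs] for j in range(n)]
            total = sum(belief)
            if total == 0:
                raise ValueError("observation has zero probability under the model")
            belief = [b / total for b in belief]
    return belief


# Hypothetical parameters: stage 0 = no disease, stage 1 = disease present.
initial = [0.9, 0.1]
transition = [[0.95, 0.05],   # stage 0 may progress to stage 1
              [0.0, 1.0]]     # progressive: stage 1 never reverts
emission = [[0.98, 0.02],     # emission[stage][result], result 0 = negative
            [0.2, 0.8]]       # result 1 = positive

tested_late = forward_filter(initial, transition, emission, [None, None, 1])
never_tested = forward_filter(initial, transition, emission, [None, None, None])
```

Comparing `tested_late` with `never_tested` shows how a single positive test shifts the belief toward the diseased stage, while a person who is never tested keeps a belief driven only by the assumed progression rates.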
