Learning Disease Progression Models That Capture Health Disparities
Erica Chiang, Divya Shanmugam, Ashley N. Beecy, Gabriel Sayer, Deborah Estrin, Nikhil Garg, Emma Pierson

TL;DR
This paper introduces an interpretable Bayesian disease progression model that explicitly captures health disparities, addressing biases in severity estimation and improving risk stratification for disadvantaged patient groups.
Contribution
The paper develops a novel Bayesian model that incorporates health disparities into disease progression, enhancing interpretability and bias correction in severity estimation.
Findings
Model identifies health disparities in patient groups
Accounting for disparities shifts high-risk patient identification
Improves accuracy of disease severity estimates
Abstract
Disease progression models are widely used to inform the diagnosis and treatment of many progressive diseases. However, a significant limitation of existing models is that they do not account for health disparities that can bias the observed data. To address this, we develop an interpretable Bayesian disease progression model that captures three key health disparities: certain patient populations may (1) start receiving care only when their disease is more severe, (2) experience faster disease progression even while receiving care, or (3) receive follow-up care less frequently conditional on disease severity. We show theoretically and empirically that failing to account for any of these disparities can result in biased estimates of severity (e.g., underestimating severity for disadvantaged groups). On a dataset of heart failure patients, we show that our model can identify groups that…
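The three disparities listed in the abstract can be made concrete with a small simulation. The sketch below is not the authors' model: it assumes a latent severity that grows linearly in time (the linear-in-time assumption is mentioned in the reviews), Gaussian observation noise, and a visit probability driven by a log-linear intensity. The function name `simulate_group`, every parameter name, and every numeric value are hypothetical choices made for illustration only.

```python
import numpy as np

def simulate_group(n_patients, z0_mean, rate_mean, visit_log_base, rng,
                   horizon=5.0, dt=0.1, visit_slope=0.5, noise_sd=0.3):
    """Simulate visits for one group under an assumed linear severity model.

    All names and values are illustrative assumptions, not the paper's.
    Returns a list of (visit_times, observed_severity) arrays per patient.
    """
    grid = np.arange(0.0, horizon, dt)
    patients = []
    for _ in range(n_patients):
        z0 = rng.normal(z0_mean, 0.5)      # disparity 1: severity when care starts
        rate = rng.normal(rate_mean, 0.1)  # disparity 2: progression rate under care
        z = z0 + rate * grid               # latent severity, linear in time
        # Disparity 3: visit intensity depends on severity plus a group offset,
        # so two groups at the same severity can differ in visit frequency.
        intensity = np.exp(visit_log_base + visit_slope * z)
        visited = rng.random(grid.size) < 1.0 - np.exp(-intensity * dt)
        visited[0] = True                  # the first visit marks the start of care
        x = z[visited] + rng.normal(0.0, noise_sd, visited.sum())
        patients.append((grid[visited], x))
    return patients

def summarize(patients):
    """Return (mean observed severity at first visit, mean number of visits)."""
    first = np.mean([x[0] for _, x in patients])
    n_visits = np.mean([len(t) for t, _ in patients])
    return first, n_visits

rng = np.random.default_rng(0)
reference = simulate_group(500, z0_mean=0.0, rate_mean=0.2,
                           visit_log_base=0.0, rng=rng)
disadvantaged = simulate_group(500, z0_mean=1.0, rate_mean=0.4,
                               visit_log_base=-1.0, rng=rng)

for name, group in [("reference", reference), ("disadvantaged", disadvantaged)]:
    first, n_visits = summarize(group)
    print(f"{name}: first-visit severity {first:.2f}, visits per patient {n_visits:.1f}")
```

In this toy setup the raw visit count mixes two effects, higher severity (which raises visit intensity) and the group offset (which lowers it), so comparing visit counts alone does not reveal the third disparity. That is one way to read the abstract's phrase "conditional on disease severity".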
Peer Reviews
Decision · ICLR 2025 Conference Withdrawn Submission
This paper studies the important topic of predicting disease progression by capturing the disparities between patients.
1. The proposed method appears to be a variant of a hidden Markov model (HMM). Instead of using transition probabilities in HMM, it employs simple functions to describe transitions between states and outcomes. This simplification might limit the model's ability to capture the complex dynamics of disease progression.
2. The directed acyclic graph, the selection of functions between observed and hidden variables, and the specific types of disparities incorporated in Section 3 seem overly simplisti
The paper has several interesting ideas:
- the addressed problem is very important
- the model is well explained, and the theoretical analysis seems strong
- the authors provide an extensive empirical analysis, with interesting insights
However, the paper suffers from several flaws. I am willing to increase my grade if these points are addressed, but as it stands, the model's contribution over existing strategies, in terms of performance, is unclear.
- The baselines seem quite weak. The authors report that several indicators are important, such as visit frequency. I cannot tell from the manuscript which features the baselines include; in particular, do they include demographic information? an
- Alleviating health disparities and fairness w.r.t. clinical algorithms is an important aspect of machine learning for health and the impact of disparities on disease progression modeling is important
- Interpretability and estimating effects of disparities and covariates on disease progression is clinically meaningful
- While the model is interesting from a clinical perspective, I am not sure this is the right venue for this publication due to limited technical novelty
- The disease progression trajectory is very restrictive because of the assumption that severity is linear in time. A patient therefore cannot experience both worsening and improvement along their disease trajectory
- The proofs show that not taking into account disparities will bias the result, however other (non-linear) disease progression models can take "baseline" covariates like a
The paper is well written, easy to read, and tackles an important problem: improving modelling when disparities bias the observed data. The paper presents theoretical justification and thoroughly evaluates the proposed methodology on both synthetic and real-world data.
The model is identifiable, but only under assumptions about how disparities are expressed, and the underlying data-generating process must satisfy those assumptions. It would be beneficial to discuss further how realistic or common these assumptions are. An analysis of the model under misspecification, where the underlying generating process does not meet these assumptions, would be valuable.
Taxonomy
Topics: Chronic Disease Management Strategies · Machine Learning in Healthcare
