Bayesian Profile Regression using Variational Inference to Identify Clusters of Multiple Long-Term Conditions Conditioning on Mortality in Population-Scale Data
James Rafferty, Keith R Abrams, Munir Pirmohamed, Mark Davies, Rhiannon K Owen

TL;DR
This study develops a scalable Bayesian clustering method using Variational Inference to identify disease clusters in large-scale health data, revealing key clusters associated with mortality in a population of over 1.2 million individuals.
Contribution
Introduces a full-rank Stochastic Variational Inference approach for Bayesian Profile Regression, enabling analysis of population-scale datasets for disease clustering.
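The contribution rests on a full-rank Gaussian variational family. The stdlib-only Python below is a minimal, hypothetical sketch of that idea. It is not the authors' implementation, which this summary does not show. It fits q(z) = N(mu, L L^T) to a toy correlated 2-D Gaussian target using single-sample reparameterised stochastic gradient ascent on the ELBO. The target, step size, step count and the names `fit_full_rank_svi` and `grad_log_target` are all assumptions made for illustration. The off-diagonal entry of the lower-triangular factor L is what lets the approximation capture posterior correlation, which a mean-field (diagonal) family cannot.

```python
import math
import random

# Hypothetical toy target (an assumption): a correlated 2-D Gaussian standing in for a posterior.
TARGET_MEAN = (1.0, -2.0)
RHO = 0.8
# Precision matrix P = Sigma^{-1} for Sigma = [[1, RHO], [RHO, 1]].
_DET = 1.0 - RHO * RHO
PREC = ((1.0 / _DET, -RHO / _DET), (-RHO / _DET, 1.0 / _DET))


def grad_log_target(z):
    """Gradient of log p(z) = -0.5 (z - m)^T P (z - m), i.e. -P (z - m)."""
    d0 = z[0] - TARGET_MEAN[0]
    d1 = z[1] - TARGET_MEAN[1]
    return (-(PREC[0][0] * d0 + PREC[0][1] * d1),
            -(PREC[1][0] * d0 + PREC[1][1] * d1))


def fit_full_rank_svi(steps=5000, lr=0.01, seed=0):
    """Full-rank Gaussian SVI sketch: q(z) = N(mu, L L^T), L lower-triangular.

    The diagonal of L is stored as log-values so it stays positive.
    Returns (mu, covariance), with parameters averaged over the second half of the run.
    """
    rng = random.Random(seed)
    mu = [0.0, 0.0]
    s = [0.0, 0.0]  # log of the diagonal entries of L
    l10 = 0.0       # off-diagonal entry L[1][0]; mean-field SVI would fix this at 0
    sums = [0.0] * 5
    n_avg = 0
    for t in range(steps):
        # Reparameterisation: z = mu + L @ eps with eps ~ N(0, I).
        e0, e1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        l00, l11 = math.exp(s[0]), math.exp(s[1])
        z = (mu[0] + l00 * e0, mu[1] + l10 * e0 + l11 * e1)
        g = grad_log_target(z)
        # Single-sample ELBO gradients; the entropy term s[0] + s[1] adds +1 to each s_i.
        mu[0] += lr * g[0]
        mu[1] += lr * g[1]
        l10 += lr * g[1] * e0
        s[0] += lr * (g[0] * e0 * l00 + 1.0)
        s[1] += lr * (g[1] * e1 * l11 + 1.0)
        if t >= steps // 2:  # iterate averaging damps the stochastic-gradient noise
            for i, v in enumerate((mu[0], mu[1], s[0], s[1], l10)):
                sums[i] += v
            n_avg += 1
    m0, m1, a0, a1, b = (v / n_avg for v in sums)
    l00, l11 = math.exp(a0), math.exp(a1)
    # Covariance L L^T for L = [[l00, 0], [b, l11]].
    cov = ((l00 * l00, l00 * b), (l00 * b, b * b + l11 * l11))
    return (m0, m1), cov
```

For example, `mu, cov = fit_full_rank_svi()` should return a mean close to (1, -2) and an off-diagonal covariance close to 0.8. Fixing `l10` at 0, as a mean-field family would, forces that off-diagonal term to 0. In a real BPR model the target would be the posterior over cluster weights, profile parameters and regression coefficients, and a probabilistic programming library would typically supply the gradients automatically.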
Findings
Identified 33 disease clusters in large electronic health record (EHR) data.
Clusters featuring metastatic cancer and cardiovascular diseases were associated with higher mortality.
In simulation studies, full-rank SVI performed comparably to the No-U-Turn Sampler (NUTS), supporting its use for large-scale analysis.
Abstract
Multiple long-term conditions (MLTC) are increasingly observed in clinical practice globally. Clustering methods that group commonly co-occurring diseases have attracted interest as a way to understand how MLTC group together and how these groupings affect patient outcomes. However, such approaches require large, often population-scale, datasets. Bayesian Profile Regression (BPR) is a statistical model that combines a Dirichlet Process Mixture model with a hierarchical regression model in order to form clusters of items conditional on covariates and an outcome of interest. We developed a BPR model using full-rank Stochastic Variational Inference (SVI) for application in large-scale data. We assessed its performance in simulation studies comparing fits obtained with the No-U-Turn Sampler (NUTS) and with full-rank SVI. We then fit a BPR model to find clusters of MLTC in a…
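To make the "Dirichlet Process Mixture plus regression" structure concrete, the stdlib-only Python below sketches one individual's marginal log-likelihood. It is a hypothetical illustration under stated assumptions, not necessarily the exact specification used in this paper. The assumptions are: binary disease indicators as the profile, a binary mortality outcome, a truncated stick-breaking approximation to the Dirichlet process, and a logistic outcome model with a cluster-specific intercept plus shared covariate effects. The function names `stick_breaking_weights`, `log_sigmoid` and `individual_log_lik` are illustrative only.

```python
import math


def stick_breaking_weights(v):
    """Truncated stick-breaking: pi_k = v_k * prod_{j<k} (1 - v_j).

    Setting the last fraction to 1.0 makes the truncated weights sum to 1.
    """
    weights, remaining = [], 1.0
    for vk in v:
        weights.append(vk * remaining)
        remaining *= 1.0 - vk
    return weights


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def individual_log_lik(x, y, w, v, phi, alpha, beta):
    """Marginal log-likelihood of one individual under a truncated BPR sketch.

    x: binary disease indicators (the profile); y: binary outcome (e.g. death);
    w: covariates; v: stick-breaking fractions; phi[c][j]: P(x_j = 1 | cluster c);
    alpha[c]: cluster-specific outcome intercept; beta: shared covariate effects.
    """
    terms = []
    for pi_c, phi_c, a_c in zip(stick_breaking_weights(v), phi, alpha):
        if pi_c <= 0.0:
            continue
        # Profile part: independent Bernoulli per condition within the cluster.
        lp = math.log(pi_c)
        lp += sum(math.log(p) if xj else math.log1p(-p) for xj, p in zip(x, phi_c))
        # Outcome part: logistic regression with a cluster-specific intercept.
        eta = a_c + sum(b * wk for b, wk in zip(beta, w))
        lp += log_sigmoid(eta) if y else log_sigmoid(-eta)
        terms.append(lp)
    # Log-sum-exp over clusters marginalises out the discrete cluster label.
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

Summing this quantity over individuals gives the data term of the log joint density. An ELBO would combine it with the priors and the entropy of the variational distribution. Stochastic VI methods commonly estimate that sum from random mini-batches rescaled by N divided by the batch size, although whether the authors subsample in this way is not stated in this summary.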
Taxonomy
Topics: Chronic Disease Management Strategies · Machine Learning in Healthcare · Bayesian Methods and Mixture Models
