Pan-disease clustering analysis of the trend of period prevalence
Sneha Jadhav, Chenjin Ma, Yefei Jiang, Ben-Chang Shia, Shuangge Ma

TL;DR
This study introduces a novel clustering method to analyze the joint prevalence trends of multiple diseases over time, revealing meaningful disease groupings from Taiwan's extensive health data.
Contribution
It develops a new penalization pursuit approach for pan-disease clustering of prevalence trends, applied to large-scale national health data.
Findings
Identified 35 disease clusters with similar prevalence trends
Discovered significant differences from alternative clustering methods
Provided interpretable disease groupings with sound clinical relevance
Abstract
For all diseases, prevalence has been carefully studied. In the "classic" paradigm, the prevalence of different diseases has usually been studied separately. Accumulating evidence has shown that diseases can be "correlated". Joint analysis of the prevalence of multiple diseases can provide important insights beyond individual-disease analysis, but it has not been well conducted. In this study, we take advantage of the uniquely valuable Taiwan National Health Insurance Research Database (NHIRD) and conduct a pan-disease analysis of period prevalence trends. The goal is to identify clusters within which diseases share similar period prevalence trends. For this purpose, a novel penalization pursuit approach is developed, which has an intuitive formulation and satisfactory properties. In data analysis, the period prevalence values are computed using records on close to 1 million…
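To make the quantity being clustered concrete, here is a minimal sketch of how a period prevalence trend could be computed from claims-style records. It uses the standard definition (distinct individuals with at least one record of the disease during the period, divided by the population size). The record layout `(person_id, disease_code, year)`, the one-year period length, the fixed `cohort_size` denominator, and the function name `period_prevalence` are all assumptions made for illustration; they are not taken from the paper or from the NHIRD schema.

```python
from collections import defaultdict


def period_prevalence(records, cohort_size):
    """Return {(disease_code, year): prevalence} for hypothetical records.

    records: iterable of (person_id, disease_code, year) tuples (assumed layout).
    cohort_size: number of people in the cohort (assumed constant over time).
    """
    # Collect distinct people per (disease, year) so repeat visits count once.
    cases = defaultdict(set)
    for person_id, disease_code, year in records:
        cases[(disease_code, year)].add(person_id)
    return {key: len(people) / cohort_size for key, people in cases.items()}


# Toy data: person 1 has two visits for disease "A" in 2000 but is counted once.
records = [
    (1, "A", 2000), (1, "A", 2000), (2, "A", 2000),
    (1, "A", 2001), (3, "B", 2000),
]
trend = period_prevalence(records, cohort_size=4)
```

Reading the values for one disease across consecutive years gives that disease's trend, which is the kind of per-disease curve a trend-based clustering method would take as input.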
Taxonomy
Topics: Data-Driven Disease Surveillance · Bayesian Methods and Mixture Models · Genetic Associations and Epidemiology
