Mining Electronic Health Records to Investigate Effectiveness of Ensemble Deep Clustering
Manar D. Samad, Yina Hou, Shrabani Ghosh

TL;DR
This study compares traditional, hybrid, and deep clustering methods on EHR data from heart failure patients and introduces an ensemble deep clustering approach that improves performance by combining multiple embeddings with traditional methods.
Contribution
The paper proposes an ensemble-based deep clustering method that aggregates multiple embeddings and outperforms individual clustering techniques on real EHR data.
Findings
Traditional clustering methods such as K-means perform robustly on tabular EHR data.
Deep clustering benefits from ensemble aggregation of multiple embeddings.
Combining traditional and deep clustering improves overall performance.
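This summary does not spell out how the paper aggregates its embeddings, so the sketch below only illustrates the general idea of ensemble clustering. It uses a co-association (evidence accumulation) consensus, a generic technique substituted here for illustration and not necessarily the authors' procedure. The function names, the 0.5 threshold, and the toy label vectors (standing in for cluster assignments obtained from three different embeddings) are all hypothetical.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitions in which each pair of samples shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus_labels(partitions, threshold=0.5):
    """Link samples co-clustered in more than `threshold` of the partitions,
    then label each connected component (found with union-find)."""
    co = co_association(partitions)
    n = len(co)
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the component root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel components 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical cluster assignments for six patients from three embeddings.
# Label values are arbitrary per partition; only co-membership matters.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],  # patient 2 grouped with the other cohort
    [0, 0, 0, 1, 1, 2],  # patient 5 split off into its own cluster
]
print(consensus_labels(partitions))
```

Because the consensus depends only on how often two patients are grouped together, it is insensitive to label permutations across partitions and can absorb disagreements from any single embedding. Partitions produced by a traditional method such as K-means could be appended to the same list.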
Abstract
In electronic health records (EHRs), clustering patients and distinguishing disease subtypes are key tasks to elucidate pathophysiology and aid clinical decision-making. However, clustering in healthcare informatics is still based on traditional methods, especially K-means, and has achieved limited success when applied to embedding representations learned by autoencoders as hybrid methods. This paper investigates the effectiveness of traditional, hybrid, and deep learning methods in heart failure patient cohorts using real EHR data from the All of Us Research Program. Traditional clustering methods perform robustly because deep learning approaches are specifically designed for image clustering, a task that differs substantially from the tabular EHR data setting. To address the shortcomings of deep clustering, we introduce an ensemble-based deep clustering approach that aggregates…
