clustra: A multi-platform k-means clustering algorithm for analysis of longitudinal trajectories in large electronic health records data
Nimish Adhikari, Hanna Gerlovin, George Ostrouchov, Rachel Ehrbar, Alyssa B. Dufour, Brian R. Ferolito, Serkalem Demissie, Lauren Costa, Yuk-Lam Ho, Laura Tarko, Edmon Begoli, Kelly Cho, David R. Gagnon

TL;DR
This paper introduces clustra, a multi-platform k-means clustering algorithm that groups subjects in large electronic health records data by the shape of their longitudinal trajectories.
Contribution
The paper presents a novel multi-platform implementation of a k-means clustering algorithm with splines for longitudinal data, compatible with R and SAS, and includes diagnostic tools.
Findings
Comparable results achieved with R and SAS implementations
Effective clustering of blood pressure trajectories in EHR data
Supports large-scale longitudinal data analysis
Abstract
Background and Objective: Variables collected over time, or longitudinally, such as biologic measurements in electronic health records data, are not simple to summarize with a single time-point, and thus can be more holistically conceptualized as trajectories over time. Cluster analysis with longitudinal data further allows for clinical representation of groups of subjects with similar trajectories and identification of unique characteristics, or phenotypes, that can be investigated as risk factors or disease outcomes. Some of the challenges in estimating these clustered trajectories lie in the handling of observations at inconsistent time intervals and the usability of algorithms across programming languages. Methods: We propose longitudinal trajectory clustering using a k-means algorithm with thin-plate regression splines, implemented across multiple platforms, the R package clustra…
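The Methods sentence above describes k-means applied to whole trajectories rather than single points. A minimal Python sketch of that loop follows. It is not the clustra implementation: a straight-line least-squares fit stands in for the thin-plate regression spline, the round-robin starting assignment is an assumption made for reproducibility, and all function names and the toy blood-pressure-like data are hypothetical. Fitting one pooled curve per cluster is also how observations at inconsistent time intervals are accommodated, since no subject needs to share a time grid with any other.

```python
def fit_line(points):
    """Least-squares line through (t, y) points; returns (intercept, slope).

    Stand-in for the spline smoother named in the abstract (assumption).
    """
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    if sxx == 0:  # all observations at one time point: flat line
        return mean_y, 0.0
    slope = sum((t - mean_t) * (y - mean_y) for t, y in points) / sxx
    return mean_y - slope * mean_t, slope


def mean_sq_error(traj, line):
    """Mean squared distance from one subject's observations to a cluster line."""
    intercept, slope = line
    return sum((y - (intercept + slope * t)) ** 2 for t, y in traj) / len(traj)


def cluster_trajectories(trajs, k, max_iter=20):
    """k-means on trajectories: each subject is a non-empty list of (t, y) pairs."""
    if not 1 <= k <= len(trajs):
        raise ValueError("k must be between 1 and the number of subjects")
    # Round-robin start (assumption): every cluster begins non-empty.
    labels = [i % k for i in range(len(trajs))]
    lines = [None] * k
    for _ in range(max_iter):
        # Update step: refit one curve per cluster to its pooled observations.
        for c in range(k):
            pooled = [p for traj, lab in zip(trajs, labels) if lab == c for p in traj]
            if pooled:  # a cluster that lost all subjects keeps its previous line
                lines[c] = fit_line(pooled)
        # Assignment step: move each subject to the closest cluster curve.
        new_labels = [
            min(range(k), key=lambda c: mean_sq_error(traj, lines[c]))
            for traj in trajs
        ]
        if new_labels == labels:  # no subject changed cluster
            break
        labels = new_labels
    return labels, lines


# Toy data: three rising and three falling subjects, each measured at
# different, irregular times.
rising_times = [[0, 1, 4, 9], [0.5, 2, 6, 10], [1, 3, 5, 8]]
falling_times = [[0, 2, 5, 10], [1, 4, 7, 9], [0.5, 3, 6, 8]]
trajs = [[(t, 100 + 2 * t) for t in ts] for ts in rising_times]
trajs += [[(t, 140 - 2 * t) for t in ts] for ts in falling_times]

labels, lines = cluster_trajectories(trajs, k=2)
print(labels)
```

On this noise-free toy data the loop separates the rising from the falling subjects. Real EHR trajectories are noisy and nonlinear, which is where a spline smoother and a choice among several random starts would matter.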
Taxonomy
Topics: Machine Learning in Healthcare · Bayesian Methods and Mixture Models · Data Analysis with R
