TL;DR
This paper introduces a novel application of vine copulas to analyze complex multivariate dependencies in electronic health records, enabling better data exploration and variable selection for healthcare outcomes.
Contribution
It repurposes vine copulas for probabilistic mining of EHR data, providing data-driven explanations, visualization, and variable selection methods.
Findings
Identified conditional dependencies between co-morbid conditions.
Validated the approach on different patient cohorts.
Produced interpretable tree structures representing variable dependencies.
Abstract
Electronic health records (EHR) store hundreds of demographic and laboratory variables from large patient populations. Traditional statistical methods have limited capacity to process mixed-type data (continuous, ordinal) and to capture non-linear relationships in large multivariate data when they make oversimplified assumptions (e.g., Gaussian) about the distributions of the disparate variables in EHR data. This paper addresses these limitations by repurposing the vine copula method, which is primarily used to synthesize a multivariate distribution from many bivariate cumulative distribution functions (copulas). By decomposing a multivariate distribution, vine copulas produce tree structures that represent bivariate conditional dependencies at successive hierarchical levels. The tree structure is used to rank variables by conditional dependence and to identify a subset of central…
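The abstract does not say how the trees are selected. A common choice in the vine copula literature is the sequential heuristic of Dissmann et al. (2013), which builds the first tree as a maximum spanning tree whose edge weights are absolute Kendall's tau values. The sketch below illustrates only that first, unconditional tree, plus a simple centrality ranking. It assumes complete numeric columns, does not fit pair-copula families, and does not build the higher, conditional trees. The helper names (`first_vine_tree`, `rank_by_centrality`) and the degree-then-strength ranking rule are hypothetical illustrations, not the paper's method. Kendall's tau is rank-based, so it applies to continuous and ordinal columns without a Gaussian assumption. The tau-b tie correction is included because ordinal EHR variables produce many ties.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b; the tie correction suits ordinal variables."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(n), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n0 = n * (n - 1) / 2
    denom = sqrt((n0 - ties_x) * (n0 - ties_y))
    # A constant column has no defined tau; treat it as independent (0.0).
    return (concordant - discordant) / denom if denom > 0 else 0.0


def first_vine_tree(data):
    """Hypothetical helper: maximum spanning tree on |tau| via Kruskal."""
    names = list(data)
    weights = {
        (a, b): abs(kendall_tau_b(data[a], data[b]))
        for a, b in combinations(names, 2)
    }
    parent = {v: v for v in names}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    tree = []
    # Strongest dependencies first; skip any edge that would close a cycle.
    for (a, b), w in sorted(weights.items(), key=lambda kv: -kv[1]):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            tree.append((a, b, w))
    return tree


def rank_by_centrality(tree):
    """Hypothetical ranking: tree degree, then summed |tau| of incident edges."""
    degree, strength = {}, {}
    for a, b, w in tree:
        for v in (a, b):
            degree[v] = degree.get(v, 0) + 1
            strength[v] = strength.get(v, 0.0) + w
    return sorted(degree, key=lambda v: (-degree[v], -strength[v], v))


# Toy columns for illustration only (not EHR data).
toy = {
    "a": [1, 2, 3, 4, 5],
    "b": [1, 2, 3, 5, 4],
    "c": [5, 3, 1, 4, 2],
}
tree = first_vine_tree(toy)
print(tree)                      # [('a', 'b', 0.8), ('a', 'c', 0.4)]
print(rank_by_centrality(tree))  # ['a', 'b', 'c']
```

Kruskal's algorithm is used here because it directly yields the spanning tree with the largest total |tau|. In a full regular-vine fit, each later tree would be built on the previous tree's edges, subject to the proximity condition, using conditional pseudo-observations from the fitted pair copulas. This sketch omits that step.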
