TL;DR
MURAL is an unsupervised random forest method for embedding and visualizing heterogeneous EHR data, including variables that are missing not at random (MNAR), improving classification and cohort comparison.
Contribution
This paper introduces MURAL, an unsupervised random forest approach that handles mixed variable types and missing-not-at-random (MNAR) data in EHRs for embedding and visualization.
Findings
MURAL outperforms competing methods in visualization accuracy.
MURAL enables better classification of clinical data.
Tree-sliced Wasserstein distances facilitate cohort comparisons.
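As a rough illustration of the cohort comparison idea, the sketch below computes a Wasserstein distance on a single tree and averages it over several trees. On a tree, the distance between two mass distributions reduces to a sum over edges of the edge weight times the absolute difference in mass below that edge. The function names, the parent-array tree encoding, and the unit edge weights are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def tree_wasserstein(parent, edge_weight, mu, nu):
    """Wasserstein-1 distance between mass vectors mu and nu on one rooted tree.

    Assumed (hypothetical) encoding: node 0 is the root, parent[v] < v for
    every other node v, and edge_weight[v] is the length of the edge
    between v and parent[v].
    """
    subtree = np.asarray(mu, dtype=float) - np.asarray(nu, dtype=float)
    total = 0.0
    # Visit children before parents so subtree[v] holds the net mass below v.
    for v in range(len(parent) - 1, 0, -1):
        total += edge_weight[v] * abs(subtree[v])
        subtree[parent[v]] += subtree[v]
    return total

def tree_sliced_wasserstein(trees, mus, nus):
    """Average the per-tree distances; trees is a list of (parent, edge_weight)."""
    distances = [
        tree_wasserstein(parent, weight, mu, nu)
        for (parent, weight), mu, nu in zip(trees, mus, nus)
    ]
    return float(np.mean(distances))

# Toy tree: root 0 with two leaves 1 and 2, unit edge lengths.
parent = [0, 0, 0]
weight = [0.0, 1.0, 1.0]
cohort_a = [0.0, 1.0, 0.0]  # all of cohort A falls in leaf 1
cohort_b = [0.0, 0.0, 1.0]  # all of cohort B falls in leaf 2
toy_distance = tree_wasserstein(parent, weight, cohort_a, cohort_b)
```

In this toy case all mass must travel from leaf 1 through the root to leaf 2, so the distance equals the path length between the two leaves.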
Abstract
A major challenge in embedding or visualizing clinical patient data is the heterogeneity of variable types, including continuous lab values and categorical diagnostic codes, as well as missing or incomplete data. In particular, in EHR data, some variables are missing not at random (MNAR): they are deliberately not collected, and their absence is therefore a source of information. For example, lab tests may be deemed necessary for some patients on the basis of a suspected diagnosis, but not for others. Here we present the MURAL forest, an unsupervised random forest for representing data with disparate variable types (e.g., categorical, continuous, MNAR). MURAL forests consist of a set of decision trees where node-splitting variables are chosen at random, such that the marginal entropy of all other variables is minimized by the split. This allows us to also split on MNAR variables and discrete variables in a…
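The split rule described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes every variable is already coded as a non-negative discrete value, treats a missing entry (NaN) as its own category so that MNAR variables can be split on, and scores a candidate split by the size-weighted sum of the marginal entropies of all other variables in the resulting child nodes. The names `marginal_entropy`, `split_cost`, and `choose_split` are hypothetical.

```python
import numpy as np

def marginal_entropy(column):
    """Shannon entropy in bits of one discrete column.

    Assumption: codes are non-negative, so NaN can be recoded as -1 and
    counted as its own category.
    """
    values = np.where(np.isnan(column), -1.0, column)
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def split_cost(X, j):
    """Size-weighted sum of marginal entropies of all other columns after
    partitioning the rows of X by the value of column j."""
    keys = np.where(np.isnan(X[:, j]), -1.0, X[:, j])
    others = [k for k in range(X.shape[1]) if k != j]
    cost = 0.0
    for value in np.unique(keys):
        child = X[keys == value]
        weight = len(child) / len(X)
        cost += weight * sum(marginal_entropy(child[:, k]) for k in others)
    return cost

def choose_split(X, n_candidates, rng):
    """Draw candidate columns at random and keep the one with lowest cost."""
    candidates = rng.choice(X.shape[1], size=n_candidates, replace=False)
    costs = {int(j): split_cost(X, int(j)) for j in candidates}
    return min(costs, key=costs.get), costs

# Toy data: columns 0 and 1 are identical, column 2 is independent of both.
X = np.array([[0, 0, 0],
              [0, 0, 1],
              [1, 1, 0],
              [1, 1, 1]], dtype=float)
best, costs = choose_split(X, n_candidates=3, rng=np.random.default_rng(0))
```

Splitting on column 0 or 1 makes the other of the two constant within each child, so either is preferred over column 2, which leaves both remaining columns fully uncertain. Continuous variables would need a thresholding or binning step that this sketch leaves out.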
