MURAL: An Unsupervised Random Forest-Based Embedding for Electronic Health Record Data

Michal Gerasimiuk; Dennis Shung; Alexander Tong; Adrian Stanley; Michael Schultz; Jeffrey Ngu; Loren Laine; Guy Wolf; Smita Krishnaswamy

arXiv:2111.10452 · cs.LG · November 23, 2021

TL;DR

MURAL is an unsupervised random forest method for embedding and visualizing heterogeneous EHR data, including variables that are missing not at random (MNAR), with reported gains in classification and cohort comparison.

Contribution

This paper introduces MURAL, an unsupervised random forest approach that handles mixed variable types and missing-not-at-random (MNAR) data in EHRs for embedding and visualization.

Findings

1. MURAL outperforms competing methods in visualization accuracy.
2. MURAL enables better classification of clinical data.
3. Tree-sliced Wasserstein distances facilitate cohort comparisons.

Abstract

A major challenge in embedding or visualizing clinical patient data is the heterogeneity of variable types including continuous lab values, categorical diagnostic codes, as well as missing or incomplete data. In particular, in EHR data, some variables are missing not at random (MNAR) but deliberately not collected and thus are a source of information. For example, lab tests may be deemed necessary for some patients on the basis of suspected diagnosis, but not for others. Here we present the MURAL forest -- an unsupervised random forest for representing data with disparate variable types (e.g., categorical, continuous, MNAR). MURAL forests consist of a set of decision trees where node-splitting variables are chosen at random, such that the marginal entropy of all other variables is minimized by the split. This allows us to also split on MNAR variables and discrete variables in a…
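The split rule described in the abstract (pick the splitting variable at random, then choose the split that minimizes the marginal entropy of all other variables) can be illustrated with a minimal sketch. This is not the code from the mgerasimiuk/mural repository: the function names `marginal_entropy`, `split_cost`, and `best_split` are hypothetical, every variable is assumed to be already integer-coded (continuous values binned, missing entries given their own code such as -1), and weighting each child's entropy by its share of the rows is an assumption the abstract does not state.

```python
import numpy as np


def marginal_entropy(column):
    """Shannon entropy (in bits) of one integer-coded column."""
    _, counts = np.unique(column, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def split_cost(X, rows_left, rows_right, split_var):
    """Size-weighted sum of marginal entropies of every variable except split_var.

    Assumption: children are weighted by their share of the parent's rows.
    """
    n = len(rows_left) + len(rows_right)
    cost = 0.0
    for child in (rows_left, rows_right):
        weight = len(child) / n
        for j in range(X.shape[1]):
            if j != split_var:
                cost += weight * marginal_entropy(X[child, j])
    return cost


def best_split(X, rows, rng):
    """Pick a variable at random, then the threshold that minimizes split_cost.

    Returns (split_var, (threshold, cost, left_rows, right_rows)), or
    (split_var, None) when the chosen variable is constant on these rows.
    """
    split_var = int(rng.integers(X.shape[1]))
    values = np.unique(X[rows, split_var])  # sorted distinct values
    best = None
    for t in values[:-1]:  # x <= t goes left, x > t goes right
        left = rows[X[rows, split_var] <= t]
        right = rows[X[rows, split_var] > t]
        cost = split_cost(X, left, right, split_var)
        if best is None or cost < best[1]:
            best = (t, cost, left, right)
    return split_var, best
```

Under the coding assumed here, a missing entry stored as -1 sorts below every observed code, so the threshold `t = -1` separates patients with a missing value from those with an observed one. That is one simple way a split on an MNAR variable could be expressed; the paper's own handling may differ.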

Code & Models

Repositories

mgerasimiuk/mural
Official
