Learning and DiSentangling Patient Static Information from Time-series Electronic HEalth Record (STEER)
Wei Liao, Joel Voldman

TL;DR
This paper investigates how well time-series electronic health record (EHR) data can predict static patient information, and introduces a variational autoencoder (VAE) method to disentangle sensitive attributes from that data, addressing privacy and fairness concerns.
Contribution
The paper systematically assesses how predictable static patient attributes are from time-series EHR data and proposes a VAE-based approach to disentangle sensitive information.
Findings
Static attributes can be predicted from time-series EHR data with high accuracy: AUROC as high as 0.851 for biological sex, 0.869 for binarized age, and 0.810 for self-reported race.
This predictability persists across different models, cohorts, and tasks.
The proposed method effectively disentangles sensitive attributes from the time-series data.
Abstract
Recent work in machine learning for healthcare has raised concerns about patient privacy and algorithmic fairness. For example, previous work has shown that patient self-reported race can be predicted from medical data that does not explicitly contain racial information. However, the extent of data identification is unknown, and we lack ways to develop models whose outcomes are minimally affected by such information. Here we systematically investigated the ability of time-series electronic health record data to predict patient static information. We found that not only the raw time-series data, but also learned representations from machine learning models, can be trained to predict a variety of static information with area under the receiver operating characteristic curve as high as 0.851 for biological sex, 0.869 for binarized age and 0.810 for self-reported race. Such high predictive…
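The abstract reports predictability as area under the receiver operating characteristic curve (AUROC). As a rough illustration of what that number measures, the sketch below computes AUROC for a binary static attribute using the rank-sum (Mann-Whitney U) formulation. It is not the authors' evaluation code; the function name `auroc` and the example labels and scores are hypothetical and chosen only for illustration.

```python
def auroc(labels, scores):
    """AUROC for binary labels (0/1) via the rank-sum formulation.

    Equivalent to the probability that a randomly chosen positive
    example receives a higher score than a randomly chosen negative
    one, with ties counted as one half.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n

    # Assign 1-based ranks; tied scores share their average rank.
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC is undefined unless both classes are present")

    # Mann-Whitney U statistic for the positive class, normalized to [0, 1].
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    # Hypothetical data: 1 = attribute present, 0 = absent, with made-up
    # classifier scores. These are NOT values from the paper.
    example_labels = [0, 0, 1, 1]
    example_scores = [0.1, 0.4, 0.35, 0.8]
    print(f"AUROC = {auroc(example_labels, example_scores):.3f}")
```

An AUROC of 0.5 corresponds to chance-level prediction and 1.0 to perfect separation, which is why values such as 0.851 indicate that the attribute is substantially recoverable from the data.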
Taxonomy
Topics: Machine Learning in Healthcare · Insurance, Mortality, Demography, Risk Management · Global Cancer Incidence and Screening
