WISTERIA: Learning Clinical Representations from Noisy Supervision via Multi-View Consistency in Electronic Health Records
Ruan Dong, Yuanyun Zhang, Shi Li

TL;DR
WISTERIA is a weakly supervised framework for learning robust clinical representations from noisy EHR data by enforcing consistency across multiple supervision signals and incorporating semantic regularization.
Contribution
It introduces a multi-view consistency approach that models labels as stochastic observations, improving robustness and generalization in clinical representation learning.
Findings
Improves predictive performance on EHR benchmarks.
Demonstrates robustness to label noise.
Outperforms sequence-based pretraining in cross-institutional settings.
Abstract
Representation learning in electronic health records (EHR) has largely followed paradigms inherited from natural language processing, relying on sequence modeling and reconstruction-based objectives that treat clinical labels as ground truth. However, real-world clinical supervision is inherently weak, arising from heterogeneous, noisy, and institution-specific labeling processes such as billing codes, heuristic phenotypes, and incomplete annotations. In this work, we propose WISTERIA, a weakly supervised representation learning framework that models labels as stochastic observations of an underlying latent clinical state. Instead of optimizing against a single supervision signal, WISTERIA constructs multiple weak supervision operators and learns representations by enforcing consistency across their induced label distributions. This multi-view formulation induces an implicit denoising…
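One way to picture "consistency across induced label distributions" is a penalty that is zero when every weak supervision view assigns the same label distribution and grows as the views disagree. The sketch below uses a generalized Jensen-Shannon divergence for this purpose. That choice, the function names, and the example distributions are all assumptions made for illustration; the abstract does not state which consistency objective WISTERIA actually uses.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def consistency_penalty(views):
    """Generalized Jensen-Shannon divergence across K label distributions.

    views: list of K lists, each a probability distribution over the
    same C labels. The result is (up to rounding) 0 when all views
    agree and is at most log(K) when they are maximally different.
    NOTE: an illustrative stand-in, not the paper's stated objective.
    """
    k = len(views)
    n_labels = len(views[0])
    # Average the views label by label to get the mixture distribution.
    mean = [sum(v[c] for v in views) / k for c in range(n_labels)]
    # Entropy of the mixture minus the average entropy of the views.
    return entropy(mean) - sum(entropy(v) for v in views) / k


# Hypothetical label distributions from three weak supervision operators.
billing_codes = [0.7, 0.2, 0.1]
heuristic_phenotype = [0.6, 0.3, 0.1]
partial_annotation = [0.1, 0.1, 0.8]

agree = consistency_penalty([billing_codes] * 3)
disagree = consistency_penalty(
    [billing_codes, heuristic_phenotype, partial_annotation]
)
```

In a training loop, a term like this would typically be added to the task loss so that the encoder is pushed toward representations on which the views agree; how the paper weights or combines such terms is not specified in the text above.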
