Enhancing Robustness of Foundation Model Representations under Provenance-related Distribution Shifts
Xiruo Ding, Zhecheng Sheng, Brian Hur, Feng Chen, Serguei V. S. Pakhomov, Trevor Cohen

TL;DR
This study investigates how models built on foundation model representations perform under provenance-related distribution shifts in clinical data, showing that a simple confounding adjustment can significantly improve robustness against confounding effects.
Contribution
The paper introduces a synthetic sampling strategy to evaluate foundation model robustness and demonstrates that a straightforward confounding adjustment enhances prediction stability.
Findings
Foundation models exhibit some inherent robustness to provenance-related shifts.
Simple confounding adjustments can significantly improve model robustness.
Representation stability varies with the degree of distribution shift.
Abstract
Foundation models are a current focus of attention in both industry and academia. While they have shown their capabilities in a variety of tasks, in-depth research is required to determine their robustness to distribution shift when used as a basis for supervised machine learning. This is especially important in the context of clinical data, with particular limitations related to data accessibility, lack of pretraining materials, and limited availability of high-quality annotations. In this work, we examine the stability of models based on representations from foundation models under distribution shift. We focus on confounding by provenance, a form of distribution shift that emerges in the context of multi-institutional datasets when there are differences in source-specific language use and class distributions. Using a sampling strategy that synthetically induces varying degrees of…
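The idea of synthetically inducing confounding by provenance can be illustrated with a small sketch. This is not the authors' sampling procedure, only a minimal example under stated assumptions: two hypothetical sources ("A" and "B"), binary labels, and a helper named `sample_with_skew` (invented here) that subsamples each source to a chosen positive-class rate. Giving the training and test sets opposite skews makes the source a misleading cue for the label.

```python
import random

def sample_with_skew(examples, pos_rate_by_source, n_per_source, seed=0):
    """Subsample records so that each source has a chosen positive-class rate.

    Hypothetical helper for illustration; `examples` is a list of dicts
    with "source" and "label" keys.
    """
    rng = random.Random(seed)
    out = []
    for source, pos_rate in pos_rate_by_source.items():
        n_pos = round(n_per_source * pos_rate)
        n_neg = n_per_source - n_pos
        pos = [e for e in examples if e["source"] == source and e["label"] == 1]
        neg = [e for e in examples if e["source"] == source and e["label"] == 0]
        # Sampling without replacement; raises ValueError if the pool is too small.
        out += rng.sample(pos, n_pos) + rng.sample(neg, n_neg)
    rng.shuffle(out)
    return out

def positive_rate(sample, source):
    """Fraction of positive labels among records from one source."""
    labels = [e["label"] for e in sample if e["source"] == source]
    return sum(labels) / len(labels)

# Toy pool: 200 positive and 200 negative records per source (made-up data).
pool = [
    {"source": s, "label": y, "text": f"{s}-{y}-{i}"}
    for s in ("A", "B") for y in (0, 1) for i in range(200)
]

# Training set: source A is mostly positive, source B mostly negative.
train = sample_with_skew(pool, {"A": 0.8, "B": 0.2}, n_per_source=100, seed=1)
# Test set: the association between source and label is reversed.
test = sample_with_skew(pool, {"A": 0.2, "B": 0.8}, n_per_source=100, seed=2)
```

Varying how far the two positive rates sit from each other, and how much the test rates differ from the training rates, gives a simple dial for the degree of shift; a classifier that leans on source-specific language would be expected to degrade as that gap widens.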
Taxonomy
Topics: Scientific Computing and Data Management · Explainable Artificial Intelligence (XAI) · Topic Modeling
Methods: Focus
