Enhancing Robustness of Foundation Model Representations under Provenance-related Distribution Shifts

Xiruo Ding; Zhecheng Sheng; Brian Hur; Feng Chen; Serguei V. S. Pakhomov; Trevor Cohen

arXiv:2312.05435·cs.CL·December 12, 2023·1 cites

Open Access

TL;DR

This study investigates how foundation model representations perform under provenance-related distribution shifts in clinical data, showing that simple adjustments can significantly improve robustness against confounding effects.

Contribution

The paper introduces a synthetic sampling strategy to evaluate foundation model robustness and demonstrates that a straightforward confounding adjustment enhances prediction stability.
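A synthetic sampling strategy of this kind can be pictured as drawing a fixed number of examples from each source while forcing the class balance to differ between sources, so that the source becomes predictive of the label. The block below is only a minimal sketch under that assumption, not the authors' procedure: the function names, the two-source setup, and the `pos_rate_by_source` parameter are all hypothetical and chosen for illustration.

```python
import random
from collections import Counter


def sample_with_provenance_shift(pools, pos_rate_by_source, n_per_source, seed=0):
    """Draw n_per_source examples from each source at a chosen positive rate.

    pools: {source: {0: [examples], 1: [examples]}}  (hypothetical layout)
    pos_rate_by_source: {source: fraction of positives to draw from that source}
    Returns a shuffled list of (source, label, example) tuples.
    """
    rng = random.Random(seed)
    sample = []
    for source, by_label in pools.items():
        n_pos = round(n_per_source * pos_rate_by_source[source])
        n_neg = n_per_source - n_pos
        if n_pos > len(by_label[1]) or n_neg > len(by_label[0]):
            raise ValueError(f"not enough examples in source {source!r}")
        # Sample without replacement within each (source, label) cell.
        for label, k in ((1, n_pos), (0, n_neg)):
            for example in rng.sample(by_label[label], k):
                sample.append((source, label, example))
    rng.shuffle(sample)
    return sample


def positive_rate_by_source(sample):
    """Report the realized positive rate per source, to check the induced shift."""
    totals, positives = Counter(), Counter()
    for source, label, _ in sample:
        totals[source] += 1
        positives[source] += label
    return {source: positives[source] / totals[source] for source in totals}


if __name__ == "__main__":
    # Toy pools: two sources, 100 examples per class each (placeholder strings).
    pools = {
        s: {y: [f"{s}-{y}-{i}" for i in range(100)] for y in (0, 1)}
        for s in ("site_A", "site_B")
    }
    # Training split: the label is strongly tied to the source.
    train = sample_with_provenance_shift(pools, {"site_A": 0.8, "site_B": 0.2}, 50)
    # Test split: the association is reversed, so shortcuts learned from source cues fail.
    test = sample_with_provenance_shift(pools, {"site_A": 0.2, "site_B": 0.8}, 50, seed=1)
    print(positive_rate_by_source(train))
    print(positive_rate_by_source(test))
```

Varying the gap between the per-source positive rates in the training and test draws gives a dial for the degree of shift; a gap of zero corresponds to no confounding by provenance in this toy setup.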

Findings

01. Foundation models exhibit some inherent robustness to provenance-related shifts.

02. Simple confounding adjustments can significantly improve model robustness.

03. Representation stability varies with the degree of distribution shift.

Abstract

Foundation models are a current focus of attention in both industry and academia. While they have shown their capabilities in a variety of tasks, in-depth research is required to determine their robustness to distribution shift when used as a basis for supervised machine learning. This is especially important in the context of clinical data, with particular limitations related to data accessibility, lack of pretraining materials, and limited availability of high-quality annotations. In this work, we examine the stability of models based on representations from foundation models under distribution shift. We focus on confounding by provenance, a form of distribution shift that emerges in the context of multi-institutional datasets when there are differences in source-specific language use and class distributions. Using a sampling strategy that synthetically induces varying degrees of…

Taxonomy

Topics: Scientific Computing and Data Management · Explainable Artificial Intelligence (XAI) · Topic Modeling