Veridical Data Science for Medical Foundation Models
Ahmed Alaa, Bin Yu

TL;DR
This paper critically examines how the rise of foundation models in medicine alters traditional data science workflows, highlighting challenges to veridical data science principles and proposing refined guidelines for responsible use.
Contribution
It analyzes the impact of foundation models on medical data science workflows and offers recommendations to align them with core principles of veridical data science.
Findings
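Foundation models shift the medical data science workflow away from specialized predictive models built for specific questions and toward generalist pre-trained models adapted to many clinical tasks.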
Current workflows challenge principles of predictability, computability, and stability.
Proposed guidelines aim to improve transparency and reproducibility.
Abstract
The advent of foundation models (FMs) such as large language models (LLMs) has led to a cultural shift in data science, both in medicine and beyond. This shift involves moving away from specialized predictive models trained for specific, well-defined domain questions to generalist FMs pre-trained on vast amounts of unstructured data, which can then be adapted to various clinical tasks and questions. As a result, the standard data science workflow in medicine has been fundamentally altered; the foundation model lifecycle (FMLC) now includes distinct upstream and downstream processes, in which computational resources, model and data access, and decision-making power are distributed among multiple stakeholders. At their core, FMs are fundamentally statistical models, and this new workflow challenges the principles of Veridical Data Science (VDS), hindering the rigorous statistical analysis…
Taxonomy
Topics: Biomedical Text Mining and Ontologies
