Perspective on Bias in Biomedical AI: Preventing Downstream Healthcare Disparities

Michal Rosen-Zvi; Yoav Kan-Tor; Michael Danziger; Agata Ferretti; Javier Aula-Blasco; Julia Falcao; Ron Shamir; Mordechai Muszkat

arXiv:2604.14514 · cs.AI · April 17, 2026

TL;DR

This paper highlights biases that arise early, during biomedical data collection and especially in omics datasets, and that foundation models trained on such data can carry forward into healthcare disparities.

Contribution

It documents the extent of demographic bias in omics research and proposes principles for improving equity and transparency in biomedical AI development.

Findings

01 Limited reporting of ancestry in omics publications

02 European-ancestry data dominates large biomedical datasets

03 Biases in foundation models risk perpetuating health disparities

Abstract

Healthcare disparities persist across socioeconomic boundaries and are often attributed to unequal access to screening, diagnostics, and therapeutics. However, this perspective highlights that critical biases can emerge much earlier, during data collection and research prioritization, long before clinical implementation, in cases where the focus of the studies and the data they collect are at the molecular level. A vast number of studies collect omics data, but the demographic information associated with these datasets is often not reported, and when it is reported, it shows substantial biases. An automated analysis of 4719 PubMed-indexed omics publications from 2015 to 2024 reveals that only a small fraction report ancestry or ethnicity information, with ancestry reporting improving slightly over that period. Analysis of large-scale datasets commonly used for model training, such as…
