AI-generated data contamination erodes pathological variability and diagnostic reliability

Hongyu He; Shaowen Xiang; Ye Zhang; Yingtao Zhu; Jin Zhang; Hao Deng; Emily Alsentzer; Yun Liu; Qingyu Chen; Kun-Hsing Yu; Andrew Marshall; Tingting Chen; Srinivas Anumasa; Daniel Ebner; Dean Ho; Kee Yuan Ngiam; Ching-Yu Cheng; and Dianbo Liu

arXiv:2601.12946·cs.CY·February 3, 2026

Open Access

TL;DR

This study shows that training on unverified AI-generated medical data erodes pathological variability and diagnostic reliability: models converge toward generic phenotypes and mask their own errors, putting clinical safety at risk.

Contribution

The paper shows how quickly diagnostic detail erodes when models train on AI-generated medical data, and it evaluates mitigation strategies, emphasizing the need for human oversight.

Findings

01 AI models lose rare pathological findings over successive generations

02 Diagnostic confidence increases despite declining accuracy

03 Mixing in real data, combined with filtering, preserves data diversity

Abstract

Generative artificial intelligence (AI) is rapidly populating medical records with synthetic content, creating a feedback loop where future models are increasingly at risk of training on uncurated AI-generated data. However, the clinical consequences of this AI-generated data contamination remain unexplored. Here, we show that in the absence of mandatory human verification, this self-referential cycle drives a rapid erosion of pathological variability and diagnostic reliability. By analysing more than 800,000 synthetic data points across clinical text generation, vision-language reporting, and medical image synthesis, we find that models progressively converge toward generic phenotypes regardless of the model architecture. Specifically, rare but critical findings, including pneumothorax and effusions, vanish from the synthetic content generated by AI models, while demographic…
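
The feedback loop described in the abstract can be illustrated with a toy simulation. This is a sketch under strong assumptions, not the paper's method: the "model" is only a categorical distribution over three finding labels, refit each generation on samples drawn from the previous generation. The label probabilities, sample size, and the `real_fraction` parameter are hypothetical values chosen for illustration.

```python
import random
from collections import Counter


def next_generation(probs, n_samples, rng, real_probs=None, real_fraction=0.0):
    """Draw a training set from the current model, optionally mix in
    real data, and refit the label frequencies (a stand-in for training)."""
    labels = list(probs)
    n_real = int(round(n_samples * real_fraction)) if real_probs else 0
    n_synth = n_samples - n_real
    # Synthetic records sampled from the current (possibly degraded) model.
    draws = rng.choices(labels, weights=[probs[k] for k in labels], k=n_synth)
    if n_real:
        # Real records sampled from the original distribution.
        draws += rng.choices(labels, weights=[real_probs[k] for k in labels], k=n_real)
    counts = Counter(draws)
    return {k: counts[k] / n_samples for k in labels}


def simulate(real_probs, generations, n_samples, seed=0, real_fraction=0.0):
    """Return the label distribution at generation 0, 1, ..., generations."""
    rng = random.Random(seed)
    probs = dict(real_probs)
    history = [probs]
    for _ in range(generations):
        probs = next_generation(probs, n_samples, rng, real_probs, real_fraction)
        history.append(probs)
    return history


if __name__ == "__main__":
    # Hypothetical prevalences, not taken from the paper.
    real = {"normal": 0.90, "effusion": 0.07, "pneumothorax": 0.03}
    for frac in (0.0, 0.5):
        final = simulate(real, generations=50, n_samples=100, real_fraction=frac)[-1]
        print(f"real_fraction={frac}: {final}")
```

With `real_fraction=0.0`, a label whose estimated frequency reaches zero can never be sampled again, so rare findings tend to drop out and stay out. Mixing in real records at every generation gives rare labels a way back in, which is the intuition behind the real-data mixing mitigation mentioned in the findings.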

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · AI in cancer detection · Machine Learning in Healthcare