VeriSim: A Configurable Framework for Evaluating Medical AI Under Realistic Patient Noise

Sina Mansouri; Mohit Marvania; Vibhavari Ashok Shihorkar; Han Ngoc Tran; Kazhal Shafiei; Mehrdad Fazli; Yikuan Li; Ziwei Zhu

arXiv:2604.10441·cs.AI·April 14, 2026

TL;DR

VeriSim is an open-source framework that injects realistic patient communication noise into medical AI evaluations, revealing significant model performance degradation under authentic clinical conditions.

Contribution

We introduce VeriSim, a novel patient simulation framework that systematically incorporates clinically grounded noise into medical AI assessments in order to expose robustness gaps.

Findings

01. All models' diagnostic accuracy drops 15-25% under realistic noise.

02. Smaller models (7B) degrade 40% more than larger models (70B+).

03. Medical fine-tuning offers limited robustness improvements.

Abstract

Medical large language models (LLMs) achieve impressive performance on standardized benchmarks, yet these evaluations fail to capture the complexity of real clinical encounters where patients exhibit memory gaps, limited health literacy, anxiety, and other communication barriers. We introduce VeriSim, a truth-preserving patient simulation framework that injects controllable, clinically evidence-grounded noise into patient responses while maintaining strict adherence to medical ground truth through a hybrid UMLS-LLM verification mechanism. Our framework operationalizes six noise dimensions derived from peer-reviewed medical communication literature, capturing authentic clinical phenomena such as patient recall limitations, health literacy barriers, and stigma-driven non-disclosure. Experiments across seven open-weight LLMs reveal that all models degrade significantly under realistic…
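As a rough illustration of what "controllable noise that preserves ground truth" could look like in code, the following is a minimal sketch, not the authors' implementation. It models only the three noise dimensions the abstract names explicitly (recall limitations, health literacy barriers, stigma-driven non-disclosure). Every identifier here (`NoiseConfig`, `inject_noise`, `is_truth_preserving`, `LAY_TERMS`) is hypothetical, and the verifier is a simple set-membership check standing in for the hybrid UMLS-LLM verification described above.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class NoiseConfig:
    """Hypothetical knobs; each value is a probability in [0, 1]."""
    recall_gap: float = 0.0      # chance that a true fact is forgotten
    low_literacy: float = 0.0    # chance that a clinical term becomes lay wording
    non_disclosure: float = 0.0  # chance that a stigmatized fact is withheld


# Illustrative lay paraphrases (assumed; not taken from the paper or from UMLS).
LAY_TERMS = {"dyspnea": "trouble breathing", "syncope": "passing out"}
CANONICAL = {lay: term for term, lay in LAY_TERMS.items()}


def inject_noise(facts, stigmatized, cfg, rng):
    """Return the statements a simulated patient gives for the true `facts`.

    Noise may omit or rephrase a fact but never adds a new one.
    """
    said = []
    for fact in facts:
        if fact in stigmatized and rng.random() < cfg.non_disclosure:
            continue  # stigma-driven non-disclosure
        if rng.random() < cfg.recall_gap:
            continue  # memory gap
        if fact in LAY_TERMS and rng.random() < cfg.low_literacy:
            fact = LAY_TERMS[fact]  # limited health literacy
        said.append(fact)
    return said


def is_truth_preserving(said, facts):
    """Stand-in verifier: each statement must map back to a ground-truth fact."""
    truth = set(facts)
    return all(CANONICAL.get(statement, statement) in truth for statement in said)
```

In this sketch, a caller would pass a seeded `random.Random` instance so that a given noise configuration yields a reproducible patient transcript, then reject or regenerate any response for which `is_truth_preserving` returns `False`. Omissions pass the check by design, since a patient who forgets or withholds a fact has not contradicted the record.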
