Clinician input steers frontier AI models toward both accurate and harmful decisions
Ivan Lopez, Selin S. Everett, Bryan J. Bunning, April S. Liang, Dong Han Yao, Shivam C. Vedak, Kameron C. Black, Sophie Ostmeier, Stephen P. Ma, Emily Alsentzer, Jonathan H. Chen, Akshay S. Chaudhari, Eric Horvitz

TL;DR
This study evaluates how clinician input influences large language models in clinical settings. Clinician input improves diagnostic accuracy but also exposes the models to adversarial manipulation, underscoring the need for safety measures.
Contribution
The paper introduces a comprehensive framework for assessing clinician-AI interactions, including new metrics and mitigation strategies to enhance safety and robustness of LLMs in healthcare.
Findings
Expert clinician context improved correct final diagnosis inclusion across all 21 models (mean +20.4 percentage points).
Adversarial contexts can degrade model performance and induce harmful echoing.
Scaling inference and explicit uncertainty signals mitigate some risks.
Abstract
Large language models (LLMs) are entering clinician workflows, yet evaluations rarely measure how clinician reasoning shapes model behavior during clinical interactions. We combined 61 New England Journal of Medicine Case Records with 92 real-world clinician-AI interactions to evaluate 21 reasoning LLM variants across 8 frontier models on differential diagnosis generation and next-step recommendations under three conditions: reasoning alone, after expert clinician context, and after adversarial clinician context. LLM-clinician concordance increased substantially after clinician exposure, with simulations sharing ≥3 differential diagnosis items rising from 65.8% to 93.5% and ≥3 next-step recommendations from 20.3% to 53.8%. Expert context significantly improved correct final diagnosis inclusion across all 21 models (mean +20.4 percentage points), reflecting both reasoning improvement…
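The concordance figures in the abstract (the share of simulations in which model and clinician lists have at least three items in common) can be illustrated with a minimal Python sketch. The excerpt does not say how the authors matched items, so the case-insensitive exact string matching below is an assumption, and the function names (`normalize`, `shared_count`, `concordance_rate`) and the toy data are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def normalize(item: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match.

    Assumption: exact string matching after normalization. The paper's
    actual matching procedure is not described in this excerpt.
    """
    return " ".join(item.lower().split())


def shared_count(model_items: Sequence[str], clinician_items: Sequence[str]) -> int:
    """Number of distinct items that appear in both lists."""
    model_set = {normalize(x) for x in model_items}
    clinician_set = {normalize(x) for x in clinician_items}
    return len(model_set & clinician_set)


def concordance_rate(
    pairs: Iterable[Tuple[Sequence[str], Sequence[str]]], threshold: int = 3
) -> float:
    """Fraction of (model, clinician) list pairs sharing >= threshold items."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one simulation")
    hits = sum(shared_count(m, c) >= threshold for m, c in pairs)
    return hits / len(pairs)


# Toy data (hypothetical, not from the study): two simulated interactions.
toy_pairs = [
    # Three shared differential diagnosis items: counts as concordant.
    (["Sarcoidosis", "Tuberculosis", "Lymphoma", "Histoplasmosis"],
     ["lymphoma", "tuberculosis", "sarcoidosis"]),
    # One shared item: below the threshold.
    (["Pneumonia", "Pulmonary embolism"],
     ["pneumonia", "heart failure", "COPD"]),
]

rate = concordance_rate(toy_pairs)
print(f"share of simulations with >=3 shared items: {rate:.1%}")
```

In practice, clinical terms rarely match as literal strings (synonyms, abbreviations, differing specificity), so a real evaluation would likely need expert or model-assisted matching instead of the set intersection used here.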
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare
