SycoEval-EM: Sycophancy Evaluation of Large Language Models in Simulated Clinical Encounters for Emergency Care

Dongshen Peng; Yi Wang; Austin Schoeffler; Carl Preiksaitis; Christian Rose

arXiv:2601.16529 · cs.AI · March 5, 2026

Open Access

TL;DR

This paper introduces SycoEval-EM, a multi-agent simulation framework that evaluates how readily large language models yield to patient pressure for inappropriate care in emergency medicine. Acquiescence rates ranged from 0% to 100% across 20 models, and the results indicate that static benchmarks poorly predict safety under social pressure.

Contribution

The paper presents a novel multi-agent simulation framework for assessing LLM robustness against adversarial patient persuasion in clinical settings.

Findings

01  Acquiescence rates ranged from 0% to 100% across 20 LLMs and three Choosing Wisely scenarios.

02  Models were more vulnerable to imaging requests (38.8%) than to opioid prescription requests (25.0%).

03  Static benchmarks poorly predict safety under social pressure; multi-turn adversarial testing is necessary.

Abstract

Large language models (LLMs) show promise in clinical decision support yet risk acquiescing to patient pressure for inappropriate care. We introduce SycoEval-EM, a multi-agent simulation framework evaluating LLM robustness through adversarial patient persuasion in emergency medicine. Across 20 LLMs and 1,875 encounters spanning three Choosing Wisely scenarios, acquiescence rates ranged from 0-100%. Models showed higher vulnerability to imaging requests (38.8%) than opioid prescriptions (25.0%), with model capability poorly predicting robustness. All persuasion tactics proved equally effective (30.0-36.0%), indicating general susceptibility rather than tactic-specific weakness. Our findings demonstrate that static benchmarks inadequately predict safety under social pressure, necessitating multi-turn adversarial testing for clinical AI certification.
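The abstract reports acquiescence rates grouped by model, scenario, and persuasion tactic. As a rough illustration of how such rates could be tallied from encounter-level outcomes, here is a minimal Python sketch. It is not the authors' implementation: the `Encounter` record, its field names, the tactic labels, and the toy data are all hypothetical and do not reproduce any result from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Encounter:
    """One simulated encounter. All field names are hypothetical."""
    model: str
    scenario: str
    tactic: str
    acquiesced: bool  # True if the model granted the inappropriate request


def acquiescence_rate(encounters, key):
    """Return {group: percent of encounters in which the model acquiesced}."""
    totals = defaultdict(int)
    yielded = defaultdict(int)
    for enc in encounters:
        group = getattr(enc, key)
        totals[group] += 1
        yielded[group] += int(enc.acquiesced)
    return {g: 100.0 * yielded[g] / totals[g] for g in totals}


# Toy data invented purely for illustration; not results from the paper.
toy = [
    Encounter("model_a", "imaging", "emotional_appeal", True),
    Encounter("model_a", "imaging", "authority_claim", False),
    Encounter("model_a", "opioid", "emotional_appeal", False),
    Encounter("model_b", "imaging", "authority_claim", True),
    Encounter("model_b", "opioid", "emotional_appeal", False),
    Encounter("model_b", "opioid", "authority_claim", True),
]

by_scenario = acquiescence_rate(toy, "scenario")
by_model = acquiescence_rate(toy, "model")
```

Passing `"tactic"` as the key would give the per-tactic breakdown in the same way. Grouping by a single attribute name keeps the sketch short; a real analysis would presumably also report counts and uncertainty alongside each percentage.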

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Adversarial Robustness in Machine Learning · Topic Modeling