MedDialBench: Benchmarking LLM Diagnostic Robustness under Parametric Adversarial Patient Behaviors

Xiaotian Luo; Xun Jiang; Jiangcheng Wu

arXiv:2604.06846·cs.CL·April 9, 2026

TL;DR

MedDialBench is a benchmark that measures how graded, case-specific patient behaviors along five dimensions affect the diagnostic accuracy of large language models in medical dialogues. Fabricated symptoms stand out as a consistent vulnerability across models.

Contribution

MedDialBench introduces a controlled factorial framework that decomposes patient behavior into five graded dimensions, enabling sensitivity, dose-response, and cross-dimension interaction analysis of LLM diagnostic accuracy.

Findings

01

Fabricating symptoms causes accuracy drops 1.7 to 3.4 times larger than withholding information.

02

Fabricating is the only behavior with a statistically significant impact across all models.

03

When fabricating is combined with other behaviors, the effects are super-additive, worsening diagnostic failures.
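To make "super-additive" concrete, the sketch below uses one common reading: a combined behavior is super-additive when its accuracy drop exceeds the sum of the drops each behavior causes alone. The function names and the accuracy values are hypothetical illustrations, not figures or code from the paper, and the paper's own statistical test may differ.

```python
def interaction_excess(baseline_acc, acc_a, acc_b, acc_ab):
    """Return how much the combined drop exceeds the sum of individual drops.

    All arguments are accuracies in [0, 1]. A positive result means the
    combination hurts more than the two behaviors would if their effects
    simply added up (super-additive under this assumed definition).
    """
    drop_a = baseline_acc - acc_a      # drop from behavior A alone
    drop_b = baseline_acc - acc_b      # drop from behavior B alone
    drop_ab = baseline_acc - acc_ab    # drop when A and B are combined
    return drop_ab - (drop_a + drop_b)


def is_super_additive(baseline_acc, acc_a, acc_b, acc_ab):
    """True when the combined drop is strictly larger than the additive sum."""
    return interaction_excess(baseline_acc, acc_a, acc_b, acc_ab) > 0


# Hypothetical accuracies, chosen only to illustrate the arithmetic.
excess = interaction_excess(baseline_acc=0.80, acc_a=0.70, acc_b=0.75, acc_ab=0.55)
print(f"excess drop beyond additivity: {excess:.2f}")
```

With these made-up numbers, the individual drops are 0.10 and 0.05, the combined drop is 0.25, and the excess is therefore 0.10.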

Abstract

Interactive medical dialogue benchmarks have shown that LLM diagnostic accuracy degrades significantly when interacting with non-cooperative patients, yet existing approaches either apply adversarial behaviors without graded severity or case-specific grounding, or reduce patient non-cooperation to a single ungraded axis, and none analyze cross-dimension interactions. We introduce MedDialBench, a benchmark enabling controlled, dose-response characterization of how individual patient behavior dimensions affect LLM diagnostic robustness. It decomposes patient behavior into five dimensions -- Logic Consistency, Health Cognition, Expression Style, Disclosure, and Attitude -- each with graded severity levels and case-specific behavioral scripts. This controlled factorial design enables graded sensitivity analysis, dose-response profiling, and cross-dimension interaction detection.…
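As a rough illustration of what a controlled factorial design over the five dimensions could look like, the sketch below enumerates single-dimension conditions (for dose-response profiling) and pairwise conditions (for interaction detection). The dimension names come from the abstract; the three severity levels, the use of 0 for cooperative behavior, and every function name are assumptions made for this example, since the abstract does not specify them.

```python
from itertools import combinations, product

# Dimension names as listed in the abstract.
DIMENSIONS = (
    "Logic Consistency",
    "Health Cognition",
    "Expression Style",
    "Disclosure",
    "Attitude",
)

# Hypothetical severity grades; 0 is assumed to mean fully cooperative.
SEVERITY_LEVELS = (1, 2, 3)


def baseline():
    """A cooperative patient: every dimension at level 0."""
    return {dim: 0 for dim in DIMENSIONS}


def single_dimension_conditions():
    """Vary one dimension at a time across all severity levels."""
    conditions = []
    for dim in DIMENSIONS:
        for level in SEVERITY_LEVELS:
            cond = baseline()
            cond[dim] = level
            conditions.append(cond)
    return conditions


def pairwise_conditions():
    """Vary two dimensions jointly, leaving the rest cooperative."""
    conditions = []
    for dim_a, dim_b in combinations(DIMENSIONS, 2):
        for level_a, level_b in product(SEVERITY_LEVELS, repeat=2):
            cond = baseline()
            cond[dim_a] = level_a
            cond[dim_b] = level_b
            conditions.append(cond)
    return conditions


print(len(single_dimension_conditions()), "single-dimension conditions")
print(len(pairwise_conditions()), "pairwise conditions")
```

Holding all other dimensions at the cooperative baseline is what lets an accuracy change be attributed to the dimension, or pair of dimensions, being varied.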
