MedEinst: Benchmarking the Einstellung Effect in Medical LLMs through Counterfactual Differential Diagnosis

Wenting Chen; Zhongrui Zhu; Guolin Huang; Wenxuan Wang

arXiv:2601.06636 · cs.CL · January 13, 2026

TL;DR

MedEinst introduces a counterfactual benchmark to evaluate how medical LLMs rely on shortcuts instead of evidence, revealing their susceptibility to the Einstellung Effect in atypical diagnoses.

Contribution

This work presents MedEinst, a novel benchmark of paired control and trap cases for detecting susceptibility to bias traps, and proposes ECR-Agent, a reasoning system aligned with Evidence-Based Medicine standards.

Findings

1. Frontier models achieve high accuracy yet show high bias trap rates.
2. Existing benchmarks fail to detect the Einstellung Effect.
3. ECR-Agent improves reasoning fidelity in medical diagnosis.

Abstract

Despite achieving high accuracy on medical benchmarks, LLMs exhibit the Einstellung Effect in clinical diagnosis: they rely on statistical shortcuts rather than patient-specific evidence, causing misdiagnosis in atypical cases. Existing benchmarks fail to detect this critical failure mode. We introduce MedEinst, a counterfactual benchmark with 5,383 paired clinical cases across 49 diseases. Each pair contains a control case and a "trap" case with altered discriminative evidence that flips the diagnosis. We measure susceptibility via the Bias Trap Rate, the probability of misdiagnosing a trap case despite correctly diagnosing its control. Extensive evaluation of 17 LLMs shows that frontier models achieve high baseline accuracy but suffer severe bias trap rates. To address this, we propose ECR-Agent, which aligns LLM reasoning with Evidence-Based Medicine standards via two components: (1) Dynamic Causal Inference (DCI) performs…
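The abstract defines the Bias Trap Rate only in words, as a conditional probability: P(trap misdiagnosed | control diagnosed correctly). The sketch below shows one way to compute it, assuming each evaluated pair has already been reduced to two booleans (was the control correct, was the trap correct). The `PairResult` record and `bias_trap_rate` function are hypothetical names for illustration, not part of any MedEinst release, and the paper's exact scoring protocol may differ.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PairResult:
    """Outcome of one control/trap pair (hypothetical record format)."""
    control_correct: bool  # model diagnosed the control case correctly
    trap_correct: bool     # model diagnosed the trap case correctly


def bias_trap_rate(results: Iterable[PairResult]) -> float:
    """Estimate P(trap misdiagnosed | control diagnosed correctly).

    Pairs whose control was misdiagnosed are dropped from the denominator,
    following the conditional wording of the definition.
    Returns NaN when no control case was diagnosed correctly.
    """
    conditioned = [r for r in results if r.control_correct]
    if not conditioned:
        return float("nan")  # rate is undefined without any solved controls
    trapped = sum(1 for r in conditioned if not r.trap_correct)
    return trapped / len(conditioned)


# Usage: three pairs with a correct control, two of which fall into the trap;
# the fourth pair is excluded because its control was already wrong.
pairs = [
    PairResult(control_correct=True, trap_correct=False),
    PairResult(control_correct=True, trap_correct=True),
    PairResult(control_correct=True, trap_correct=False),
    PairResult(control_correct=False, trap_correct=False),
]
print(round(bias_trap_rate(pairs), 3))  # 0.667
```

Conditioning on the control separates the two failure modes: a model that is simply wrong on both cases lowers baseline accuracy, while only a model that solves the typical presentation and then misses the altered evidence raises the Bias Trap Rate.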

Taxonomy

Topics: Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks