Improving Clinical Diagnosis with Counterfactual Multi-Agent Reasoning

Zhiwen You; Xi Chen; Aniket Vashishtha; Simo Du; Gabriel Erion-Barner; Hongyuan Mei; Hao Peng; Yue Guo

arXiv:2603.27820·cs.CL·April 24, 2026

TL;DR

This paper introduces a counterfactual multi-agent framework for clinical diagnosis. By explicitly testing how individual findings support or weaken competing diagnoses, it aims to improve both the interpretability and the accuracy of LLM-based diagnostic reasoning.

Contribution

The paper proposes a counterfactual case editing method and a quantitative measure, the Counterfactual Probability Gap, to improve diagnostic reasoning in LLM-based systems.

Findings

01. Consistently improves diagnostic accuracy across benchmarks and LLMs.

02. Enhances interpretability and clinical usefulness of AI diagnostic reasoning.

03. Achieves larger gains in complex, ambiguous cases.

Abstract

Clinical diagnosis is a complex reasoning process in which clinicians gather evidence, form hypotheses, and test them against alternative explanations. In medical training, this reasoning is explicitly developed through counterfactual questioning (e.g., asking how a diagnosis would change if a key symptom were absent or altered) to strengthen differential diagnosis skills. As large language model (LLM)-based systems are increasingly used for diagnostic support, ensuring the interpretability of their recommendations becomes critical. However, most existing LLM-based diagnostic agents reason over fixed clinical evidence without explicitly testing how individual findings support or weaken competing diagnoses. In this work, we propose a counterfactual multi-agent diagnostic framework inspired by clinician training that makes hypothesis testing explicit and evidence-grounded. Our framework…
