Improving Clinical Diagnosis with Counterfactual Multi-Agent Reasoning
Zhiwen You, Xi Chen, Aniket Vashishtha, Simo Du, Gabriel Erion-Barner, Hongyuan Mei, Hao Peng, Yue Guo

TL;DR
This paper introduces a counterfactual multi-agent framework for clinical diagnosis that improves accuracy and interpretability by explicitly testing how individual findings support or weaken each candidate diagnosis.
Contribution
It proposes a novel counterfactual case editing method and a quantitative measure called the Counterfactual Probability Gap to improve diagnostic reasoning in LLM-based systems.
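The summary names the Counterfactual Probability Gap but does not define it. As a rough illustration only, the sketch below assumes the gap is the difference between the probability a model assigns to a diagnosis on the original case and on a counterfactually edited case (for example, with one finding removed). The function name `probability_gap`, the diagnoses, and all numbers are hypothetical and are not taken from the paper.

```python
from typing import Dict


def probability_gap(
    p_original: Dict[str, float],
    p_counterfactual: Dict[str, float],
) -> Dict[str, float]:
    """Per-diagnosis change in probability after a counterfactual edit.

    ASSUMPTION: this is an illustrative definition, not the paper's.
    A positive value means the edited-out finding supported the diagnosis;
    a negative value means the finding weighed against it.
    """
    diagnoses = sorted(set(p_original) | set(p_counterfactual))
    return {
        d: p_original.get(d, 0.0) - p_counterfactual.get(d, 0.0)
        for d in diagnoses
    }


# Hypothetical model outputs for a case, before and after removing "fever".
original = {"pneumonia": 0.6, "pulmonary embolism": 0.3, "bronchitis": 0.1}
without_fever = {"pneumonia": 0.3, "pulmonary embolism": 0.5, "bronchitis": 0.2}

gaps = probability_gap(original, without_fever)

# Rank diagnoses by how strongly the removed finding supported them.
ranked = sorted(gaps, key=gaps.get, reverse=True)
```

Under this assumed definition, a finding that strongly supports a diagnosis yields a large positive gap for it, which is one way the effect of a single piece of evidence on competing diagnoses could be made quantitative.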
Findings
Consistently improves diagnostic accuracy across benchmarks and LLMs.
Enhances interpretability and clinical usefulness of AI diagnostic reasoning.
Achieves larger gains in complex, ambiguous cases.
Abstract
Clinical diagnosis is a complex reasoning process in which clinicians gather evidence, form hypotheses, and test them against alternative explanations. In medical training, this reasoning is explicitly developed through counterfactual questioning (e.g., asking how a diagnosis would change if a key symptom were absent or altered) to strengthen differential diagnosis skills. As large language model (LLM)-based systems are increasingly used for diagnostic support, ensuring the interpretability of their recommendations becomes critical. However, most existing LLM-based diagnostic agents reason over fixed clinical evidence without explicitly testing how individual findings support or weaken competing diagnoses. In this work, we propose a counterfactual multi-agent diagnostic framework inspired by clinician training that makes hypothesis testing explicit and evidence-grounded. Our framework…
