CRAwDAD: Causal Reasoning Augmentation with Dual-Agent Debate
Finn G. Vamosi, Nils D. Forkert

TL;DR
This paper introduces CRAwDAD, a dual-agent debate framework for causal reasoning in language models that substantially improves accuracy on causal inference tasks by letting two models critique and refine each other's hypotheses.
Contribution
It proposes a novel multi-agent debate approach that explicitly models internal causal reasoning as an adversarial dialogue, enhancing the performance of reasoning language models.
Findings
Debate improves DeepSeek-R1 accuracy from 78.03% to 87.45%.
Debate enhances Qwen3 accuracy from 84.16% to 89.41%.
Counterfactual reasoning accuracy notably increases with debate.
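The two headline gains can be restated as absolute changes in percentage points (pp) and as the share of the baseline errors that debate removes, taking error rate as 100% minus accuracy. These derived figures are computed here from the reported accuracies; they are not stated by the authors.

```latex
\begin{aligned}
\text{DeepSeek-R1:}\quad & \Delta = 87.45 - 78.03 = 9.42\ \text{pp}, &\quad \frac{9.42}{100 - 78.03} = \frac{9.42}{21.97} \approx 42.9\%\ \text{of errors removed} \\
\text{Qwen3:}\quad & \Delta = 89.41 - 84.16 = 5.25\ \text{pp}, &\quad \frac{5.25}{100 - 84.16} = \frac{5.25}{15.84} \approx 33.1\%\ \text{of errors removed}
\end{aligned}
```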
Abstract
When people reason about cause and effect, they often consider many competing "what if" scenarios before deciding which explanation fits best. Analogously, advanced language models capable of causal inference can consider multiple interventions and counterfactuals to judge the validity of causal claims. Crucially, this type of reasoning is less like a single calculation and more like an internal dialogue between alternative hypotheses. In this paper, we make this dialogue explicit through a dual-agent debate framework where one model provides a structured causal inference, and the other critically examines this reasoning for logical flaws. When disagreements arise, the agents attempt to persuade each other, challenging each other's logic and revising their conclusions until they converge on a mutually agreed answer. To take advantage of this deliberative process, we specifically use…
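The debate protocol described in the abstract can be sketched as a simple loop: a reasoner proposes a causal inference, a critic examines it, and the two alternate until their verdicts match. This is a minimal sketch under stated assumptions, not the authors' implementation. The `Turn` record, the agent call signature, the exact-match convergence test, the `max_rounds` cap, and returning `None` when no agreement is reached are all hypothetical, since the abstract (whose last sentence is truncated) does not specify them. The toy agents stand in for calls to reasoning models such as DeepSeek-R1 or Qwen3.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Turn:
    """One message in the debate (hypothetical record format)."""
    speaker: str   # "reasoner" or "critic"
    answer: str    # the speaker's current verdict on the causal claim
    argument: str  # structured inference (reasoner) or critique (critic)


# Hypothetical agent interface: (question, transcript so far) -> next Turn.
# In practice each agent would wrap a prompt to a reasoning language model.
Agent = Callable[[str, List[Turn]], Turn]


def debate(
    question: str,
    reasoner: Agent,
    critic: Agent,
    max_rounds: int = 5,
) -> Tuple[Optional[str], List[Turn]]:
    """Alternate reasoner and critic turns until their verdicts agree.

    Returns the agreed answer (or None if there is no agreement within
    max_rounds) together with the full transcript.
    """
    transcript: List[Turn] = []
    for _ in range(max_rounds):
        # The reasoner proposes (or revises) a structured causal inference.
        proposal = reasoner(question, transcript)
        transcript.append(proposal)
        # The critic examines that reasoning and states its own verdict.
        critique = critic(question, transcript)
        transcript.append(critique)
        # Assumed convergence test: both agents give the same final answer.
        if proposal.answer == critique.answer:
            return proposal.answer, transcript
    return None, transcript


# --- Toy agents for illustration only (no language model involved) ---

def toy_reasoner(question: str, transcript: List[Turn]) -> Turn:
    # Opens with "yes"; afterwards adopts the critic's most recent verdict.
    critiques = [t for t in transcript if t.speaker == "critic"]
    answer = critiques[-1].answer if critiques else "yes"
    argument = ("Intervening on X changes Y." if answer == "yes"
                else "Conceded: Z confounds X and Y.")
    return Turn("reasoner", answer, argument)


def toy_critic(question: str, transcript: List[Turn]) -> Turn:
    # Always rejects the claim, citing an unadjusted common cause.
    return Turn("critic", "no", "A common cause Z explains the X-Y link.")


if __name__ == "__main__":
    answer, transcript = debate("Does X cause Y?", toy_reasoner, toy_critic)
    print(answer, len(transcript))
```

Running the module prints `no 4`: the toy reasoner opens with "yes", adopts the critic's "no" in the second round, and the loop stops once both verdicts match. Both agents see the whole transcript on every call, so either side can revise its position, which mirrors the mutual persuasion the abstract describes.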
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Bayesian Modeling and Causal Inference · Topic Modeling
