Better Think Thrice: Learning to Reason Causally with Double Counterfactual Consistency

Victoria Lin; Xinnuo Xu; Rachel Lawrence; Risa Ueno; Amit Sharma; Javier Gonzalez; Niranjani Prasad

arXiv:2602.16787·cs.LG·February 20, 2026

TL;DR

This paper introduces double counterfactual consistency (DCC), a lightweight inference-time method for measuring and improving the causal reasoning of large language models without labeled counterfactual data.

Contribution

The paper proposes DCC, an inference-time technique that assesses and guides an LLM's causal reasoning by checking whether the model can carry out two of its key elements: causal intervention and counterfactual prediction.

Findings

01 DCC effectively evaluates LLMs' causal reasoning across various tasks.

02 Using DCC as a rejection criterion improves model performance.

03 DCC operates without requiring labeled counterfactual datasets.
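
To make the second finding concrete, here is a minimal sketch of how a consistency check can act as a rejection criterion over sampled answers. It is an illustration under stated assumptions, not the paper's implementation: the function name `select_with_rejection`, the majority-vote step, and the fallback behavior are all hypothetical, and the check itself is left as a caller-supplied predicate `is_consistent` because this summary does not specify how DCC computes it.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def select_with_rejection(
    candidates: Sequence[str],
    is_consistent: Callable[[str], bool],
) -> Optional[str]:
    """Pick an answer by majority vote over candidates that pass a check.

    `is_consistent` is a hypothetical stand-in for a consistency test such
    as DCC; how that test is computed is NOT defined here.
    """
    if not candidates:
        return None

    # Reject every candidate that fails the consistency check.
    kept = [c for c in candidates if is_consistent(c)]

    # Assumption: if everything is rejected, fall back to all candidates
    # rather than returning nothing.
    pool = kept if kept else list(candidates)

    # Majority vote; ties resolve to the answer encountered first.
    return Counter(pool).most_common(1)[0][0]
```

For example, `select_with_rejection(["A", "A", "B"], lambda a: a == "B")` keeps only the candidate that passes the check and returns `"B"`, even though `"A"` would win an unfiltered vote.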

Abstract

Despite their strong performance on reasoning benchmarks, large language models (LLMs) have proven brittle when presented with counterfactual questions, suggesting weaknesses in their causal reasoning ability. While recent work has demonstrated that labeled counterfactual tasks can be useful benchmarks of LLMs' causal reasoning, it is difficult to produce such data at the scale required to cover the vast space of possible counterfactuals. In this work, we introduce double counterfactual consistency (DCC), a lightweight inference-time method for measuring and guiding the ability of LLMs to reason causally. Without requiring labeled counterfactual data, DCC verifies a model's ability to execute two important elements of causal reasoning: causal intervention and counterfactual prediction. Using DCC, we evaluate the causal reasoning abilities of various leading LLMs across a range of reasoning…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Bayesian Modeling and Causal Inference