COLD: Causal reasOning in cLosed Daily activities
Abhinav Joshi, Areeb Ahmad, Ashutosh Modi

TL;DR
This paper introduces COLD, a framework for causal reasoning in daily activities that enables large-scale query generation and evaluates LLMs' understanding of real-world causal relationships, revealing their current limitations.
Contribution
The paper presents a novel framework for causal reasoning in daily activities, bridging the gap between real-world grounding and theoretical analysis, and provides a large dataset for evaluating LLMs.
Findings
LLMs struggle with causal reasoning even about daily activities that are trivial for humans.
COLD generates nearly 9 million causal queries for evaluation.
Causal reasoning remains challenging for current LLMs.
Abstract
Large Language Models (LLMs) have shown state-of-the-art performance in a variety of tasks, including arithmetic and reasoning; however, to gauge the intellectual capabilities of LLMs, causal reasoning has become a reliable proxy for validating a general understanding of the mechanics and intricacies of the world similar to humans. Previous works in natural language processing (NLP) have either focused on open-ended causal reasoning via causal commonsense reasoning (CCR) or framed a symbolic representation-based question answering for theoretically backed-up analysis via a causal inference engine. The former adds an advantage of real-world grounding but lacks theoretically backed-up analysis/validation, whereas the latter is far from real-world grounding. In this work, we bridge this gap by proposing the COLD (Causal reasOning in cLosed Daily activities) framework, which is built upon…
Taxonomy
Topics: Formal Methods in Verification
Methods: Causal inference
