COLD: Causal reasOning in cLosed Daily activities

Abhinav Joshi; Areeb Ahmad; Ashutosh Modi

arXiv:2411.19500·cs.CL·December 2, 2024

Open Access · 1 repo · 1 dataset · 1 video

TL;DR

This paper introduces COLD, a framework for causal reasoning over daily activities. It supports large-scale generation of causal queries and uses them to evaluate how well LLMs understand real-world causal relationships, exposing their current limitations.

Contribution

The paper presents a framework for causal reasoning in daily activities that combines the real-world grounding of causal commonsense reasoning with the theoretically backed analysis of symbolic approaches, and it provides a large dataset for evaluating LLMs.

Findings

01

LLMs struggle with causal reasoning even on trivial daily activities.

02

COLD generates nearly 9 million causal queries for evaluation.

03

Causal reasoning remains challenging for current LLMs.

Abstract

Large Language Models (LLMs) have shown state-of-the-art performance in a variety of tasks, including arithmetic and reasoning; however, to gauge the intellectual capabilities of LLMs, causal reasoning has become a reliable proxy for validating a general understanding of the mechanics and intricacies of the world similar to humans. Previous works in natural language processing (NLP) have either focused on open-ended causal reasoning via causal commonsense reasoning (CCR) or framed a symbolic representation-based question answering for theoretically backed-up analysis via a causal inference engine. The former adds an advantage of real-world grounding but lacks theoretically backed-up analysis/validation, whereas the latter is far from real-world grounding. In this work, we bridge this gap by proposing the COLD (Causal reasOning in cLosed Daily activities) framework, which is built upon…

Code & Models

Repositories

Exploration-Lab/COLD
pytorch · Official

Datasets

Exploration-Lab/COLD
dataset · 8 dl

Videos

COLD: Causal reasOning in cLosed Daily activities · slideslive

Taxonomy

Topics: Formal Methods in Verification

Methods: Causal inference