Com$^2$: A Causal-Guided Benchmark for Exploring Complex Commonsense Reasoning in Large Language Models

Kai Xiong; Xiao Ding; Yixin Cao; Yuxiong Yan; Li Du; Yufei Zhang; Jinglong Gao; Jiaqian Liu; Bing Qin; Ting Liu

arXiv:2506.07064 · cs.CL · June 10, 2025

TL;DR

This paper introduces Com$^2$, a benchmark for evaluating large language models on complex, implicit commonsense reasoning using causal event graphs and slow thinking, revealing current limitations and potential improvements.

Contribution

It presents a novel benchmark incorporating causal graphs and theory-based modifications to assess and enhance LLMs' complex commonsense reasoning capabilities.

Findings

01. LLMs struggle with reasoning depth and breadth in complex commonsense tasks.

02. Post-training and slow thinking improve LLM performance on the benchmark.

03. The benchmark provides a structured way to evaluate complex commonsense reasoning in LLMs.

Abstract

Large language models (LLMs) have mastered abundant simple and explicit commonsense knowledge through pre-training, enabling them to achieve human-like performance in simple commonsense reasoning. Nevertheless, LLMs struggle to reason with complex and implicit commonsense knowledge that is derived from simple ones (such as understanding the long-term effects of certain events), an aspect humans tend to focus on more. Existing works focus on complex tasks like math and code, while complex commonsense reasoning remains underexplored due to its uncertainty and lack of structure. To fill this gap and align with real-world concerns, we propose a benchmark, Com$^2$, focusing on complex commonsense reasoning. We first incorporate causal event graphs to serve as structured complex commonsense. Then we adopt causal theory (e.g., intervention) to modify the causal event graphs and obtain different…
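To make the two ingredients named in the abstract more concrete, here is a minimal Python sketch of a causal event graph and of an intervention on it. It is an illustration under stated assumptions, not the authors' construction pipeline: the events in `toy_graph` are invented, the helper names `downstream_events` and `intervene` are hypothetical, and the intervention is modeled as standard graph surgery (removing the edges that point into the intervened event).

```python
from collections import deque

# Toy causal event graph: each key is a cause, each value lists its direct effects.
# ASSUMPTION: these events are invented for illustration; they are not benchmark data.
toy_graph = {
    "heavy rain": ["flooded road"],
    "flooded road": ["traffic jam"],
    "traffic jam": ["late for work"],
    "late for work": ["missed meeting"],
}


def downstream_events(graph, start):
    """Return every event reachable from `start` (direct and long-term effects), in BFS order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for effect in graph.get(node, []):
            if effect not in seen:
                seen.add(effect)
                order.append(effect)
                queue.append(effect)
    return order


def intervene(graph, event):
    """Graph surgery for do(event): drop every edge pointing into `event`.

    ASSUMPTION: this is the textbook reading of an intervention, used here only
    as a stand-in for whatever modification the benchmark actually applies.
    Outgoing edges of `event` are kept, and the input graph is not mutated.
    """
    return {cause: [e for e in effects if e != event] for cause, effects in graph.items()}


# Long-term effects of the root event in the unmodified graph.
long_term = downstream_events(toy_graph, "heavy rain")

# After intervening on "traffic jam", rain no longer explains anything past the flooded road.
modified = intervene(toy_graph, "traffic jam")
after_intervention = downstream_events(modified, "heavy rain")
```

In this sketch, asking for the downstream events of a root node stands in for a "long-term effects" question, and comparing `long_term` with `after_intervention` shows how a single modification to the graph changes which conclusions remain supported.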

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks