CommonWhy: A Dataset for Evaluating Entity-Based Causal Commonsense Reasoning in Large Language Models
Armin Toroghi, Faeze Moradi Kalarde, Scott Sanner

TL;DR
CommonWhy is a new dataset of 15,000 causal reasoning questions about entities, designed to evaluate and improve large language models' ability to perform entity-based commonsense reasoning and causal inference.
Contribution
The paper introduces CommonWhy, a novel dataset and benchmark for evaluating entity-based causal reasoning and knowledge graph question answering in LLMs.
Findings
State-of-the-art LLMs often hallucinate facts and fail at causal reasoning tasks.
CommonWhy reveals significant shortcomings of current models in entity-based causal reasoning.
The dataset enables evaluation of LLMs' ability to utilize knowledge graphs for causal inference.
Abstract
To effectively interact with the real world, Large Language Models (LLMs) require entity-based commonsense reasoning, a challenging task that necessitates integrating factual knowledge about specific entities with commonsense inference. Existing datasets for evaluating LLM entity-based commonsense reasoning have largely focused on True/False or multiple-choice questions, leaving the explicit assessment of the model's ability in abductive reasoning about causes and effects and generating explanations largely unexamined. In this work, we introduce CommonWhy, a dataset of 15,000 why questions designed to evaluate entity-based commonsense reasoning about causal relationships in LLMs. CommonWhy also serves as a Knowledge Graph Question Answering (KGQA) benchmark, as all supporting knowledge required to answer its queries is available in the Wikidata knowledge graph. Unlike existing KGQA…
