CARL-GT: Evaluating Causal Reasoning Capabilities of Large Language Models
Ruibo Tu, Hedvig Kjellström, Gustav Eje Henter, Cheng Zhang

TL;DR
This paper introduces CARL-GT, a comprehensive benchmark for evaluating the causal reasoning capabilities of large language models using diverse graph and tabular data tasks, revealing current limitations in LLMs' reasoning skills.
Contribution
The paper presents a novel benchmark, CARL-GT, designed to assess causal reasoning in LLMs across multiple real-world-relevant tasks. It also analyzes model performance on these tasks and the relationships between them.
Findings
LLMs are weak in causal reasoning, especially with tabular data.
Performance varies across tasks and task categories.
Tasks in different categories show stronger correlations than those within the same category.
Abstract
Causal reasoning capabilities are essential for large language models (LLMs) in a wide range of applications, such as education and healthcare. However, benchmarks for understanding these capabilities are still lacking. Current LLM benchmarks are mainly based on conversational tasks, academic math tests, and coding tests. Such benchmarks evaluate LLMs in well-regularized settings, but they are limited in assessing the skills needed to solve real-world problems. In this work, we provide a benchmark named CARL-GT, which evaluates CAusal Reasoning capabilities of large Language models using Graphs and Tabular data. The benchmark contains a diverse range of tasks that evaluate LLMs from three aspects: causal graph reasoning, knowledge discovery, and decision-making. In addition, we develop effective zero-shot learning prompts for the tasks. In our experiments, we leverage…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
