CausalGraph2LLM: Evaluating LLMs for Causal Queries
Ivaxi Sheth, Bahare Fatemi, Mario Fritz

TL;DR
This paper introduces CausalGraph2LLM, a large-scale benchmark for evaluating LLMs' causal reasoning abilities using over 700,000 queries, revealing their sensitivity to encoding and biases in causal graph tasks.
Contribution
The paper presents a new comprehensive benchmark for assessing LLMs' causal reasoning, highlighting their encoding sensitivity and bias tendencies in causal graph understanding.
Findings
LLMs show promise but are highly sensitive to encoding methods.
Even advanced models like GPT-4 are sensitive to the encoding, with deviations of about 60%.
LLMs can display biases influenced by contextual information.
Abstract
Causality is essential in scientific research, enabling researchers to interpret true relationships between variables. These causal relationships are often represented by causal graphs, which are directed acyclic graphs. With the recent advancements in Large Language Models (LLMs), there is an increasing interest in exploring their capabilities in causal reasoning and their potential use to hypothesize causal graphs. These tasks require the LLMs to encode the causal graph effectively for subsequent downstream tasks. In this paper, we introduce CausalGraph2LLM, a comprehensive benchmark comprising over 700k queries across diverse causal graph settings to evaluate the causal reasoning capabilities of LLMs. We categorize the causal queries into two types: graph-level and node-level queries. We benchmark both open-source and proprietary models for our study. Our findings reveal that…
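To make the idea of encoding a causal graph for an LLM concrete, here is a minimal sketch in Python. It is not the benchmark's code: the toy graph, the two encoding styles, the prompt wording, and all function names are assumptions chosen for illustration. It shows how the same directed acyclic graph can be serialized in two ways and wrapped into a node-level query.

```python
from collections import defaultdict

# Toy causal DAG as (cause, effect) pairs; the variables are illustrative only.
EDGES = [("smoking", "tar"), ("tar", "cancer"), ("smoking", "cancer")]


def children(edges):
    """Map each cause to the sorted list of its direct effects."""
    out = defaultdict(list)
    for cause, effect in edges:
        out[cause].append(effect)
    return {node: sorted(kids) for node, kids in out.items()}


def encode_edge_list(edges):
    """Hypothetical encoding A: one 'X -> Y' arrow per edge."""
    return ", ".join(f"{cause} -> {effect}" for cause, effect in edges)


def encode_sentences(edges):
    """Hypothetical encoding B: one natural-language sentence per cause."""
    return " ".join(
        f"{node} causes {', '.join(kids)}."
        for node, kids in children(edges).items()
    )


def node_query(encoded_graph, node):
    """Build a node-level prompt asking for the direct effects of `node`."""
    return (
        f"Causal graph: {encoded_graph}\n"
        f"Question: which variables does {node} directly cause?"
    )


if __name__ == "__main__":
    # The same question posed under two encodings of the same graph.
    for encode in (encode_edge_list, encode_sentences):
        print(node_query(encode(EDGES), "smoking"))
        print()
    # Ground truth against which a model's answer could be scored.
    print(children(EDGES)["smoking"])
```

Under this setup, sensitivity to encoding could be probed by sending both prompts to the same model and comparing the answers against the ground-truth children of the queried node.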
Taxonomy
Topics: Data Quality and Management · Bayesian Modeling and Causal Inference · Semantic Web and Ontologies
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Residual Connection · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Multi-Head Attention · Adam · Dropout
