CausalGraph2LLM: Evaluating LLMs for Causal Queries

Ivaxi Sheth; Bahare Fatemi; Mario Fritz

arXiv:2410.15939 · cs.CL · February 19, 2025


TL;DR

This paper introduces CausalGraph2LLM, a large-scale benchmark of over 700,000 queries for evaluating the causal reasoning abilities of LLMs. It shows that LLMs are sensitive to how a causal graph is encoded and can exhibit biases on causal graph tasks.

Contribution

The paper presents a new comprehensive benchmark for assessing LLMs' causal reasoning, highlighting their encoding sensitivity and bias tendencies in causal graph understanding.

Findings

01

LLMs show promise but are highly sensitive to encoding methods.

02

Even advanced models like GPT-4 are highly sensitive to the choice of graph encoding, with performance deviations of about 60%.

03

LLMs can display biases influenced by contextual information.

Abstract

Causality is essential in scientific research, enabling researchers to interpret true relationships between variables. These causal relationships are often represented by causal graphs, which are directed acyclic graphs. With the recent advancements in Large Language Models (LLMs), there is increasing interest in exploring their capabilities in causal reasoning and their potential use in hypothesizing causal graphs. These tasks require LLMs to encode the causal graph effectively for subsequent downstream tasks. In this paper, we introduce CausalGraph2LLM, a comprehensive benchmark comprising over 700k queries across diverse causal graph settings to evaluate the causal reasoning capabilities of LLMs. We categorize the causal queries into two types: graph-level and node-level queries. We benchmark both open-source and proprietary models for our study. Our findings reveal that…
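The abstract notes that an LLM must receive the causal graph in some textual encoding before it can answer graph-level or node-level queries. The sketch below illustrates that setup under stated assumptions: the toy DAG, its variable names, the two encodings (`encode_edge_list`, `encode_adjacency`) and the acyclicity check used as a graph-level example are all hypothetical and are not the benchmark's actual encodings or query templates. It shows how the same graph produces different prompt text while the ground-truth answers to node-level queries (direct parents and children) stay fixed.

```python
from collections import defaultdict, deque

# Hypothetical toy causal DAG given as (cause, effect) edges.
EDGES = [
    ("smoking", "tar"),
    ("tar", "cancer"),
    ("genotype", "smoking"),
    ("genotype", "cancer"),
]


def nodes_of(edges):
    """Return all variables, sorted so encodings are deterministic."""
    return sorted({n for edge in edges for n in edge})


def encode_edge_list(edges):
    """Hypothetical encoding A: one 'X causes Y.' sentence per edge."""
    return " ".join(f"{u} causes {v}." for u, v in edges)


def encode_adjacency(edges):
    """Hypothetical encoding B: each node followed by its direct effects."""
    effects_of = defaultdict(list)
    for u, v in edges:
        effects_of[u].append(v)
    lines = []
    for n in nodes_of(edges):
        effects = ", ".join(sorted(effects_of[n])) or "none"
        lines.append(f"{n} -> {effects}")
    return "\n".join(lines)


def parents(edges, node):
    """Node-level query: the direct causes of `node`."""
    return sorted(u for u, v in edges if v == node)


def children(edges, node):
    """Node-level query: the direct effects of `node`."""
    return sorted(v for u, v in edges if u == node)


def is_acyclic(edges):
    """Illustrative graph-level check via Kahn's algorithm:
    True iff every node can be placed in a topological order."""
    indegree = {n: 0 for n in nodes_of(edges)}
    out = defaultdict(list)
    for u, v in edges:
        out[u].append(v)
        indegree[v] += 1
    queue = deque(n for n, d in indegree.items() if d == 0)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for m in out[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    return visited == len(indegree)


if __name__ == "__main__":
    print(encode_edge_list(EDGES))
    print(encode_adjacency(EDGES))
    print(parents(EDGES, "cancer"))  # ['genotype', 'tar']
    print(is_acyclic(EDGES))  # True
```

Both encodings describe the same graph, so a model that reasons about the graph rather than its surface form should give the same answer to `parents(EDGES, "cancer")` under either prompt; a gap between the two would be an instance of the encoding sensitivity reported above. Adding the edge `("cancer", "genotype")` creates a cycle, and `is_acyclic` then returns `False`.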


Code & Models

Repositories

ivaxi0s/causalgraph2llm
Official

Videos

CausalGraph2LLM: Evaluating LLMs for Causal Queries · Underline

Taxonomy

Topics: Data Quality and Management · Bayesian Modeling and Causal Inference · Semantic Web and Ontologies

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Residual Connection · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Multi-Head Attention · Adam · Dropout