Do large language models and humans have similar behaviors in causal inference with script knowledge?

Xudong Hong; Margarita Ryzhova; Daniel Adrian Biondi; Vera Demberg

arXiv:2311.07311 · cs.CL · November 14, 2023

TL;DR

This study compares human and large language model behaviors in causal inference within script-based stories, revealing that recent models partially mimic human responses but still struggle with integrating script knowledge.

Contribution

It provides a systematic comparison of human and LLM causal reasoning in script contexts, highlighting current models' limitations and partial alignment with human behavior.

Findings

01. Humans show longer reading times for causal conflicts.

02. Recent LLMs like GPT-3 correlate with human responses in some conditions.

03. All models fail to predict the lower surprise when the cause event is not mentioned.

Abstract

Recently, large pre-trained language models (LLMs) have demonstrated superior language understanding abilities, including zero-shot causal reasoning. However, it is unclear to what extent their capabilities are similar to human ones. We here study the processing of an event $B$ in a script-based story, which causally depends on a previous event $A$. In our manipulation, event $A$ is stated, negated, or omitted in an earlier section of the text. We first conducted a self-paced reading experiment, which showed that humans exhibit significantly longer reading times when causal conflicts exist ($\neg A \to B$) than under logical conditions ($A \to B$). However, reading times remain similar when cause $A$ is not explicitly mentioned, indicating that humans can easily infer event $B$ from their script knowledge. We then tested a variety of LLMs on the same data to check to what…
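The three-condition comparison described in the abstract can be illustrated with a minimal sketch. It assumes that the model-side measure is surprisal (negative log probability) of the event-$B$ text, which is a reading of the "surprise" mentioned in the findings rather than something this page confirms. The condition label `nil -> B`, the function names, and all probabilities are hypothetical placeholders chosen for easy arithmetic; they are not results from the paper and not the authors' code.

```python
import math

def surprisal_bits(token_probs):
    """Total surprisal in bits, given each token's conditional probability."""
    return -sum(math.log2(p) for p in token_probs)

# HYPOTHETICAL per-token probabilities of the same event-B sentence,
# as a language model might assign them after each kind of context.
# These are made-up illustrative numbers, not data from the paper.
probs_by_condition = {
    "A -> B": [0.5, 0.25, 0.5],        # cause stated earlier
    "not A -> B": [0.25, 0.125, 0.25], # cause negated earlier
    "nil -> B": [0.5, 0.25, 0.25],     # cause never mentioned
}

surprisal = {cond: surprisal_bits(p) for cond, p in probs_by_condition.items()}

def matches_human_pattern(s):
    """One possible reading of the human result (an assumption):
    the negated cause is more surprising than the stated cause,
    and the omitted cause is less surprising than the negated one."""
    return s["not A -> B"] > s["A -> B"] and s["nil -> B"] < s["not A -> B"]

for cond, bits in surprisal.items():
    print(f"{cond}: {bits:.1f} bits")
print("matches human pattern:", matches_human_pattern(surprisal))
```

With real models, the placeholder probabilities would be replaced by the token probabilities the model assigns to the event-$B$ sentence in each version of the story; the ordering check then shows whether the model treats an omitted cause more like a stated one or more like a negated one.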

Code & Models

Repositories

tony-hong/causal-script
Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Sparse Evolutionary Training · Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Residual Connection · Byte Pair Encoding · Dropout