Capturing Failures of Large Language Models via Human Cognitive Biases

Erik Jones; Jacob Steinhardt

arXiv:2202.12299 · cs.CL · November 28, 2022 · 36 cites

TL;DR

This paper introduces a framework inspired by human cognitive biases to identify and categorize qualitative errors in large language models, demonstrated through code generation case studies.

Contribution

It applies cognitive science methodologies to systematically reveal and analyze predictable error patterns in large language models.

Findings

01. Codex's errors are influenced by prompt framing

02. Outputs tend to anchor around initial inputs

03. Models are biased towards frequent training examples

Abstract

Large language models generate complex, open-ended outputs: instead of outputting a class label they write summaries, generate dialogue, or produce working code. In order to assess the reliability of these open-ended generation systems, we aim to identify qualitative categories of erroneous behavior, beyond identifying individual errors. To hypothesize and test for such qualitative errors, we draw inspiration from human cognitive biases -- systematic patterns of deviation from rational judgement. Specifically, we use cognitive biases as motivation to (i) generate hypotheses for problems that models may have, and (ii) develop experiments that elicit these problems. Using code generation as a case study, we find that OpenAI's Codex errs predictably based on how the input prompt is framed, adjusts outputs towards anchors, and is biased towards outputs that mimic frequent training examples.…
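
The anchoring claim in the abstract can be made concrete with a small illustration. The sketch below is not the authors' experimental code: it assumes an "anchor" is a snippet placed before the task in the prompt, and it uses a simple, hypothetical overlap score (the fraction of anchor lines that reappear verbatim in a completion) as a stand-in for whatever measure the paper uses. The names `build_anchored_prompt` and `anchor_overlap`, along with the example strings, are invented for illustration, and the completion is hand-written rather than produced by a model.

```python
def build_anchored_prompt(anchor_source: str, task_stub: str) -> str:
    """Place a (hypothetical) anchor snippet before the task stub in the prompt."""
    return anchor_source.rstrip() + "\n\n" + task_stub


def anchor_overlap(anchor_source: str, completion: str) -> float:
    """Fraction of non-blank anchor lines that reappear verbatim in the completion.

    This is an assumed, simplified score, not a metric taken from the paper.
    Lines are compared after stripping leading and trailing whitespace.
    """
    anchor_lines = [ln.strip() for ln in anchor_source.splitlines() if ln.strip()]
    if not anchor_lines:
        return 0.0
    completion_lines = {ln.strip() for ln in completion.splitlines() if ln.strip()}
    hits = sum(1 for ln in anchor_lines if ln in completion_lines)
    return hits / len(anchor_lines)


# Hypothetical anchor: a related function that sums squares.
example_anchor = (
    "def helper(xs):\n"
    "    total = 0\n"
    "    for x in xs:\n"
    "        total += x * x\n"
    "    return total\n"
)

# Hypothetical task: a plain sum, for which the anchor body is not the right answer.
example_task = 'def plain_sum(xs):\n    """Return the sum of xs."""\n'

# Hand-written stand-in for a model completion (no model is called here).
example_completion = (
    "    total = 0\n"
    "    for x in xs:\n"
    "        total += x\n"
    "    return total\n"
)

example_prompt = build_anchored_prompt(example_anchor, example_task)
example_score = anchor_overlap(example_anchor, example_completion)
```

Under these assumptions, comparing the score for completions generated with and without the anchor in the prompt would give a rough signal of how strongly outputs shift towards the anchor. Exact line matching is a deliberately crude choice, and a real study would likely also need a functional-correctness check.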

Videos

Capturing Failures of Large Language Models via Human Cognitive Biases · slideslive

Taxonomy

Topics: Topic Modeling · Software Engineering Research · Natural Language Processing Techniques