Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers

Lorenzo Pacchiardi, Marko Tesic, Lucy G. Cheke, José Hernández-Orallo

arXiv:2410.11672 · cs.CL · October 16, 2024

TL;DR

This paper demonstrates that simple $n$-gram features can predict benchmark answers and that LLMs may exploit these superficial cues, raising concerns about the internal validity of NLP benchmarks.

Contribution

The paper shows that simple $n$-gram patterns can predict benchmark labels and suggests that LLMs might rely on these cues, calling the benchmarks' validity into question.

Findings

01

Simple classifiers on $n$-grams achieve high accuracy on benchmarks.

02

Evidence that LLMs may use superficial $n$-gram patterns to solve tasks.

03

Highlights potential validity issues in current NLP benchmarks.
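As a rough illustration of the first finding, here is a sketch that predicts a label from the item text alone, using word unigram and bigram counts. It is a hypothetical toy, not the authors' pipeline: the six synthetic NLI-style items, the `ngrams` helper and the `NGramNaiveBayes` class are invented for this example, and multinomial Naive Bayes is swapped in as a simple stdlib-only classifier. The paper's actual features and models may differ.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Word n-grams of length 1..n_max from a lowercased, whitespace-split string."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NGramNaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngrams(text)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        # Total n-gram occurrences per class, used in the smoothed likelihood.
        self.totals = {c: sum(cnt.values()) for c, cnt in self.feat_counts.items()}
        return self

    def predict(self, text):
        n_items = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.class_counts:
            # Log prior plus the smoothed log likelihood of each n-gram in the item.
            score = math.log(self.class_counts[c] / n_items)
            for f in ngrams(text):
                score += math.log((self.feat_counts[c][f] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


# Hypothetical synthetic data: "not" co-occurs with one label, mimicking a spurious cue.
train = [
    ("the man is not sleeping", "contradiction"),
    ("a dog is not running", "contradiction"),
    ("the woman is not eating", "contradiction"),
    ("the man is sleeping", "entailment"),
    ("a dog is running", "entailment"),
    ("the woman is eating", "entailment"),
]
test = [
    ("the child is not playing", "contradiction"),
    ("the child is playing", "entailment"),
]

model = NGramNaiveBayes().fit([t for t, _ in train], [y for _, y in train])
accuracy = sum(model.predict(t) == y for t, y in test) / len(test)
print(f"toy accuracy: {accuracy:.2f}")  # prints "toy accuracy: 1.00"
```

Because "not" appears only in contradiction items, its n-grams carry most of the signal. A classifier that never reasons about meaning still scores perfectly on this toy split, which is the kind of shortcut the Clever Hans framing warns about.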

Abstract

The integrity of AI benchmarks is fundamental to accurately assess the capabilities of AI systems. The internal validity of these benchmarks (i.e., making sure they are free from confounding factors) is crucial for ensuring that they are measuring what they are designed to measure. In this paper, we explore a key issue related to internal validity: the possibility that AI systems can solve benchmarks in unintended ways, bypassing the capability being tested. This phenomenon, widely known in human and animal experiments, is often referred to as the 'Clever Hans' effect, where tasks are solved using spurious cues, often involving much simpler processes than those putatively assessed. Previous research suggests that language models can exhibit this behaviour as well. In several older Natural Language Processing (NLP) benchmarks, individual $n$-grams like "not" have been found to be…
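To make the idea of a single-$n$-gram cue such as "not" concrete, one crude check is to measure, for each token, how often the items containing it share a single label. This is a sketch under our own assumptions: the function `single_token_cues`, its `min_count` threshold and the synthetic items are hypothetical and are not taken from the paper or its repository.

```python
from collections import Counter, defaultdict


def single_token_cues(texts, labels, min_count=2):
    """Map each unigram to (majority label among items containing it,
    share of those items carrying that label, number of items containing it)."""
    by_token = defaultdict(Counter)
    for text, label in zip(texts, labels):
        # Count each token at most once per item.
        for tok in set(text.lower().split()):
            by_token[tok][label] += 1
    cues = {}
    for tok, counts in by_token.items():
        n = sum(counts.values())
        if n >= min_count:  # skip tokens seen too rarely to be informative
            label, k = counts.most_common(1)[0]
            cues[tok] = (label, k / n, n)
    return cues


# Hypothetical synthetic items, not drawn from any real benchmark.
texts = [
    "the man is not sleeping", "a dog is not running", "the woman is not eating",
    "the man is sleeping", "a dog is running", "the woman is eating",
]
labels = ["contradiction"] * 3 + ["entailment"] * 3

cues = single_token_cues(texts, labels)
# List tokens from most to least label-pure.
for tok, (label, purity, n) in sorted(cues.items(), key=lambda kv: -kv[1][1]):
    print(f"{tok!r}: predicts {label} in {purity:.0%} of {n} items")
```

Here "not" co-occurs with a single label in every item that contains it, while content words such as "man" or "sleeping" are split evenly. A token that is both frequent and label-pure is a candidate spurious cue worth inspecting.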

Code & Models

Repositories

Kinds-of-Intelligence-CFI/benchmark-ground-truth-predictability (official)

Taxonomy

Topics: Machine Learning and Data Classification · Natural Language Processing Techniques