The ICL Consistency Test

Lucas Weber; Elia Bruni; Dieuwke Hupkes

arXiv:2312.04945 · cs.CL · December 11, 2023 · 1 citation

TL;DR

The paper introduces the ICL consistency test, a benchmark to evaluate how consistently large language models perform across different setups using the same data, revealing a lack of robust generalisation.

Contribution

It presents a new benchmark and metric for assessing consistency in prompt-based models, highlighting their limitations in generalisation across varied setups.

Findings

01. All tested models show inconsistent predictions across setups.

02. The metric identifies properties that cause prediction instability.

03. Models lack robust generalisation according to the new consistency measure.

Abstract

Just like the previous generation of task-tuned models, large language models (LLMs) that are adapted to tasks via prompt-based methods like in-context learning (ICL) perform well in some setups but not in others. This lack of consistency in prompt-based learning hints at a lack of robust generalisation. We here introduce the ICL consistency test -- a contribution to the GenBench collaborative benchmark task (CBT) -- which evaluates how consistently a model makes predictions across many different setups while using the same data. The test is based on different established natural language inference tasks. We provide preprocessed data constituting 96 different 'setups' and a metric that estimates model consistency across these setups. The metric is provided on a fine-grained level to understand what properties of a setup render predictions unstable and on an aggregated level to compare…
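To make the idea of a consistency metric across setups concrete, here is a minimal sketch. It assumes that agreement between two setups is scored with Cohen's kappa over their predictions on the same data points, and that an aggregate score is the mean kappa over all pairs of setups. The abstract as shown here does not name the statistic, so both choices are assumptions, and the function names `cohen_kappa` and `mean_pairwise_kappa` are hypothetical rather than taken from the benchmark's code.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(preds_a, preds_b):
    """Chance-corrected agreement between two prediction lists.

    Assumption: both lists hold predicted labels for the same data
    points, in the same order, produced under two different setups.
    """
    if len(preds_a) != len(preds_b) or not preds_a:
        raise ValueError("prediction lists must be non-empty and equally long")
    n = len(preds_a)
    # Fraction of data points on which the two setups agree.
    observed = sum(a == b for a, b in zip(preds_a, preds_b)) / n
    # Agreement expected by chance, from each setup's label frequencies.
    counts_a, counts_b = Counter(preds_a), Counter(preds_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    if expected == 1.0:
        # Both setups always predict the same single label.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def mean_pairwise_kappa(preds_by_setup):
    """Aggregate score: mean kappa over all unordered pairs of setups."""
    pairs = list(combinations(sorted(preds_by_setup), 2))
    if not pairs:
        raise ValueError("need at least two setups")
    total = sum(
        cohen_kappa(preds_by_setup[s], preds_by_setup[t]) for s, t in pairs
    )
    return total / len(pairs)


# Toy example with three hypothetical setups and four data points.
toy_predictions = {
    "setup_a": [0, 0, 1, 1],
    "setup_b": [0, 0, 1, 1],
    "setup_c": [0, 1, 0, 1],
}
aggregate = mean_pairwise_kappa(toy_predictions)
```

In the toy example, `setup_a` and `setup_b` agree perfectly while `setup_c` agrees with either only at chance level, so the aggregate falls well below 1. A fine-grained analysis in the spirit of the abstract could restrict the pairs to setups that differ in a single property and compare the resulting scores.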

Code & Models

Datasets

LucasWeber/icl_consistency_test
dataset · 16 dl

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques