Do LLMs Overcome Shortcut Learning? An Evaluation of Shortcut Challenges in Large Language Models

Yu Yuan; Lili Zhao; Kai Zhang; Guangting Zheng; Qi Liu

arXiv:2410.13343·cs.CL·October 18, 2024

TL;DR

This paper introduces Shortcut Suite, a test suite that evaluates how heavily large language models rely on dataset shortcuts. It covers six shortcut types, five evaluation metrics, and four prompting strategies, and reports that larger models lean more on shortcuts, that chain-of-thought prompting reduces this reliance, and that models are often overconfident on shortcut datasets.

Contribution

The paper presents Shortcut Suite, a novel test suite for evaluating shortcut reliance in LLMs, including new metrics, prompting strategies, and extensive experimental insights.

Findings

1) Larger LLMs rely more on shortcuts in zero-shot and few-shot settings.

2) Chain-of-thought prompting reduces shortcut reliance.

3) LLMs are often overconfident and produce lower-quality explanations on shortcut datasets.

Abstract

Large Language Models (LLMs) have shown remarkable capabilities in various natural language processing tasks. However, LLMs may rely on dataset biases as shortcuts for prediction, which can significantly impair their robustness and generalization capabilities. This paper presents Shortcut Suite, a comprehensive test suite designed to evaluate the impact of shortcuts on LLMs' performance, incorporating six shortcut types, five evaluation metrics, and four prompting strategies. Our extensive experiments yield several key findings: 1) LLMs demonstrate varying reliance on shortcuts for downstream tasks, significantly impairing their performance. 2) Larger LLMs are more likely to utilize shortcuts under zero-shot and few-shot in-context learning prompts. 3) Chain-of-thought prompting notably reduces shortcut reliance and outperforms other prompting strategies, while few-shot prompts…
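
The abstract does not spell out the five evaluation metrics, so the following is only a minimal sketch of two quantities a reader might compute when comparing a standard dataset with a shortcut-laden one: the accuracy drop between the two, and the gap between mean self-reported confidence and accuracy. The names `Prediction`, `shortcut_gap`, and `overconfidence` are hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Prediction:
    """One model answer. Field names are illustrative assumptions."""
    label: str         # gold label
    predicted: str     # label the model produced
    confidence: float  # model's self-reported confidence in [0, 1]


def accuracy(preds: Sequence[Prediction]) -> float:
    """Fraction of predictions that match the gold label."""
    if not preds:
        raise ValueError("preds must not be empty")
    return sum(p.label == p.predicted for p in preds) / len(preds)


def shortcut_gap(standard: Sequence[Prediction],
                 shortcut: Sequence[Prediction]) -> float:
    """Accuracy on the standard set minus accuracy on the shortcut set.

    A large positive value suggests the model leans on the shortcut.
    """
    return accuracy(standard) - accuracy(shortcut)


def overconfidence(preds: Sequence[Prediction]) -> float:
    """Mean confidence minus accuracy; positive means overconfident."""
    mean_conf = sum(p.confidence for p in preds) / len(preds)
    return mean_conf - accuracy(preds)
```

For instance, a model that scores 0.75 on the standard set and 0.25 on the shortcut set has a gap of 0.5 under this definition; if it also reports a mean confidence of 0.75 on the shortcut set, its overconfidence there is 0.5.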

Code & Models

Repositories

yyhappier/shortcutsuite (Official)

Videos

Do LLMs Overcome Shortcut Learning? An Evaluation of Shortcut Challenges in Large Language Models

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification