Do LLMs Overcome Shortcut Learning? An Evaluation of Shortcut Challenges in Large Language Models
Yu Yuan, Lili Zhao, Kai Zhang, Guangting Zheng, Qi Liu

TL;DR
This paper introduces Shortcut Suite, a comprehensive evaluation framework for measuring how much large language models rely on dataset shortcuts. It shows how strongly LLMs depend on shortcuts, how prompting strategies change that dependence, and that LLMs tend to be overconfident. These results have implications for improving robustness.
Contribution
The paper presents Shortcut Suite, a novel test suite for evaluating shortcut reliance in LLMs. It combines six shortcut types, five evaluation metrics, and four prompting strategies, and it includes an extensive experimental analysis.
Findings
Larger LLMs rely more on shortcuts in zero-shot and few-shot settings.
Chain-of-thought prompting reduces shortcut reliance.
LLMs are often overconfident and produce lower-quality explanations on shortcut datasets.
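The findings above compare model behaviour on standard data and on data built around a shortcut. Below is a minimal sketch of two generic quantities that make this comparison concrete. The first is the accuracy drop from a standard set to a shortcut set. The second is an overconfidence gap, defined as mean confidence minus accuracy. These are illustrative assumptions and are not the paper's five metrics, which the summary does not name. The `Prediction` type is hypothetical, and confidence is assumed to be a number in [0, 1], however it was obtained. The toy data is invented for illustration and is not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Prediction:
    """Hypothetical record of one model prediction."""
    label: str         # label predicted by the model
    gold: str          # ground-truth label
    confidence: float  # model confidence in [0, 1] (elicitation method assumed)


def accuracy(preds: Sequence[Prediction]) -> float:
    """Fraction of predictions whose label matches the gold label."""
    return sum(p.label == p.gold for p in preds) / len(preds)


def mean_confidence(preds: Sequence[Prediction]) -> float:
    """Average confidence the model reports across all predictions."""
    return sum(p.confidence for p in preds) / len(preds)


def overconfidence_gap(preds: Sequence[Prediction]) -> float:
    """Positive when the model is more confident than it is accurate."""
    return mean_confidence(preds) - accuracy(preds)


def shortcut_accuracy_drop(standard: Sequence[Prediction],
                           shortcut: Sequence[Prediction]) -> float:
    """How much accuracy falls when moving from standard to shortcut data."""
    return accuracy(standard) - accuracy(shortcut)


# Toy, made-up predictions (not results from the paper).
standard = [
    Prediction("entailment", "entailment", 0.90),
    Prediction("contradiction", "contradiction", 0.80),
    Prediction("neutral", "neutral", 0.85),
    Prediction("neutral", "entailment", 0.70),
]
shortcut = [
    Prediction("entailment", "contradiction", 0.90),
    Prediction("entailment", "entailment", 0.95),
    Prediction("entailment", "neutral", 0.90),
    Prediction("contradiction", "contradiction", 0.85),
]

if __name__ == "__main__":
    print("accuracy drop:", round(shortcut_accuracy_drop(standard, shortcut), 2))
    print("overconfidence gap on shortcut set:", round(overconfidence_gap(shortcut), 2))
```

Keeping the two quantities separate makes it easy to see that confidence can stay high even when accuracy falls on the shortcut set. That pattern is what the overconfidence finding describes.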
Abstract
Large Language Models (LLMs) have shown remarkable capabilities in various natural language processing tasks. However, LLMs may rely on dataset biases as shortcuts for prediction, which can significantly impair their robustness and generalization capabilities. This paper presents Shortcut Suite, a comprehensive test suite designed to evaluate the impact of shortcuts on LLMs' performance, incorporating six shortcut types, five evaluation metrics, and four prompting strategies. Our extensive experiments yield several key findings: 1) LLMs demonstrate varying reliance on shortcuts for downstream tasks, significantly impairing their performance. 2) Larger LLMs are more likely to utilize shortcuts under zero-shot and few-shot in-context learning prompts. 3) Chain-of-thought prompting notably reduces shortcut reliance and outperforms other prompting strategies, while few-shot prompts…
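The abstract contrasts zero-shot, few-shot in-context learning, and chain-of-thought prompting. The sketch below shows one way the same input could be wrapped under each of these three strategies. It assumes an NLI-style premise/hypothesis task, which the abstract does not specify. The instruction wording, the function names, and the template layout are all hypothetical and are not the paper's prompts. The fourth strategy is cut off in the abstract and is not reconstructed here.

```python
from typing import Sequence, Tuple

# Hypothetical demonstration format: (premise, hypothesis, label).
Example = Tuple[str, str, str]

# Hypothetical task instruction; the paper's actual wording is not given.
INSTRUCTION = (
    "Decide whether the hypothesis is entailed by, contradicts, or is neutral "
    "with respect to the premise. Answer with one word: "
    "entailment, contradiction, or neutral."
)


def format_pair(premise: str, hypothesis: str) -> str:
    """Render one premise/hypothesis pair."""
    return f"Premise: {premise}\nHypothesis: {hypothesis}\n"


def zero_shot(premise: str, hypothesis: str) -> str:
    """Instruction plus the query only, with no demonstrations."""
    return f"{INSTRUCTION}\n\n{format_pair(premise, hypothesis)}Answer:"


def few_shot(premise: str, hypothesis: str, demos: Sequence[Example]) -> str:
    """Instruction, then labelled demonstrations, then the query."""
    shots = "".join(f"{format_pair(p, h)}Answer: {y}\n\n" for p, h, y in demos)
    return f"{INSTRUCTION}\n\n{shots}{format_pair(premise, hypothesis)}Answer:"


def chain_of_thought(premise: str, hypothesis: str) -> str:
    """Ask for step-by-step reasoning before the final label."""
    return (
        f"{INSTRUCTION}\n\n{format_pair(premise, hypothesis)}"
        "Let's think step by step before giving the final answer.\nReasoning:"
    )


if __name__ == "__main__":
    # High word overlap without entailment: a classic lexical-overlap trap.
    demos = [("The cat sat on the mat.", "A cat is on the mat.", "entailment")]
    print(zero_shot("The doctor saw the lawyer.", "The lawyer saw the doctor."))
    print(few_shot("The doctor saw the lawyer.", "The lawyer saw the doctor.", demos))
    print(chain_of_thought("The doctor saw the lawyer.", "The lawyer saw the doctor."))
```

Only the wrapper around the query changes between strategies, so any difference in shortcut reliance can be attributed to the prompting strategy rather than to the input itself.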
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
