Large Language Models Can Be Easily Distracted by Irrelevant Context
Freda Shi, Xinyun Chen, Kanishka Misra, Nathan Scales, David Dohan, Ed Chi, Nathanael Schärli, Denny Zhou

TL;DR
This paper shows that irrelevant information in the input easily distracts large language models and substantially lowers their problem-solving accuracy, and it identifies methods that mitigate the effect.
Contribution
The paper introduces Grade-School Math with Irrelevant Context (GSM-IC), a benchmark for measuring distractibility, and evaluates strategies for making models more robust to irrelevant context.
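To make the idea of a problem with irrelevant context concrete, here is a minimal Python sketch. It is not the authors' construction procedure: the helper `add_irrelevant_sentence`, the choice to place the distractor just before the final question, and the example problem are all assumptions made for illustration and are not taken from GSM-IC.

```python
import re


def add_irrelevant_sentence(problem: str, distractor: str) -> str:
    """Insert `distractor` just before the last sentence of `problem`.

    Hypothetical helper: the last sentence is assumed to be the question,
    and the placement is an illustrative choice, not the benchmark's rule.
    """
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.?!])\s+", problem.strip())
    if len(sentences) < 2:
        # Nothing to split: put the distractor in front of the problem.
        return f"{distractor} {problem.strip()}"
    return " ".join(sentences[:-1] + [distractor, sentences[-1]])


# Made-up example problem (not from the dataset); the correct answer is 8
# both with and without the added sentence.
base = "Ann has 5 apples. She buys 3 more. How many apples does Ann have?"
distracted = add_irrelevant_sentence(base, "Her brother is 7 years old.")
print(distracted)
```

The added sentence does not change the correct answer, so any drop in accuracy on the modified problem can be attributed to the distraction rather than to added difficulty.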
Findings
Model performance drops with irrelevant information
Self-consistency decoding improves robustness
Explicit instructions help models ignore irrelevant data
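The two mitigations listed above can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the paper's implementation: `build_prompt`, `self_consistent_answer`, the wording of `IGNORE_INSTRUCTION`, and the stub sampler are hypothetical, and a real setup would replace the stub with calls to a language model sampled at non-zero temperature.

```python
from collections import Counter
from typing import Callable

# Hypothetical wording for an instruction that tells the model to ignore distractors.
IGNORE_INSTRUCTION = "Feel free to ignore irrelevant information in the problem."


def build_prompt(problem: str, instruction: str = IGNORE_INSTRUCTION) -> str:
    """Prepend the instruction to the problem (assumed prompt layout)."""
    return f"{instruction}\n\nQ: {problem}\nA:"


def self_consistent_answer(
    sample_answer: Callable[[str], str], prompt: str, n_samples: int = 5
) -> str:
    """Sample several final answers and return the most frequent one."""
    answers = [sample_answer(prompt) for _ in range(n_samples)]
    answer, _count = Counter(answers).most_common(1)[0]
    return answer


# Stub sampler standing in for a model: it replays fixed answers, two of
# which ("15") mimic a model that wrongly used an irrelevant number.
_fake_samples = iter(["8", "15", "8", "8", "15"])


def stub_sampler(prompt: str) -> str:
    return next(_fake_samples)


prompt = build_prompt(
    "Ann has 5 apples. She buys 3 more. Her brother is 7 years old. "
    "How many apples does Ann have?"
)
final_answer = self_consistent_answer(stub_sampler, prompt, n_samples=5)
print(final_answer)
```

Majority voting helps only when the distracted answers are in the minority among the samples; the instruction and the vote are independent and can be combined, as in this sketch.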
Abstract
Large language models have achieved impressive performance on various natural language processing tasks. However, so far they have been evaluated primarily on benchmarks where all information in the input context is relevant for solving the task. In this work, we investigate the distractibility of large language models, i.e., how the model's problem-solving accuracy can be influenced by irrelevant context. In particular, we introduce Grade-School Math with Irrelevant Context (GSM-IC), an arithmetic reasoning dataset with irrelevant information in the problem description. We use this benchmark to measure the distractibility of cutting-edge prompting techniques for large language models, and find that model performance decreases dramatically when irrelevant information is included. We also identify several approaches for mitigating this deficiency, such as decoding with…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Intelligent Tutoring Systems and Adaptive Learning
