Shortcut Learning of Large Language Models in Natural Language Understanding

Mengnan Du; Fengxiang He; Na Zou; Dacheng Tao; Xia Hu

arXiv:2208.11857 · cs.CL · May 9, 2023 · 23 cites

Open Access

TL;DR

This paper reviews how large language models can rely on dataset biases and artifacts as shortcuts for prediction, which hurts their generalizability and adversarial robustness, and surveys methods to identify, understand, and mitigate this shortcut learning behavior.

Contribution

It provides a comprehensive overview of recent methods to detect, analyze, and address shortcut learning in large language models, highlighting future research directions.

Findings

01 Identification methods for shortcut learning behaviors

02 Analysis of causes behind shortcut reliance

03 Discussion of mitigation strategies and challenges

Abstract

Large language models (LLMs) have achieved state-of-the-art performance on a range of natural language understanding tasks. However, these LLMs might rely on dataset bias and artifacts as shortcuts for prediction, which significantly affects their generalizability and adversarial robustness. In this paper, we review recent developments that address the shortcut learning and robustness challenges of LLMs. We first introduce the concept of shortcut learning in language models. We then present methods to identify shortcut learning behavior in language models, characterize the reasons for shortcut learning, and introduce mitigation solutions. Finally, we discuss key research challenges and potential research directions to advance the field of LLMs.

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications