Which Shortcut Solution Do Question Answering Models Prefer to Learn?
Kazutoshi Shinoda, Saku Sugawara, Akiko Aizawa

TL;DR
This paper investigates the learnability of shortcut solutions in question answering models, revealing how certain shortcuts are preferentially learned and how this knowledge can improve training strategies to enhance model robustness.
Contribution
The study analyzes the learnability of shortcuts in QA models, linking it to loss landscape characteristics and proposing a method to optimize training set composition based on shortcut learnability.
Findings
Shortcut solutions exploiting answer positions and word-label correlations are preferentially learned.
More learnable shortcuts correspond to flatter, deeper regions of the loss landscape.
Accounting for shortcut learnability when composing the training set reduces the proportion of anti-shortcut examples needed.
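To make the idea of a biased training set concrete, the following is a minimal sketch, not the authors' code. It assumes a hypothetical answer-position shortcut ("the answer starts in the first sentence") and a hypothetical example schema with `context` and `answer_start` fields. The names `answer_in_first_sentence`, `build_biased_train_set`, and `anti_ratio` are illustrative and do not come from the paper.

```python
import random


def answer_in_first_sentence(example):
    """Hypothetical shortcut rule: the answer span starts inside the first sentence.

    Sentence splitting on the first period is a deliberate simplification.
    """
    context = example["context"]
    first_end = context.find(".")
    if first_end == -1:
        first_end = len(context)
    return example["answer_start"] < first_end


def build_biased_train_set(examples, is_shortcut, anti_ratio, size, seed=0):
    """Sample `size` examples so that a fraction `anti_ratio` are anti-shortcut.

    Shortcut examples are those where `is_shortcut` returns True; the rest are
    treated as anti-shortcut examples.
    """
    if not 0.0 <= anti_ratio <= 1.0:
        raise ValueError("anti_ratio must be in [0, 1]")
    rng = random.Random(seed)
    shortcut = [ex for ex in examples if is_shortcut(ex)]
    anti = [ex for ex in examples if not is_shortcut(ex)]
    n_anti = round(size * anti_ratio)
    n_shortcut = size - n_anti
    if n_anti > len(anti) or n_shortcut > len(shortcut):
        raise ValueError("not enough examples for the requested mix")
    sample = rng.sample(shortcut, n_shortcut) + rng.sample(anti, n_anti)
    rng.shuffle(sample)
    return sample
```

Under these assumptions, sweeping `anti_ratio` from 0 upward and evaluating separately on shortcut and anti-shortcut test examples is one way to probe how many anti-shortcut examples a given shortcut requires.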
Abstract
Question answering (QA) models for reading comprehension tend to learn shortcut solutions rather than the solutions intended by QA datasets. QA models that have learned shortcut solutions can achieve human-level performance in shortcut examples where shortcuts are valid, but these same behaviors degrade generalization potential on anti-shortcut examples where shortcuts are invalid. Various methods have been proposed to mitigate this problem, but they do not fully take the characteristics of shortcuts themselves into account. We assume that the learnability of shortcuts, i.e., how easy it is to learn a shortcut, is useful to mitigate the problem. Thus, we first examine the learnability of the representative shortcuts on extractive and multiple-choice QA datasets. Behavioral tests using biased training sets reveal that shortcuts that exploit answer positions and word-label correlations…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
