When Choosing Plausible Alternatives, Clever Hans can be Clever
Pride Kavumba, Naoya Inoue, Benjamin Heinzerling, Keshav Singh, Paul Reisert, Kentaro Inui

TL;DR
This paper investigates whether the high performance of BERT and RoBERTa on COPA is due to superficial cues. It introduces Balanced COPA to evaluate models without these cues and finds that BERT relies on them when they are present, while RoBERTa does not.
Contribution
The paper identifies superficial cues in COPA, introduces Balanced COPA to mitigate these cues, and analyzes model reliance, highlighting differences between BERT and RoBERTa.
Findings
BERT exploits superficial cues in COPA but learns the task when cues are removed.
RoBERTa does not rely on superficial cues in COPA.
Balanced COPA reduces superficial cue exploitation, providing a fairer evaluation.
Abstract
Pretrained language models, such as BERT and RoBERTa, have shown large improvements in the commonsense reasoning benchmark COPA. However, recent work found that many improvements in benchmarks of natural language understanding are not due to models learning the task, but due to their increasing ability to exploit superficial cues, such as tokens that occur more often in the correct answer than the wrong one. Are BERT's and RoBERTa's good performance on COPA also caused by this? We find superficial cues in COPA, as well as evidence that BERT exploits these cues. To remedy this problem, we introduce Balanced COPA, an extension of COPA that does not suffer from easy-to-exploit single token cues. We analyze BERT's and RoBERTa's performance on original and Balanced COPA, finding that BERT relies on superficial cues when they are present, but still achieves comparable performance once they…
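The notion of a superficial cue described in the abstract (a token that occurs more often in the correct answer than in the wrong one) can be illustrated with a small counting script. This is a minimal sketch and not the authors' implementation: the function name `token_cue_stats`, the whitespace tokenization, and the toy examples are all assumptions made for illustration, and the examples are not taken from COPA.

```python
from collections import Counter

def tokenize(text):
    # Naive tokenizer (an assumption): lowercase, split on whitespace, strip punctuation.
    return {t.strip(".,!?") for t in text.lower().split()}

def token_cue_stats(examples):
    """For each token that appears in exactly one of the two alternatives,
    return (number of such instances, fraction of them where the token
    sits in the correct alternative)."""
    in_any = Counter()
    in_correct = Counter()
    for alt1, alt2, label in examples:
        correct, wrong = (alt1, alt2) if label == 1 else (alt2, alt1)
        correct_tokens, wrong_tokens = tokenize(correct), tokenize(wrong)
        # Tokens shared by both alternatives cannot discriminate, so skip them.
        for tok in correct_tokens - wrong_tokens:
            in_any[tok] += 1
            in_correct[tok] += 1
        for tok in wrong_tokens - correct_tokens:
            in_any[tok] += 1
    return {tok: (in_any[tok], in_correct[tok] / in_any[tok]) for tok in in_any}

# Hypothetical toy data: (alternative 1, alternative 2, index of the correct one).
examples = [
    ("The man went to a doctor.", "The man went home.", 1),
    ("She bought a ticket.", "She stayed home.", 1),
    ("He found a coin.", "He lost his keys.", 1),
    ("They slept.", "They ate a meal.", 2),
]

stats = token_cue_stats(examples)
print(stats["a"])     # token "a" always falls in the correct alternative here
print(stats["home"])  # token "home" always falls in the wrong alternative here
```

In this toy set, a model could score well by preferring whichever alternative contains "a" without reading the premise at all. A balanced dataset would aim to make such fractions close to 0.5 for every token.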
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · RoBERTa · Dense Connections · Adam · WordPiece
