What does BERT Learn from Multiple-Choice Reading Comprehension Datasets?
Chenglei Si, Shuohang Wang, Min-Yen Kan, Jing Jiang

TL;DR
This paper investigates what BERT learns from multiple-choice reading comprehension datasets, revealing that it relies on keywords and dataset artifacts rather than true understanding or reasoning skills.
Contribution
The study introduces methods to analyze BERT's learning process on MCRC datasets, highlighting its reliance on superficial cues and dataset artifacts instead of semantic comprehension.
Findings
BERT mainly learns keyword associations for correct answers.
BERT can perform well even with partial or shuffled input.
Datasets contain artifacts that allow solutions without full context.
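As a rough illustration of the partial and shuffled inputs mentioned in the findings above, the following Python sketch builds such variants from a single multiple-choice example. The field names (`passage`, `question`, `options`, `label`), the mode names, and both helper functions are assumptions made for illustration; they are not taken from the paper, whose exact perturbation settings are not given here.

```python
import random


def shuffle_words(text, seed=0):
    """Return the whitespace-separated words of `text` in a random order."""
    words = text.split()
    rng = random.Random(seed)  # local RNG so the result is reproducible
    rng.shuffle(words)
    return " ".join(words)


def make_unanswerable(example, mode, seed=0):
    """Build a perturbed copy of a multiple-choice example.

    `example` is a hypothetical dict with keys "passage", "question",
    "options", and "label". The gold label is left unchanged, so a model
    trained on the result can only exploit whatever cues remain.
    """
    out = dict(example)  # shallow copy; the original dict is not modified
    if mode == "no_passage":
        out["passage"] = ""
    elif mode == "no_question":
        out["question"] = ""
    elif mode == "shuffle_passage":
        out["passage"] = shuffle_words(example["passage"], seed)
    elif mode == "shuffle_question":
        out["question"] = shuffle_words(example["question"], seed)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return out


if __name__ == "__main__":
    sample = {
        "passage": "Tom went to the market and bought three red apples.",
        "question": "What did Tom buy at the market?",
        "options": ["apples", "pears", "bread", "milk"],
        "label": 0,
    }
    for mode in ("no_passage", "no_question", "shuffle_passage", "shuffle_question"):
        print(mode, make_unanswerable(sample, mode))
```

If a model fine-tuned on such variants still scores well above chance, that suggests the dataset can be solved from surface cues rather than from reading the full context.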
Abstract
Multiple-Choice Reading Comprehension (MCRC) requires the model to read the passage and question, and select the correct answer among the given options. Recent state-of-the-art models have achieved impressive performance on multiple MCRC datasets. However, such performance may not reflect the model's true ability of language understanding and reasoning. In this work, we adopt two approaches to investigate what BERT learns from MCRC datasets: 1) an un-readable data attack, in which we add keywords to confuse BERT, leading to a significant performance drop; and 2) an un-answerable data training, in which we train BERT on partial or shuffled input. Under un-answerable data training, BERT achieves unexpectedly high performance. Based on our experiments on the 5 key MCRC datasets - RACE, MCTest, MCScript, MCScript2.0, DREAM - we observe that 1) fine-tuned BERT mainly learns how keywords lead…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax
