What does BERT Learn from Multiple-Choice Reading Comprehension Datasets?

Chenglei Si; Shuohang Wang; Min-Yen Kan; Jing Jiang

arXiv:1910.12391 · cs.CL · October 29, 2019 · 31 cites

Open Access

TL;DR

This paper investigates what BERT learns from multiple-choice reading comprehension datasets, revealing that it relies on keywords and dataset artifacts rather than true understanding or reasoning skills.

Contribution

The study introduces two probing methods (an un-readable data attack and un-answerable data training) to analyze what BERT learns from MCRC datasets, highlighting its reliance on superficial cues and dataset artifacts rather than semantic comprehension.

Findings

01

Fine-tuned BERT mainly learns how keywords lead to the correct answer.

02

BERT achieves unexpectedly high performance even when trained on partial or shuffled input.

03

The datasets contain artifacts that allow questions to be answered without the full context.
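The un-answerable inputs behind findings 02 and 03 can be illustrated with a minimal Python sketch. It assumes a hypothetical example layout (a dict with `passage`, `question`, `options`, and `label` keys) and whitespace-level word shuffling; the paper's actual preprocessing and shuffling granularity may differ. Every transform leaves the gold label untouched, so a model that still scores well after training on such data is presumably relying on cues other than a readable passage and question.

```python
import random
from typing import Any, Dict

# Hypothetical MCRC example layout (an assumption, not the paper's format):
# {"passage": str, "question": str, "options": list[str], "label": int}
Example = Dict[str, Any]


def drop_passage(ex: Example) -> Example:
    """Partial input: blank out the passage, keep question, options, and label."""
    return {**ex, "passage": ""}


def drop_question(ex: Example) -> Example:
    """Partial input: blank out the question, keep passage, options, and label."""
    return {**ex, "question": ""}


def shuffle_words(text: str, rng: random.Random) -> str:
    """Shuffle whitespace-separated tokens: word order is lost, the bag of words is kept."""
    tokens = text.split()
    rng.shuffle(tokens)
    return " ".join(tokens)


def shuffle_passage(ex: Example, seed: int = 0) -> Example:
    """Shuffled input: permute passage words with a seeded RNG for reproducibility."""
    rng = random.Random(seed)
    return {**ex, "passage": shuffle_words(ex["passage"], rng)}


if __name__ == "__main__":
    ex = {
        "passage": "Tom walked to the park after lunch.",
        "question": "Where did Tom go?",
        "options": ["the park", "the zoo"],
        "label": 0,
    }
    for transform in (drop_passage, drop_question, shuffle_passage):
        print(transform.__name__, transform(ex))
```

Returning new dicts (rather than mutating the input) keeps the original example available for comparing accuracy on intact versus un-answerable versions of the same data.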

Abstract

Multiple-Choice Reading Comprehension (MCRC) requires the model to read the passage and question and select the correct answer from the given options. Recent state-of-the-art models have achieved impressive performance on multiple MCRC datasets. However, such performance may not reflect the model's true language understanding and reasoning ability. In this work, we adopt two approaches to investigate what BERT learns from MCRC datasets: 1) an un-readable data attack, in which we add keywords to confuse BERT, leading to a significant performance drop; and 2) un-answerable data training, in which we train BERT on partial or shuffled input. Under un-answerable data training, BERT achieves unexpectedly high performance. Based on our experiments on five key MCRC datasets (RACE, MCTest, MCScript, MCScript2.0, and DREAM), we observe that 1) fine-tuned BERT mainly learns how keywords lead…
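The un-readable data attack can be sketched in the same spirit. The abstract says only that keywords are added to confuse BERT, so this sketch assumes one plausible instantiation: content words from a wrong option are appended to the passage. The stopword list, the `keywords` and `keyword_attack` helpers, and the example layout (a dict with `passage`, `question`, `options`, and `label` keys) are all hypothetical; the paper's exact procedure may differ. If a fine-tuned model's accuracy drops sharply on such inputs even though the gold answer is unchanged, that is consistent with the keyword-matching behavior the paper reports.

```python
from typing import Any, Dict, List, Set

# Toy stopword list (hypothetical); a real probe would use a fuller list.
STOPWORDS: Set[str] = {
    "a", "an", "the", "is", "are", "was", "were", "of", "to", "in", "on", "and", "or", "it",
}

# Hypothetical example layout: {"passage": str, "question": str, "options": list[str], "label": int}
Example = Dict[str, Any]


def keywords(text: str) -> List[str]:
    """Lowercased content words in first-seen order, without stopwords or duplicates."""
    seen: Set[str] = set()
    out: List[str] = []
    for tok in text.lower().split():
        tok = tok.strip(".,!?;:\"'")  # drop surrounding punctuation
        if tok and tok not in STOPWORDS and tok not in seen:
            seen.add(tok)
            out.append(tok)
    return out


def keyword_attack(ex: Example, distractor_idx: int) -> Example:
    """Append a wrong option's keywords to the passage; the gold label is unchanged."""
    if distractor_idx == ex["label"]:
        raise ValueError("distractor_idx must point at a wrong option")
    extra = " ".join(keywords(ex["options"][distractor_idx]))
    return {**ex, "passage": f"{ex['passage']} {extra}".strip()}


if __name__ == "__main__":
    ex = {
        "passage": "Tom walked to the park after lunch.",
        "question": "Where did Tom go?",
        "options": ["the park", "the zoo"],
        "label": 0,
    }
    print(keyword_attack(ex, distractor_idx=1)["passage"])
    # → Tom walked to the park after lunch. zoo
```

Appending the distractor words makes the passage no more informative about the true answer, so any resulting accuracy drop isolates the model's sensitivity to lexical overlap between passage and options.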

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax