A Pairwise Probe for Understanding BERT Fine-Tuning on Machine Reading Comprehension
Jie Cai, Zhengzhou Zhu, Ping Nie, Qian Liu

TL;DR
This paper introduces a pairwise probing method to analyze how fine-tuning BERT affects its semantic and reasoning capabilities in machine reading comprehension, revealing that fine-tuning mainly enhances higher-layer representations.
Contribution
The paper proposes a pairwise probe that quantitatively compares each layer of pre-trained and fine-tuned BERT on MRC tasks, while reducing the distraction that training noise introduces into the comparison.
Findings
Fine-tuning minimally impacts low-level and semantic information.
Fine-tuned BERT shows significant improvements in specific reasoning abilities.
Enhancements are most evident after the fifth layer.
Abstract
Pre-trained models have brought significant improvements to many NLP tasks and have been extensively analyzed, but little is known about the effect of fine-tuning on specific tasks. Intuitively, people may agree that a pre-trained model already learns semantic representations of words (e.g., synonyms are closer to each other) and that fine-tuning further improves capabilities that require more complicated reasoning (e.g., coreference resolution, entity boundary detection, etc.). However, verifying these arguments analytically and quantitatively is challenging, and few works focus on this topic. In this paper, inspired by the observation that most probing tasks involve identifying matched pairs of phrases (e.g., coreference requires matching an entity and a pronoun), we propose a pairwise probe to understand BERT fine-tuning on the machine reading comprehension (MRC)…
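The idea of probing matched pairs layer by layer can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine-similarity scoring rule, the (anchor, matched, unmatched) triple format, and every name below are assumptions made for illustration, and the toy vectors stand in for hidden states that would in practice come from the two BERT models.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pairwise_probe_accuracy(examples):
    """Fraction of (anchor, matched, unmatched) triples in which the anchor
    is more similar to its matched phrase than to the unmatched one."""
    hits = sum(1 for a, m, u in examples if cosine(a, m) > cosine(a, u))
    return hits / len(examples)


def compare_models(pretrained_layers, finetuned_layers):
    """Score both models layer by layer.

    Each argument is a list indexed by layer; each entry is a list of triples
    built from that layer's hidden states. Returns one row per layer:
    (layer number, pre-trained accuracy, fine-tuned accuracy, gap).
    """
    rows = []
    for layer, (pre, ft) in enumerate(zip(pretrained_layers, finetuned_layers), start=1):
        acc_pre = pairwise_probe_accuracy(pre)
        acc_ft = pairwise_probe_accuracy(ft)
        rows.append((layer, acc_pre, acc_ft, acc_ft - acc_pre))
    return rows


# Hypothetical 2-dimensional "hidden states" for two layers and two examples.
hit = ([1.0, 0.0], [1.0, 0.1], [0.0, 1.0])    # anchor is closer to its match
miss = ([0.0, 1.0], [1.0, 0.0], [0.1, 1.0])   # anchor is closer to the distractor
hit2 = ([0.0, 1.0], [0.1, 1.0], [1.0, 0.0])   # same anchor as `miss`, now resolved

toy_pretrained = [[hit, miss], [hit, miss]]
toy_finetuned = [[hit, miss], [hit, hit2]]

for row in compare_models(toy_pretrained, toy_finetuned):
    print(row)
```

Reporting the per-layer gap rather than a single aggregate score is what lets a comparison of this kind say where in the network the fine-tuned model starts to diverge from the pre-trained one.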
Taxonomy
Methods: Linear Layer · Weight Decay · Softmax · Adam · Multi-Head Attention · Dropout · Attention Dropout · Linear Warmup With Linear Decay · Dense Connections
