Towards Interpreting BERT for Reading Comprehension Based QA

Sahana Ramnath; Preksha Nema; Deep Sahni; Mitesh M. Khapra

arXiv:2010.08983 · cs.CL · October 20, 2020

TL;DR

This paper interprets BERT's internal mechanisms for reading comprehension QA, revealing how different layers contribute to understanding and answer prediction, especially for quantifier questions.

Contribution

It introduces a layer role definition using Integrated Gradients and provides a preliminary analysis of BERT's layer functions in RCQA.

Findings

01 Initial layers focus on query-passage interaction

02 Later layers emphasize contextual understanding and answer prediction

03 BERT correctly predicts answers for quantifier questions despite focusing on confusing words

Abstract

BERT and its variants have achieved state-of-the-art performance in various NLP tasks. Since then, various works have been proposed to analyze the linguistic information being captured in BERT. However, the current works do not provide an insight into how BERT is able to achieve near human-level performance on the task of Reading Comprehension based Question Answering. In this work, we attempt to interpret BERT for RCQA. Since BERT layers do not have predefined roles, we define a layer's role or functionality using Integrated Gradients. Based on the defined roles, we perform a preliminary analysis across all layers. We observed that the initial layers focus on query-passage interaction, whereas later layers focus more on contextual understanding and enhancing the answer prediction. Specifically for quantifier questions (how much/how many), we notice that BERT focuses on confusing words…
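The abstract relies on Integrated Gradients to assign roles to layers. As a rough illustration of the attribution method itself (not the authors' BERT pipeline), the sketch below approximates Integrated Gradients with a midpoint Riemann sum on a toy quadratic function. The function `f`, its weights `w`, the all-zeros baseline, and the helper name `integrated_gradients` are assumptions made for this example only.

```python
import numpy as np


def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    grad_fn  : callable returning dF/dx at a given point (hypothetical helper)
    x        : input to explain
    baseline : reference input (assumed here; often an all-zeros input)
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    # Midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps
    # Gradients evaluated along the straight path from baseline to x.
    grads = np.stack([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    # Scale the averaged gradient by the input difference.
    return (x - baseline) * grads.mean(axis=0)


# Toy differentiable "model" standing in for a network (an assumption).
w = np.array([1.0, -2.0, 0.5])


def f(x):
    return float(np.sum(w * x ** 2))


def grad_f(x):
    return 2.0 * w * x


x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros_like(x)
attributions = integrated_gradients(grad_f, x, baseline)

# Completeness property: attributions sum to f(x) - f(baseline).
completeness_gap = abs(attributions.sum() - (f(x) - f(baseline)))
```

For this toy function the attribution of each feature works out to `w[i] * x[i] ** 2`, and the attributions sum to `f(x) - f(baseline)`. For a real network the gradients would come from automatic differentiation rather than a closed-form `grad_f`.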

Code & Models

Repositories

iitmnlp/BERT-Analysis-RCQA (tf, Official)

Taxonomy

Methods: Linear Layer · WordPiece · Adam · Softmax · Layer Normalization · Dense Connections · Multi-Head Attention · Dropout · Linear Warmup With Linear Decay