How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations

Betty van Aken; Benjamin Winter; Alexander Löser; Felix A. Gers

arXiv:1909.04925·cs.CL·September 12, 2019·42 cites

TL;DR

This paper investigates how BERT's internal layer-wise hidden states evolve during question answering tasks, revealing phases of transformation, task-specific information encoding, and the impact of fine-tuning on semantic understanding.

Contribution

It introduces a layer-wise analysis of BERT's hidden states for QA, emphasizing the informational value of hidden states beyond attention weights and providing new insights into BERT's reasoning process.

Findings

1. Transformations in BERT occur in phases related to traditional pipeline tasks.
2. Fine-tuning minimally affects BERT's semantic capabilities.
3. Prediction errors are detectable in early layer representations.

Abstract

Bidirectional Encoder Representations from Transformers (BERT) reach state-of-the-art results in a variety of Natural Language Processing tasks. However, understanding of their internal functioning is still insufficient and unsatisfactory. In order to better understand BERT and other Transformer-based models, we present a layer-wise analysis of BERT's hidden states. Unlike previous research, which mainly focuses on explaining Transformer models by their attention weights, we argue that hidden states contain equally valuable information. Specifically, our analysis focuses on models fine-tuned on the task of Question Answering (QA) as an example of a complex downstream task. We inspect how QA models transform token vectors in order to find the correct answer. To this end, we apply a set of general and QA-specific probing tasks that reveal the information stored in each representation…
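
The layer-wise probing idea described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' setup: the "hidden states" below are synthetic vectors (not real BERT output) whose class separation is made to grow with layer depth, and the probe is a simple nearest-centroid classifier swapped in for whatever probing model the paper uses. All function names and parameters (`make_layer_states`, `probe_accuracy`, the 12 layers, the 8 dimensions) are assumptions chosen for illustration. The point is only the procedure: train one probe per layer and compare held-out accuracy across layers.

```python
import random


def make_layer_states(num_layers, n_per_class, dim, rng):
    """Build synthetic per-layer vectors (a stand-in for BERT hidden states).

    Assumption for illustration: class separation along dimension 0 grows
    linearly from 0 at the first layer to 4 standard deviations at the last.
    """
    layers = []
    for layer in range(num_layers):
        shift = 4.0 * layer / (num_layers - 1)
        X, y = [], []
        for label in (0, 1):
            for _ in range(n_per_class):
                vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
                vec[0] += shift if label == 1 else -shift
                X.append(vec)
                y.append(label)
        layers.append((X, y))
    return layers


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def probe_accuracy(X, y, rng):
    """Fit a nearest-centroid probe on half the data, score on the other half."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]
    c0 = centroid([X[i] for i in train if y[i] == 0])
    c1 = centroid([X[i] for i in train if y[i] == 1])
    correct = 0
    for i in test:
        pred = 0 if sq_dist(X[i], c0) <= sq_dist(X[i], c1) else 1
        correct += int(pred == y[i])
    return correct / len(test)


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
layers = make_layer_states(num_layers=12, n_per_class=100, dim=8, rng=rng)
accuracies = [probe_accuracy(X, y, rng) for X, y in layers]

for layer, acc in enumerate(accuracies):
    print(f"layer {layer:2d}: probe accuracy = {acc:.2f}")
```

With real models, the synthetic vectors would be replaced by token representations extracted from each layer, and the binary labels by the labels of the probing task in question. The per-layer accuracy curve is then read as a rough indicator of where in the network that information becomes linearly accessible.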

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Byte Pair Encoding · Dense Connections