How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations
Betty van Aken, Benjamin Winter, Alexander Löser, Felix A. Gers

TL;DR
This paper investigates how BERT's internal layer-wise hidden states evolve during question answering tasks, revealing phases of transformation, task-specific information encoding, and the impact of fine-tuning on semantic understanding.
Contribution
It introduces a layer-wise analysis of BERT's hidden states for QA, emphasizing the informational value of hidden states beyond attention weights and providing new insights into BERT's reasoning process.
Findings
Transformations in BERT occur in phases related to traditional pipeline tasks.
Fine-tuning minimally affects BERT's semantic capabilities.
Prediction errors are detectable in early layer representations.
Abstract
Bidirectional Encoder Representations from Transformers (BERT) reach state-of-the-art results in a variety of Natural Language Processing tasks. However, understanding of their internal functioning is still insufficient and unsatisfactory. In order to better understand BERT and other Transformer-based models, we present a layer-wise analysis of BERT's hidden states. Unlike previous research, which mainly focuses on explaining Transformer models by their attention weights, we argue that hidden states contain equally valuable information. Specifically, our analysis focuses on models fine-tuned on the task of Question Answering (QA) as an example of a complex downstream task. We inspect how QA models transform token vectors in order to find the correct answer. To this end, we apply a set of general and QA-specific probing tasks that reveal the information stored in each representation…
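The layer-wise probing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' probing setup: it swaps in a simple nearest-centroid classifier as the probe, and the "hidden states" are small hand-written toy vectors rather than real BERT activations. All names (probe_layer, probe_all_layers, toy_hidden_states) are hypothetical. The toy data is built so that the label signal in the second dimension grows with depth, which lets the probe show how much label information each layer exposes.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Mean vector of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def probe_layer(train_x: List[Vector], train_y: List[int],
                test_x: List[Vector], test_y: List[int]) -> float:
    """Fit a nearest-centroid probe on one layer and return test accuracy."""
    labels = sorted(set(train_y))
    centroids: Dict[int, List[float]] = {
        lab: centroid([x for x, y in zip(train_x, train_y) if y == lab])
        for lab in labels
    }
    correct = 0
    for x, y in zip(test_x, test_y):
        pred = min(labels, key=lambda lab: sq_dist(x, centroids[lab]))
        correct += int(pred == y)
    return correct / len(test_y)


def probe_all_layers(hidden_states: List[List[Vector]], labels: List[int],
                     n_train: int) -> List[float]:
    """Run the same probe on every layer; hidden_states[layer][token] is a vector."""
    return [
        probe_layer(layer[:n_train], labels[:n_train],
                    layer[n_train:], labels[n_train:])
        for layer in hidden_states
    ]


# Toy data (an assumption for illustration, not real model output):
# eight tokens with binary labels, three "layers". The first dimension is
# label-independent noise; the second carries a label signal scaled by depth.
toy_labels = [0, 1, 0, 1, 0, 1, 0, 1]
toy_noise = [0.3, -0.2, -0.4, 0.5, 0.1, -0.1, -0.2, 0.3]
toy_hidden_states = [
    [[n, depth * (1.0 if lab == 1 else -1.0)]
     for n, lab in zip(toy_noise, toy_labels)]
    for depth in range(3)
]

accuracies = probe_all_layers(toy_hidden_states, toy_labels, n_train=4)
for layer_index, acc in enumerate(accuracies):
    print(f"layer {layer_index}: probe accuracy = {acc:.2f}")
```

With real models, the per-layer vectors would come from the model's hidden-state outputs and the probe would typically be a trained classifier; the loop over layers stays the same.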
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Byte Pair Encoding · Dense Connections
