# What Does BERT Look At? An Analysis of BERT's Attention

**Authors:** Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning

arXiv: 1906.04341 · 2019-06-12

## TL;DR

This paper proposes methods for analyzing the attention heads of pre-trained models and applies them to BERT. It shows that heads attend to delimiter tokens, specific positional offsets, or the whole sentence, and that certain heads correspond well to syntactic relations and coreference.

## Contribution

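The paper complements prior analyses of model outputs and internal vector representations with methods that examine attention directly. It also proposes an attention-based probing classifier, which provides further evidence that BERT's attention captures substantial syntactic information.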

## Key findings

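- Attention heads show recurring patterns: attending to delimiter tokens, to specific positional offsets, or broadly over the whole sentence, with heads in the same layer often behaving similarly
- Certain heads correspond well to syntax and coreference, attending to direct objects of verbs, determiners of nouns, objects of prepositions, and coreferent mentions with high accuracy
- An attention-based probing classifier shows that substantial syntactic information is captured in BERT's attention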
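
One way to make "a head attends to a relation with high accuracy" concrete is to treat the token a head attends to most as its prediction and compare it against gold relation pairs. The sketch below is a minimal illustration under stated assumptions, not the authors' code: the function name `head_relation_accuracy`, the toy attention matrix, and the gold pairs are all hypothetical, and attention is assumed to be available at the word level with one row per source token.

```python
def head_relation_accuracy(attention, gold_pairs):
    """Fraction of gold (source, target) pairs recovered by one head.

    attention[i][j] is the weight token i places on token j.
    For each source token, the prediction is the most-attended-to
    *other* token (self-attention is ignored here by assumption).
    Sentences are assumed to contain at least two tokens.
    """
    if not gold_pairs:
        raise ValueError("gold_pairs must not be empty")
    correct = 0
    for source, target in gold_pairs:
        row = attention[source]
        candidates = [j for j in range(len(row)) if j != source]
        predicted = max(candidates, key=lambda j: row[j])
        correct += int(predicted == target)
    return correct / len(gold_pairs)


# Hypothetical toy sentence and attention weights for a single head.
tokens = ["she", "read", "the", "book"]
toy_attention = [
    [0.1, 0.6, 0.1, 0.2],  # she  -> mostly "read"
    [0.2, 0.1, 0.1, 0.6],  # read -> mostly "book"
    [0.1, 0.1, 0.1, 0.7],  # the  -> mostly "book"
    [0.1, 0.2, 0.6, 0.1],  # book -> mostly "the"
]

# Verb -> direct object: "read" (index 1) should attend to "book" (index 3).
direct_object_pairs = [(1, 3)]
direct_object_accuracy = head_relation_accuracy(toy_attention, direct_object_pairs)
```

On real data the same score would be computed per head over many sentences, so that heads specializing in a given relation stand out from the rest.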

## Abstract

Large pre-trained neural networks such as BERT have had great recent success in NLP, motivating a growing body of research investigating what aspects of language they are able to learn from unlabeled data. Most recent analysis has focused on model outputs (e.g., language model surprisal) or internal vector representations (e.g., probing classifiers). Complementary to these works, we propose methods for analyzing the attention mechanisms of pre-trained models and apply them to BERT. BERT's attention heads exhibit patterns such as attending to delimiter tokens, specific positional offsets, or broadly attending over the whole sentence, with heads in the same layer often exhibiting similar behaviors. We further show that certain attention heads correspond well to linguistic notions of syntax and coreference. For example, we find heads that attend to the direct objects of verbs, determiners of nouns, objects of prepositions, and coreferent mentions with remarkably high accuracy. Lastly, we propose an attention-based probing classifier and use it to further demonstrate that substantial syntactic information is captured in BERT's attention.
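
The surface-level patterns described in the abstract (attention to delimiter tokens, and broad versus focused attention) can be summarized with simple statistics over a head's attention matrix. The following is a hedged sketch rather than the paper's implementation: the helper names `attention_entropy` and `delimiter_mass` are hypothetical, the delimiters are assumed to be BERT's `[CLS]` and `[SEP]` special tokens, and each attention row is assumed to sum to 1.

```python
import math


def attention_entropy(row):
    """Shannon entropy (in nats) of one attention distribution.

    High entropy suggests broad attention over the sentence;
    entropy near zero suggests attention focused on a single token.
    """
    return -sum(p * math.log(p) for p in row if p > 0)


def delimiter_mass(attention, tokens, delimiters=("[CLS]", "[SEP]")):
    """Average share of attention that source tokens place on delimiters."""
    delimiter_idx = [j for j, tok in enumerate(tokens) if tok in delimiters]
    per_row = [sum(row[j] for j in delimiter_idx) for row in attention]
    return sum(per_row) / len(per_row)


# Hypothetical example: a head that spreads attention uniformly.
example_tokens = ["[CLS]", "cats", "sleep", "[SEP]"]
uniform_attention = [[0.25, 0.25, 0.25, 0.25] for _ in example_tokens]

uniform_entropy = attention_entropy(uniform_attention[0])
uniform_delimiter_mass = delimiter_mass(uniform_attention, example_tokens)
```

Averaging these two quantities per head gives a rough way to separate heads that mostly attend to delimiters from heads that attend broadly or to specific words.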

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.04341/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/1906.04341/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/1906.04341/full.md

---
Source: https://tomesphere.com/paper/1906.04341