# From Balustrades to Pierre Vinken: Looking for Syntax in Transformer Self-Attentions

**Authors:** David Mareček, Rudolf Rosa

arXiv: 1906.01958 · 2019-06-06

## TL;DR

This paper investigates the presence of syntactic structures in the self-attention mechanisms of Transformer NMT encoders across three languages, proposing a method to quantify and evaluate syntactic information in attention patterns.

## Contribution

The paper introduces a transparent, deterministic method for measuring syntactic content in self-attention: it builds phrase-structure trees from phrase-like attention patterns and compares them to existing constituency treebanks.

## Key findings

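- In many attention heads, sequences of consecutive states attend to the same position, and these sequences resemble syntactic phrases
- The amount of syntactic information is quantified by automatically building phrase-structure trees from the phrase-like sequences and evaluating them
- The resulting trees are compared to existing constituency treebanks, both manually and by computing precision and recall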
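To make the first finding concrete, here is a minimal sketch of how such phrase-like sequences could be located in a single attention head. It is not the authors' implementation: the function name `phrase_like_runs`, the `min_len` parameter, and the choice to take each state's strongest attention weight (a hard argmax) are all assumptions made for illustration.

```python
from typing import List, Tuple


def phrase_like_runs(
    attention: List[List[float]], min_len: int = 2
) -> List[Tuple[int, int, int]]:
    """Find maximal runs of consecutive rows that attend to the same column.

    attention[i][j] is the weight state i puts on position j (one head).
    Returns (start, end, target) triples with `end` inclusive.
    Assumption: a state "attends to" the column with its largest weight.
    """
    # Column with the largest weight for every row (ties go to the first).
    targets = [max(range(len(row)), key=row.__getitem__) for row in attention]

    runs = []
    start = 0
    for i in range(1, len(targets) + 1):
        # Close the current run at the end of the sentence or when the
        # attended position changes.
        if i == len(targets) or targets[i] != targets[start]:
            if i - start >= min_len:
                runs.append((start, i - 1, targets[start]))
            start = i
    return runs


# Toy head over a 5-token sentence (hypothetical weights).
toy_head = [
    [0.10, 0.10, 0.60, 0.10, 0.10],
    [0.20, 0.10, 0.50, 0.10, 0.10],
    [0.10, 0.10, 0.70, 0.05, 0.05],
    [0.10, 0.10, 0.10, 0.20, 0.50],
    [0.00, 0.10, 0.10, 0.20, 0.60],
]
runs = phrase_like_runs(toy_head)
```

In the toy head, states 0 to 2 all attend most strongly to position 2 and states 3 to 4 to position 4, so the sketch reports two candidate phrases. A softer criterion based on the weights themselves, rather than the argmax, would be an equally plausible design.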

## Abstract

We inspect the multi-head self-attention in Transformer NMT encoders for three source languages, looking for patterns that could have a syntactic interpretation. In many of the attention heads, we frequently find sequences of consecutive states attending to the same position, which resemble syntactic phrases. We propose a transparent deterministic method of quantifying the amount of syntactic information present in the self-attentions, based on automatically building and evaluating phrase-structure trees from the phrase-like sequences. We compare the resulting trees to existing constituency treebanks, both manually and by computing precision and recall.
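The precision and recall comparison mentioned in the abstract can be illustrated with unlabeled bracket matching. This is a sketch under assumptions rather than the paper's evaluation code: trees are represented only by their sets of `(start, end)` spans, and the function name `bracket_precision_recall` is hypothetical.

```python
from typing import Iterable, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end inclusive


def bracket_precision_recall(
    predicted: Iterable[Span], gold: Iterable[Span]
) -> Tuple[float, float]:
    """Unlabeled bracket precision and recall between two span sets.

    precision = matched spans / predicted spans
    recall    = matched spans / gold spans
    Empty inputs yield 0.0 instead of raising a division error.
    """
    pred_set: Set[Span] = set(predicted)
    gold_set: Set[Span] = set(gold)
    matched = len(pred_set & gold_set)
    precision = matched / len(pred_set) if pred_set else 0.0
    recall = matched / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical spans for a 5-token sentence.
predicted_spans = [(0, 2), (3, 4), (0, 4)]
gold_spans = [(0, 2), (0, 4), (1, 2), (3, 4)]
precision, recall = bracket_precision_recall(predicted_spans, gold_spans)
```

Here all three predicted spans appear in the gold tree, while the gold span `(1, 2)` is missed, so precision is 1.0 and recall is 0.75.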

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.01958/full.md

## Figures

114 figures with captions in the complete paper: https://tomesphere.com/paper/1906.01958/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/1906.01958/full.md

---
Source: https://tomesphere.com/paper/1906.01958