The Architectural Bottleneck Principle
Tiago Pimentel, Josef Valvoda, Niklas Stoehr, Ryan Cotterell

TL;DR
This paper introduces the architectural bottleneck principle for probing: to estimate how much information a neural network component could extract from the representations fed into it, the probe should have the same structure as that component. Applying the principle to self-attention heads, the authors measure how much syntactic information is available to transformers.
Contribution
The paper proposes the architectural bottleneck principle and an attentional probe, a probe that exactly resembles a transformer's self-attention head, to measure how much syntactic information is accessible to the model.
Findings
A probe shaped like a self-attention head can extract most of a sentence's syntax tree, suggesting that transformers have access to syntactic information through their attention heads.
The attentional probe gives an estimate of how much syntactic information is available to such a component.
The result holds across the three models tested: BERT, ALBERT, and RoBERTa.
Abstract
In this paper, we seek to measure how much information a component in a neural network could extract from the representations fed into it. Our work stands in contrast to prior probing work, most of which investigates how much information a model's representations contain. This shift in perspective leads us to propose a new principle for probing, the architectural bottleneck principle: In order to estimate how much information a given component could extract, a probe should look exactly like the component. Relying on this principle, we estimate how much syntactic information is available to transformers through our attentional probe, a probe that exactly resembles a transformer's self-attention head. Experimentally, we find that, in three models (BERT, ALBERT, and RoBERTa), a sentence's syntax tree is mostly extractable by our probe, suggesting these models have access to syntactic…
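The idea of a probe that "looks exactly like" a self-attention head can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `attention_probe`, the dimensions, the random weights, and the reading of each attention row as a distribution over a word's candidate syntactic heads are all assumptions made for illustration. In practice the two projection matrices would be trained on parsed sentences; here they are random, so the predictions are meaningless.

```python
import numpy as np

def attention_probe(reps, w_query, w_key):
    """Score every word pair the way a single self-attention head would.

    reps:    (n_words, d_model) contextual representations (hypothetical input)
    w_query: (d_model, d_probe) query projection (hypothetical probe parameter)
    w_key:   (d_model, d_probe) key projection (hypothetical probe parameter)
    Returns an (n_words, n_words) matrix whose row i is a softmax
    distribution over candidate heads for word i.
    """
    queries = reps @ w_query
    keys = reps @ w_key
    # Scaled dot-product scores, as in a standard attention head.
    scores = queries @ keys.T / np.sqrt(w_query.shape[1])
    # Subtract the row maximum for numerical stability before exponentiating.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

# Toy usage with random inputs (assumed sizes, untrained weights).
rng = np.random.default_rng(0)
n_words, d_model, d_probe = 5, 16, 4
reps = rng.normal(size=(n_words, d_model))
w_query = rng.normal(size=(d_model, d_probe))
w_key = rng.normal(size=(d_model, d_probe))

head_probs = attention_probe(reps, w_query, w_key)
predicted_heads = head_probs.argmax(axis=1)  # most probable head per word
```

Because the probe has only the capacity of one attention head, whatever syntactic structure it recovers is structure that a component of that shape could plausibly extract, which is the point of matching the probe to the component.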
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI)
Methods: Multi-Head Attention · Attention Is All You Need · Adam · Linear Layer · LAMB · Residual Connection · WordPiece · Dense Connections · Layer Normalization
