Acceptability Judgements via Examining the Topology of Attention Maps
Daniil Cherniavskii, Eduard Tulchinskii, Vladislav Mikhailov, Irina Proskurina, Laida Kushnareva, Ekaterina Artemova, Serguei Barannikov, Irina Piontkovskaya, Dmitri Piontkovski, Evgeny Burnaev

TL;DR
This paper introduces a topological data analysis approach to evaluate attention maps in NLP models, significantly improving acceptability judgments and enabling linguistic interpretability.
Contribution
It demonstrates that topological features of attention graphs enhance acceptability classification and reveal linguistic functions of attention heads.
Findings
Topological features improve BERT-based acceptability classifier scores by 8%-24% on CoLA in three languages (English, Italian, and Swedish).
The approach reaches human-level performance on the BLiMP benchmark.
Attention maps of minimal pairs show measurable topological differences.
Abstract
The role of the attention mechanism in encoding linguistic knowledge has received special interest in NLP. However, the ability of the attention heads to judge the grammatical acceptability of a sentence has been underexplored. This paper approaches the paradigm of acceptability judgments with topological data analysis (TDA), showing that the geometric properties of the attention graph can be efficiently exploited for two standard practices in linguistics: binary judgments and linguistic minimal pairs. Topological features enhance the BERT-based acceptability classifier scores by 8%-24% on CoLA in three languages (English, Italian, and Swedish). By revealing the topological discrepancy between attention maps of minimal pairs, we achieve human-level performance on the BLiMP benchmark, outperforming nine statistical and Transformer LM baselines. At the same time, TDA provides the…
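To make the idea of "topological features of the attention graph" concrete, here is a minimal sketch, not the authors' pipeline. It assumes one head's attention map is treated as a weighted graph over tokens, that an undirected edge is kept when the larger of the two directed weights reaches a threshold, and that the features of interest are the edge count, the number of connected components, and the number of independent cycles. The function name `graph_features`, the thresholds, and the toy matrix are all hypothetical.

```python
def graph_features(attention, threshold):
    """Simple graph-topology features of a thresholded attention map.

    attention: square list of lists, attention[i][j] = weight from token i to j.
    threshold: keep undirected edge i-j when max(a_ij, a_ji) >= threshold
               (an assumption of this sketch).
    """
    n = len(attention)
    parent = list(range(n))  # union-find forest over token indices

    def find(x):
        # Find the component root with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = 0
    components = n
    for i in range(n):
        for j in range(i + 1, n):
            weight = max(attention[i][j], attention[j][i])
            if weight >= threshold:
                edges += 1
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_i] = root_j
                    components -= 1

    # For a graph, the number of independent cycles is E - V + C.
    cycles = edges - n + components
    return {"edges": edges, "components": components, "cycles": cycles}


# Toy 4-token attention map (hypothetical values, each row sums to 1).
toy = [
    [0.70, 0.20, 0.05, 0.05],
    [0.30, 0.40, 0.25, 0.05],
    [0.05, 0.35, 0.50, 0.10],
    [0.02, 0.03, 0.05, 0.90],
]

# Sweeping the threshold yields one feature vector per filtration step.
features = [graph_features(toy, t) for t in (0.04, 0.25, 0.5)]
for t, f in zip((0.04, 0.25, 0.5), features):
    print(t, f)
```

Raising the threshold removes weak edges, so the graph breaks into more components and loses cycles. Concatenating such counts across thresholds, heads, and layers is one plausible way to build an input vector for an acceptability classifier; the exact feature set used in the paper is not specified in this summary.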
Taxonomy
Topics: S100 Proteins and Annexins · Topological and Geometric Data Analysis
Methods: Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Position-Wise Feed-Forward Layer · Adam · Absolute Position Encodings · Byte Pair Encoding · Residual Connection · Label Smoothing
