Acceptability Judgements via Examining the Topology of Attention Maps

Daniil Cherniavskii; Eduard Tulchinskii; Vladislav Mikhailov; Irina Proskurina; Laida Kushnareva; Ekaterina Artemova; Serguei Barannikov; Irina Piontkovskaya; Dmitri Piontkovski; Evgeny Burnaev

arXiv:2205.09630 · cs.CL · August 15, 2023 · 1 citation

TL;DR

This paper applies topological data analysis (TDA) to the attention maps of Transformer language models, improving acceptability judgments and making the linguistic roles of individual attention heads interpretable.

Contribution

It demonstrates that topological features of attention graphs enhance acceptability classification and reveal linguistic functions of attention heads.
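To make "topological features of attention graphs" concrete, here is a minimal sketch, assuming a single head's attention matrix, an undirected graph obtained by symmetrising it, and a hypothetical fixed threshold. It is not the code from the tda4la repository; the function name and threshold values are illustrative. Edges whose weight reaches the threshold are kept, then the sketch counts edges, connected components (Betti-0), and independent cycles (Betti-1, which for a graph equals |E| - |V| + number of components).

```python
from __future__ import annotations

import numpy as np


def attention_graph_features(attn: np.ndarray, threshold: float) -> dict:
    """Simple graph features of one attention map (hypothetical helper).

    attn: (n_tokens, n_tokens) attention weights from a single head.
    An undirected edge (i, j), i != j, is kept when max(attn[i, j], attn[j, i]) >= threshold.
    """
    n = attn.shape[0]
    # Symmetrise so that the graph is undirected (one simple modelling choice).
    sym = np.maximum(attn, attn.T)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if sym[i, j] >= threshold]

    # Union-find over tokens to count connected components.
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    b0 = len({find(v) for v in range(n)})  # Betti-0: number of components
    b1 = len(edges) - n + b0               # Betti-1: cycle rank of the graph
    return {"edges": len(edges), "b0": b0, "b1": b1}


if __name__ == "__main__":
    # Toy 3-token attention map (rows sum to 1).
    attn = np.array([[0.1, 0.6, 0.3], [0.5, 0.2, 0.3], [0.2, 0.2, 0.6]])
    print(attention_graph_features(attn, threshold=0.25))  # prints: {'edges': 3, 'b0': 1, 'b1': 1}
```

Sweeping the threshold over several values yields a small feature vector per head; concatenating such vectors over heads is one plausible input for an acceptability classifier.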

Findings

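1. Topological features improve the scores of BERT-based acceptability classifiers by 8%–24% on CoLA-style benchmarks in English, Italian, and Swedish.
2. Topological features achieve human-level performance on the BLiMP benchmark.
3. The attention maps of the two sentences in a minimal pair differ topologically.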
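BLiMP scores a model by the fraction of minimal pairs in which it prefers the acceptable sentence. The sketch below applies that metric to a simple, hypothetical decision rule, assuming one precomputed topological feature value per sentence and per attention head: prefer the sentence whose feature is lower (or higher), and pick the head and direction that work best on held-out pairs. The helper names, the toy data, and the selection procedure are illustrative assumptions, not the paper's exact protocol.

```python
from __future__ import annotations

import numpy as np


def pairwise_accuracy(feat_good: np.ndarray, feat_bad: np.ndarray, prefer_lower: bool = True) -> float:
    """Fraction of minimal pairs in which the acceptable sentence is preferred.

    feat_good, feat_bad: shape (n_pairs,), one feature value per sentence.
    prefer_lower=True predicts the sentence with the smaller value as acceptable
    (a hypothetical convention; ties count as errors in both directions).
    """
    wins = feat_good < feat_bad if prefer_lower else feat_good > feat_bad
    return float(np.mean(wins))


def select_head(dev_good: np.ndarray, dev_bad: np.ndarray) -> tuple[int, bool, float]:
    """Return (head, prefer_lower, accuracy) with the best accuracy on dev pairs.

    dev_good, dev_bad: shape (n_pairs, n_heads).
    """
    best = (0, True, -1.0)
    for h in range(dev_good.shape[1]):
        for prefer_lower in (True, False):
            acc = pairwise_accuracy(dev_good[:, h], dev_bad[:, h], prefer_lower)
            if acc > best[2]:
                best = (h, prefer_lower, acc)
    return best


if __name__ == "__main__":
    # Made-up values: 3 minimal pairs x 2 heads.
    dev_good = np.array([[1.0, 5.0], [2.0, 6.0], [3.0, 1.0]])
    dev_bad = np.array([[2.0, 4.0], [3.0, 5.0], [4.0, 2.0]])
    print(select_head(dev_good, dev_bad))  # prints: (0, True, 1.0)
```

The chosen head and direction should then be evaluated on pairs not used for selection, so that the reported accuracy is not inflated by the search.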

Abstract

The role of the attention mechanism in encoding linguistic knowledge has received special interest in NLP. However, the ability of the attention heads to judge the grammatical acceptability of a sentence has been underexplored. This paper approaches the paradigm of acceptability judgments with topological data analysis (TDA), showing that the geometric properties of the attention graph can be efficiently exploited for two standard practices in linguistics: binary judgments and linguistic minimal pairs. Topological features enhance the BERT-based acceptability classifier scores by 8%–24% on CoLA in three languages (English, Italian, and Swedish). By revealing the topological discrepancy between attention maps of minimal pairs, we achieve human-level performance on the BLiMP benchmark, outperforming nine statistical and Transformer LM baselines. At the same time, TDA provides the…
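One standard way to read "geometric properties of the attention graph" is through persistent homology. The following is a minimal sketch, assuming the attention weights are symmetrised and turned into distances 1 - max(a_ij, a_ji), so that strongly attending token pairs connect early in the filtration. Because every token is born at filtration value 0, the finite zero-dimensional (H0) bars have lengths equal to the edge weights of a minimum spanning tree, which Kruskal's algorithm with union-find computes. The distance transform and the function name are assumptions for illustration, not the repository's API.

```python
from __future__ import annotations

import numpy as np


def h0_barcode(attn: np.ndarray) -> list[float]:
    """Finite H0 bar lengths of the attention-graph filtration (hypothetical helper).

    Edge (i, j) enters at 1 - max(attn[i, j], attn[j, i]). Each time an edge
    merges two components, one H0 bar dies at that filtration value.
    """
    n = attn.shape[0]
    dist = 1.0 - np.maximum(attn, attn.T)
    # Kruskal: process edges from the earliest (strongest attention) to the latest.
    edges = sorted((dist[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    bars: list[float] = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append(float(d))  # a component born at 0 dies at d
    # n - 1 finite bars; the last surviving component is an infinite bar (omitted).
    return bars


if __name__ == "__main__":
    attn = np.array([[0.1, 0.6, 0.3], [0.5, 0.2, 0.3], [0.2, 0.2, 0.6]])
    bars = h0_barcode(attn)
    # Sum and mean of bar lengths are simple per-head summaries of the barcode.
    print(len(bars), round(sum(bars), 3), round(float(np.mean(bars)), 3))  # prints: 2 1.1 0.55
```

Summaries such as these can be computed for every head and compared across the two sentences of a minimal pair, or fed to a classifier for binary judgments. Higher-dimensional features would require a dedicated persistent homology library.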

Code & Models

Repositories

danchern97/tda4la (official)

Taxonomy

Topics: Topological and Geometric Data Analysis

Methods: Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Position-Wise Feed-Forward Layer · Adam · Absolute Position Encodings · Byte Pair Encoding · Residual Connection · Label Smoothing