Uncertainty Estimation of Transformers' Predictions via Topological Analysis of the Attention Matrices
Elizaveta Kostenok, Daniil Cherniavskii, Alexey Zaytsev

TL;DR
This paper introduces an uncertainty estimation method for transformer models based on topological analysis of attention matrices. The authors report that it outperforms existing techniques on NLP tasks while yielding a more interpretable representation of model confidence.
Contribution
The paper proposes estimating uncertainty from topological features of attention maps. It targets two limitations of prior methods: minimal improvement over basic heuristics and reliance on costly ensemble models.
Findings
Outperforms existing uncertainty estimation techniques on NLP benchmarks.
Provides a low-dimensional, interpretable representation of model confidence.
Offers a more efficient alternative to ensemble-based methods.
Abstract
Transformer-based language models have set new benchmarks across a wide range of NLP tasks, yet reliably estimating the uncertainty of their predictions remains a significant challenge. Existing uncertainty estimation (UE) techniques often fall short in classification tasks, either offering minimal improvements over basic heuristics or relying on costly ensemble models. Moreover, attempts to leverage common embeddings for UE in linear probing scenarios have yielded only modest gains, indicating that alternative model components should be explored. We tackle these limitations by harnessing the geometry of attention maps across multiple heads and layers to assess model confidence. Our approach extracts topological features from attention matrices, providing a low-dimensional, interpretable representation of the model's internal dynamics. Additionally, we introduce topological features…
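To make the idea of "topological features from attention matrices" concrete, here is a minimal sketch, not the authors' implementation. It assumes one simple kind of feature: treat a single attention matrix as a weighted graph over tokens, keep only the edges above a threshold, and count connected components (b0) and independent cycles (b1). The function names, the symmetrization by elementwise maximum, and the threshold values are all illustrative assumptions; the abstract does not specify which features the paper uses.

```python
import numpy as np


def graph_betti_numbers(attention, threshold):
    """Return (b0, b1) for the undirected graph whose edges are token pairs
    with symmetrized attention weight >= threshold.

    Assumption: symmetrizing with an elementwise max is one possible choice;
    it is not taken from the paper.
    """
    a = np.asarray(attention, dtype=float)
    sym = np.maximum(a, a.T)
    n = sym.shape[0]
    parent = list(range(n))  # union-find forest over tokens

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    n_edges = 0
    components = n
    for i in range(n):
        for j in range(i + 1, n):  # off-diagonal pairs only, so no self-loops
            if sym[i, j] >= threshold:
                n_edges += 1
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
                    components -= 1

    b0 = components
    b1 = n_edges - n + components  # cycle rank of a graph: E - V + C
    return b0, b1


def topological_features(attention, thresholds=(0.1, 0.25, 0.5)):
    """Concatenate (b0, b1) over several thresholds into one feature vector."""
    feats = []
    for t in thresholds:
        feats.extend(graph_betti_numbers(attention, t))
    return np.array(feats)


# Toy 4-token attention matrix (each row sums to 1), invented for illustration.
example = np.array([
    [0.7, 0.3, 0.0, 0.0],
    [0.3, 0.6, 0.1, 0.0],
    [0.0, 0.4, 0.5, 0.1],
    [0.3, 0.3, 0.3, 0.1],
])
print(topological_features(example))
```

In a full pipeline, vectors like this would presumably be computed per head and per layer and then passed to a small model that predicts confidence. That step is omitted here because the summary gives no details about it.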
Taxonomy
Topics: Topological and Geometric Data Analysis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Layer Normalization · Dense Connections · Absolute Position Encodings · Residual Connection
