The geometry of BERT
Matteo Bonino, Giorgia Ghione, Giansalvo Cirrincione

TL;DR
This paper offers a theoretical analysis of BERT's internal attention mechanisms, covering both local and global behavior, and introduces new concepts such as the cone index to improve the interpretability of its classification process.
Contribution
It presents a novel theoretical perspective on BERT's attention mechanism, including the cone index and an analysis of semantic content, advancing the explainability of Transformer models.
Findings
Analysis of attention patterns and subspace directionality
Introduction of the cone index for global information assessment
High-accuracy case study on SARS-CoV-2 variant classification
Abstract
Transformer neural networks, particularly Bidirectional Encoder Representations from Transformers (BERT), have shown remarkable performance across various tasks such as classification, text summarization, and question answering. However, their internal mechanisms remain mathematically obscure, highlighting the need for greater explainability and interpretability. To this end, this paper investigates the internal mechanisms of BERT and proposes a novel theoretical perspective on its attention mechanism. The analysis encompasses both local and global network behavior. At the local level, it presents the concept of directionality of subspace selection, together with a comprehensive study of the patterns that emerge from the self-attention matrix. Additionally, this work explores the semantic content of the information stream through data distribution analysis and…
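For readers unfamiliar with the object the abstract calls the self-attention matrix, the following is a minimal sketch of the standard scaled dot-product form, softmax(QKᵀ/√d), computed row by row. It is not the paper's code: the function names, the toy two-dimensional token vectors, and the choice to use the raw vectors as both queries and keys (no learned projections, a single head) are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_matrix(queries, keys):
    """Return the matrix softmax(Q K^T / sqrt(d)), one row per query.

    queries, keys: lists of equal-length float vectors (hypothetical toy input;
    in BERT these would come from learned projections of token embeddings).
    """
    d = len(keys[0])
    scale = math.sqrt(d)
    scores = [
        [sum(q_i * k_i for q_i, k_i in zip(q, k)) / scale for k in keys]
        for q in queries
    ]
    return [softmax(row) for row in scores]


# Toy example: three "tokens" in a 2-dimensional embedding space (assumed values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A = self_attention_matrix(tokens, tokens)
for row in A:
    print([round(a, 3) for a in row])
```

Each row of `A` is a probability distribution over the tokens, so patterns in the matrix (for example, mass concentrated on a few columns) indicate which tokens a given position attends to.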
Taxonomy
Topics: Optimization and Search Problems · Computational Geometry and Mesh Generation
Methods: Attention Is All You Need · Weight Decay · Byte Pair Encoding · WordPiece · Layer Normalization · Residual Connection · Linear Layer · Linear Warmup With Linear Decay · Dense Connections
