Hidden Holes: topological aspects of language models
Stephen Fitz, Peter Romero, Jiyan Jonas Schneider

TL;DR
This paper investigates the topological structure of language model representations using algebraic topology, revealing differences between transformer and recurrent models and introducing a new measure called perforation.
Contribution
It introduces novel topological tools and a measure called perforation to analyze language model representations, highlighting differences between model architectures.
Findings
Transformers exhibit less topological complexity than recurrent models.
Topological patterns are consistent across natural languages but not synthetic data.
The study provides new mathematical insights into neural language model representations.
Abstract
We explore the topology of representation manifolds arising in autoregressive neural language models trained on raw text data. To study their properties, we introduce tools from computational algebraic topology, which we use as the basis for a measure of topological complexity that we call perforation. Using this measure, we study the evolution of topological structure in GPT-based large language models across depth and time during training. We then compare these to gated recurrent models and show that the latter exhibit more topological complexity, with a distinct pattern of changes common to all natural languages but absent from synthetically generated data. The paper presents a detailed analysis of the representation manifolds derived by these models, based on studying the shapes of vector clouds induced by them as they are conditioned on sentences from corpora of natural…
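As an illustration of the kind of computation that "tools from computational algebraic topology" refers to, the sketch below counts connected components and one-dimensional holes (Betti numbers b0 and b1) of a point cloud at a single distance scale. It is not the paper's perforation measure, whose definition does not appear in this excerpt. The function names `rank_gf2` and `betti_numbers`, the choice of a Vietoris-Rips complex, and the fixed `scale` parameter are all assumptions made for illustration.

```python
import itertools
import numpy as np


def rank_gf2(matrix):
    """Rank of a 0/1 matrix over GF(2), by Gaussian elimination."""
    m = matrix.copy() % 2
    rows, cols = m.shape
    rank = 0
    for col in range(cols):
        if rank == rows:
            break
        # Find a row at or below `rank` with a 1 in this column.
        pivot = next((r for r in range(rank, rows) if m[r, col]), None)
        if pivot is None:
            continue
        m[[rank, pivot]] = m[[pivot, rank]]  # swap pivot row into place
        for r in range(rows):
            if r != rank and m[r, col]:
                m[r] ^= m[rank]  # row addition mod 2
        rank += 1
    return rank


def betti_numbers(points, scale):
    """(b0, b1) of the Vietoris-Rips complex of `points` at `scale`.

    Illustrative assumption: two points are joined by an edge when their
    Euclidean distance is <= scale, and a triangle is filled in when all
    three of its edges are present.
    """
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    edges = [e for e in itertools.combinations(range(n), 2) if dist[e] <= scale]
    edge_index = {e: k for k, e in enumerate(edges)}
    triangles = [
        (i, j, k)
        for i, j, k in itertools.combinations(range(n), 3)
        if (i, j) in edge_index and (i, k) in edge_index and (j, k) in edge_index
    ]

    # Boundary matrix d1: vertices x edges.
    d1 = np.zeros((n, len(edges)), dtype=np.uint8)
    for col, (i, j) in enumerate(edges):
        d1[i, col] = 1
        d1[j, col] = 1

    # Boundary matrix d2: edges x triangles.
    d2 = np.zeros((len(edges), len(triangles)), dtype=np.uint8)
    for col, (i, j, k) in enumerate(triangles):
        for e in ((i, j), (i, k), (j, k)):
            d2[edge_index[e], col] = 1

    r1, r2 = rank_gf2(d1), rank_gf2(d2)
    b0 = n - r1               # connected components
    b1 = len(edges) - r1 - r2  # independent 1-dimensional holes
    return b0, b1


# Twelve points evenly spaced on the unit circle.
angles = 2 * np.pi * np.arange(12) / 12
circle = np.stack([np.cos(angles), np.sin(angles)], axis=1)
print(betti_numbers(circle, 0.6))
```

At scale 0.6 only neighbouring points on the circle are connected, so the sketch reports one component and one hole, `(1, 1)`. At a scale larger than the circle's diameter every triangle is filled in and the hole disappears. Practical analyses usually track such features across all scales (persistent homology) with a dedicated library rather than a single fixed scale.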
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attention Is All You Need · Sigmoid Activation · Byte Pair Encoding · Adam · Tanh Activation · Attention Dropout · Linear Layer · Multi-Head Attention · Dropout
