Analyzing Transformer Dynamics as Movement through Embedding Space

Sumeet S. Singh

arXiv:2308.10874 · cs.LG · November 15, 2023

TL;DR

This paper introduces a novel perspective on Transformer models by framing their dynamics as movement through embedding space, revealing how intelligent behaviors emerge from probabilistic paths and vector arrangements.

Contribution

It proposes a theory that models Transformer inference as paths in embedding space, unifies it with other sequence models, and formalizes a concept-space interpretation of embeddings.

Findings

01. Transformers' behaviors correspond to paths in embedding space.

02. Training learns a probability distribution over possible paths.

03. Embedding arrangements influence path probabilities and model behavior.

Abstract

Transformer-based language models exhibit intelligent behaviors such as understanding natural language, recognizing patterns, acquiring knowledge, reasoning, planning, reflecting and using tools. This paper explores how their underlying mechanics give rise to intelligent behaviors. Towards that end, we propose framing Transformer dynamics as movement through embedding space. Examining Transformers through this perspective reveals key insights, establishing a Theory of Transformers: 1) Intelligent behaviors map to paths in Embedding Space, which the Transformer random-walks through during inference. 2) LM training learns a probability distribution over all possible paths. 'Intelligence' is learnt by assigning higher probabilities to paths representing intelligent behaviors. No learning can take place in-context; context only narrows the subset of paths sampled during decoding. 5) The…
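
The first two claims of the abstract (decoding as a random walk, and a probability attached to each path) can be illustrated with a toy sketch. This is not the paper's model: the four-word vocabulary, the 2-D vectors in `EMBEDDINGS`, and the mean-pooling `context_vector` are hypothetical stand-ins chosen for brevity, and scoring the next token by a softmax over dot products is assumed as the usual Transformer output layer.

```python
import math
import random

# Hypothetical toy vocabulary with made-up 2-D embeddings (illustration only).
EMBEDDINGS = {
    "the": (1.0, 0.0),
    "cat": (0.8, 0.6),
    "sat": (0.0, 1.0),
    "mat": (-0.6, 0.8),
}


def context_vector(path):
    """Stand-in for the Transformer: fold a token path into one vector.

    A real model uses attention and feed-forward layers; the mean of the
    token embeddings is an assumption made purely to keep the sketch small.
    """
    vectors = [EMBEDDINGS[token] for token in path]
    dims = len(vectors[0])
    return tuple(sum(v[d] for v in vectors) / len(vectors) for d in range(dims))


def next_token_distribution(path):
    """Softmax over dot products between the context vector and each token vector."""
    ctx = context_vector(path)
    scores = {
        token: sum(c * e for c, e in zip(ctx, vec))
        for token, vec in EMBEDDINGS.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {token: math.exp(score - top) for token, score in scores.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}


def random_walk(prompt, steps, rng):
    """Extend the prompt one sampled token at a time.

    Returns the full path and the log-probability of the sampled steps,
    i.e. the sum of log P(token | path so far).
    """
    path = list(prompt)
    log_prob = 0.0
    for _ in range(steps):
        dist = next_token_distribution(path)
        tokens = list(dist)
        choice = rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
        log_prob += math.log(dist[choice])
        path.append(choice)
    return path, log_prob


path, log_prob = random_walk(["the"], steps=3, rng=random.Random(0))
print(path, round(log_prob, 3))
```

In this sketch the prompt plays the role the abstract assigns to context: it does not change `EMBEDDINGS` or any other parameter, it only changes which continuations are likely to be sampled. Moving a vector in `EMBEDDINGS` changes the dot products and therefore the path probabilities.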

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Label Smoothing · Layer Normalization · Softmax · Dense Connections