From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers

M. Emrullah Ildiz; Yixiao Huang; Yingcong Li; Ankit Singh Rawat; Samet Oymak

arXiv:2402.13512·cs.LG·February 22, 2024·1 cites

Open Access

TL;DR

This paper establishes a formal connection between self-attention mechanisms in transformers and Markov models, providing theoretical insights into their behavior, sample complexity, and tendencies for repetitive text generation.

Contribution

It introduces a novel formalism linking self-attention to Markov chains, with conditions for learning and explanations for repetitive outputs in language models.

Findings

01. Self-attention models can be viewed as context-conditioned Markov chains (CCMCs).

02. Positional encoding scales the transition probabilities of the Markov model in a position-dependent way.

03. Repetitive text generation is explained by a winner-takes-all phenomenon in non-mixing processes.

Abstract

Modern language models rely on the transformer architecture and attention mechanism to perform language understanding and text generation. In this work, we study learning a 1-layer self-attention model from a set of prompts and associated output data sampled from the model. We first establish a precise mapping between the self-attention mechanism and Markov models: Inputting a prompt to the model samples the output token according to a context-conditioned Markov chain (CCMC) which weights the transition matrix of a base Markov chain. Additionally, incorporating positional encoding results in position-dependent scaling of the transition probabilities. Building on this formalism, we develop identifiability/coverage conditions for the prompt distribution that guarantee consistent estimation and establish sample complexity guarantees under IID samples. Finally, we study the problem of…
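The mapping described in the abstract can be illustrated with a small sketch. It is a toy under stated assumptions, not the authors' implementation: the abstract says only that the CCMC "weights" the base transition matrix, and the code assumes this weighting is by how often each token occurs in the prompt. The names `ccmc_next_token_probs`, `generate`, and `base_P` are hypothetical, and positional encoding is left out.

```python
import random
from collections import Counter


def ccmc_next_token_probs(prompt, base_P):
    """Next-token distribution of a toy context-conditioned Markov chain.

    ASSUMPTION (not spelled out in the abstract): the row of the base
    transition matrix selected by the last prompt token is reweighted by
    each token's count in the prompt, then renormalized.
    """
    last = prompt[-1]
    counts = Counter(prompt)
    weights = [counts.get(k, 0) * base_P[last][k] for k in range(len(base_P))]
    total = sum(weights)
    if total == 0:
        raise ValueError("no prompt token is reachable from the last token")
    return [w / total for w in weights]


def generate(prompt, base_P, steps, rng):
    """Sample `steps` tokens, feeding each sampled token back into the prompt."""
    seq = list(prompt)
    for _ in range(steps):
        probs = ccmc_next_token_probs(seq, base_P)
        seq.append(rng.choices(range(len(base_P)), weights=probs)[0])
    return seq


# Hypothetical 3-token base chain; each row sums to 1.
base_P = [
    [0.5, 0.3, 0.2],
    [0.2, 0.5, 0.3],
    [0.3, 0.2, 0.5],
]

print(ccmc_next_token_probs([0, 1, 1, 2], base_P))
print(generate([0, 1, 1, 2], base_P, steps=10, rng=random.Random(0)))
```

Under this assumption a token that never appears in the prompt receives probability zero, and every sampled token raises its own count for later steps. That feedback loop is one way to picture the winner-takes-all behavior listed in the findings.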

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Sparse Evolutionary Training · Balanced Selection