Propagation of Chaos in Contextual Flow Maps

Shi Chen; Zhengjiang Lin; Kaizhao Liu; Philippe Rigollet

arXiv:2605.16747·cs.LG·May 19, 2026

TL;DR

This paper develops a quantitative statistical theory of transformers in the large-context regime using the framework of contextual flow maps (CFMs), bounding how far a finite-context model deviates from its idealized infinite-context limit.

Contribution

The paper introduces a new Eulerian adjoint formulation and establishes optimal bounds on the deviation between finite- and infinite-context models, including transformers.

Findings

01. Achieves optimal Wasserstein rate $n^{-1/d}$ for general CFMs.

02. Establishes parametric rate $n^{-1/2}$ for a class including transformers.

03. Provides stability estimates for forward--adjoint systems.

Abstract

We develop a quantitative statistical theory of transformers in the large-context regime by adopting the abstraction of contextual flow maps (CFMs): dynamical systems that evolve a distinguished token in the presence of a contextual measure across a stack of attention blocks. Within this framework, the finite-context model approximates an idealized infinite-context system in which the contextual measure is replaced by its underlying population, so that the context length $n$ becomes a statistical resource. Exploiting the McKean--Vlasov structure of the dynamics and the classical machinery of propagation of chaos, we establish a forward bound controlling the deviation between the finite- and infinite-context CFMs uniformly along depth, and a backward bound controlling the deviation between the corresponding training trajectories uniformly across iterations of online gradient descent.…
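To make the idea of context length $n$ as a statistical resource concrete, here is a minimal numerical sketch. It is not the paper's construction: it assumes a single softmax attention layer with identity query, key, and value maps, and a standard Gaussian context distribution, both chosen only because the infinite-context output then has a closed form ($\beta x$, by Gaussian exponential tilting). The names `attention_output` and `mean_error` are hypothetical helpers introduced for this illustration.

```python
import numpy as np


def attention_output(x, context, beta=1.0):
    """Softmax attention of query x over the rows of `context`.

    Assumption: identity query/key/value maps, so the output is a
    softmax-weighted average of the context tokens themselves.
    """
    logits = beta * (context @ x)
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()
    return weights @ context


def mean_error(n, d=3, beta=0.5, trials=200, seed=0):
    """Average distance between finite- and infinite-context outputs.

    Assumption: context tokens are i.i.d. N(0, I_d). Tilting a standard
    Gaussian by exp(beta * <x, y>) shifts its mean to beta * x, which is
    the infinite-context attention output used as the reference here.
    """
    rng = np.random.default_rng(seed)
    x = np.ones(d) / np.sqrt(d)  # fixed unit-norm query token
    limit = beta * x
    errors = []
    for _ in range(trials):
        context = rng.standard_normal((n, d))
        errors.append(np.linalg.norm(attention_output(x, context, beta) - limit))
    return float(np.mean(errors))


if __name__ == "__main__":
    # Under a parametric n^{-1/2} rate, quadrupling n should roughly
    # halve the error.
    for n in (100, 400, 1600, 6400):
        print(n, mean_error(n))
```

In this toy setting the error should shrink by roughly a factor of two each time $n$ is multiplied by four, which is the behaviour a parametric $n^{-1/2}$ rate predicts. It says nothing about the depth-uniform or training-time bounds described in the abstract.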
