Propagation of Chaos in Contextual Flow Maps
Shi Chen, Zhengjiang Lin, Kaizhao Liu, Philippe Rigollet

TL;DR
This paper develops a quantitative statistical theory of transformers in the large-context regime using contextual flow maps (CFMs), and analyzes how closely finite-context models approximate their infinite-context counterparts.
Contribution
The paper introduces a new Eulerian adjoint formulation and establishes optimal bounds on the deviation between finite- and infinite-context models, including transformers.
Findings
Achieves the optimal Wasserstein rate $n^{-1/d}$ for general CFMs.
Establishes the parametric rate $n^{-1/2}$ for a class of CFMs that includes transformers.
Provides stability estimates for forward--adjoint systems.
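A quick back-of-the-envelope comparison shows how far apart the two rates are. Reading $n$ as the context length and $d$ as the dimension (an interpretation of the symbols, which this summary does not define), halving an error that scales like $n^{-\alpha}$ requires multiplying $n$ by a factor $c = 2^{1/\alpha}$:

$$
\frac{(cn)^{-\alpha}}{n^{-\alpha}} = \frac{1}{2} \iff c = 2^{1/\alpha}, \qquad \alpha = \tfrac{1}{d} \Rightarrow c = 2^{d}, \qquad \alpha = \tfrac{1}{2} \Rightarrow c = 4.
$$

For instance, with $d = 10$ the rate $n^{-1/d}$ needs $2^{10} = 1024$ times more context to halve the error, whereas the rate $n^{-1/2}$ needs only 4 times more, whatever the dimension.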
Abstract
We develop a quantitative statistical theory of transformers in the large-context regime by adopting the abstraction of contextual flow maps (CFMs): dynamical systems that evolve a distinguished token in the presence of a contextual measure across a stack of attention blocks. Within this framework, the finite-context model approximates an idealized infinite-context system in which the contextual measure is replaced by its underlying population, so that the context length becomes a statistical resource. Exploiting the McKean--Vlasov structure of the dynamics and the classical machinery of propagation of chaos, we establish a forward bound controlling the deviation between the finite- and infinite-context CFMs uniformly along depth, and a backward bound controlling the deviation between the corresponding training trajectories uniformly across iterations of online gradient descent.…
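The finite- versus infinite-context comparison can be illustrated with a toy sketch. It is not the paper's construction: it assumes a single softmax attention block with identity query, key, and value matrices, and context tokens drawn i.i.d. from a standard Gaussian. Under those assumptions the infinite-context output has a closed form, because exponentially tilting $N(0, I_d)$ by $e^{x \cdot y}$ gives $N(x, I_d)$, whose mean is the query $x$ itself. The function names and parameters below are hypothetical.

```python
import numpy as np


def attention_output(x, context):
    """Softmax attention of query x over context tokens.

    Toy assumption: identity query/key/value matrices, one block.
    x has shape (d,), context has shape (n, d).
    """
    scores = context @ x                      # inner products, shape (n,)
    weights = np.exp(scores - scores.max())   # subtract max for stability
    weights /= weights.sum()
    return weights @ context                  # weighted average, shape (d,)


def mean_error(n, d=4, trials=200, seed=0):
    """Average distance between finite-context attention and its limit.

    Context tokens are sampled from N(0, I_d) (an assumption of this
    sketch); for that population the infinite-context output equals x.
    """
    rng = np.random.default_rng(seed)
    x = np.full(d, 0.5) / np.sqrt(d)          # fixed query with norm 0.5
    errors = []
    for _ in range(trials):
        context = rng.standard_normal((n, d))
        errors.append(np.linalg.norm(attention_output(x, context) - x))
    return float(np.mean(errors))


if __name__ == "__main__":
    # Print the average error for growing context lengths.
    for n in (100, 400, 1600, 6400):
        print(n, mean_error(n))
```

Running the script prints the average error for each context length. In this toy setting the error should shrink as $n$ grows, and quadrupling $n$ should roughly halve it if the decay is close to $n^{-1/2}$. That is an empirical illustration only, not a check of the paper's bounds.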
