A Measure-Theoretic Analysis of Reasoning: Structural Generalization and Approximation Limits
Yuyang Zhang, Yifu Zhang, Xuehai Zhou, Xiaoyin Chen

TL;DR
This paper provides a measure-theoretic framework for understanding reasoning in large language models and identifies fundamental limits on out-of-distribution generalization imposed by architecture and depth.
Contribution
It formalizes reasoning via optimal transport, derives bounds on OOD generalization, and establishes depth and architecture constraints for Transformers.
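The bound rests on a standard fact. By Kantorovich-Rubinstein duality, for any loss g that is L-Lipschitz in the embedding metric, |E_P[g] - E_Q[g]| <= L * W1(P, Q), so the risk gap under a domain shift is controlled by the Wasserstein-1 distance. The following is a minimal numerical sketch of that inequality. It assumes one-dimensional embeddings, Gaussian samples standing in for embedded reasoning trajectories, and a toy 2-Lipschitz loss. The helpers `wasserstein1_1d`, `risk_gap`, `toy_loss`, and `shift_sweep` are hypothetical and do not reproduce the paper's embedding, theorem, or constants.

```python
import numpy as np


def wasserstein1_1d(x, y):
    """Wasserstein-1 distance between two equal-size 1D empirical measures.

    In one dimension the optimal coupling pairs sorted samples, so W1 is the
    mean absolute difference between the order statistics.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y)))


def risk_gap(loss, x, y):
    """Absolute difference of the mean loss under the two empirical measures."""
    return abs(float(np.mean(loss(x)) - np.mean(loss(y))))


LIP = 2.0  # Lipschitz constant of the toy loss below (illustrative choice)


def toy_loss(z):
    # 2|z| changes by at most 2|a - b| between a and b, so it is 2-Lipschitz.
    return LIP * np.abs(z)


def shift_sweep(shifts, n=2000, seed=0):
    """Return (shift, W1, risk gap, Lipschitz bound) for Gaussian domain shifts."""
    rng = np.random.default_rng(seed)
    source = rng.normal(0.0, 1.0, size=n)  # stand-in for in-distribution embeddings
    rows = []
    for s in shifts:
        target = rng.normal(s, 1.0, size=n)  # stand-in for shifted embeddings
        w1 = wasserstein1_1d(source, target)
        rows.append((s, w1, risk_gap(toy_loss, source, target), LIP * w1))
    return rows


if __name__ == "__main__":
    for s, w1, gap, bound in shift_sweep([0.0, 0.5, 1.0, 2.0]):
        print(f"shift={s:.1f}  W1={w1:.3f}  gap={gap:.3f}  bound={bound:.3f}")
```

Because the inequality holds for any pair of probability measures, it holds exactly for the two empirical samples: the printed gap stays below the bound up to floating-point rounding, while W1 grows with the shift.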
Findings
Shift-invariant mechanisms like Rotary Embeddings improve OOD robustness.
Deeper physical layers are necessary to prevent representation collapse.
Generalization risk increases monotonically with domain shift.
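The shift-invariance property behind the first finding can be illustrated on a single attention logit. The sketch assumes the common RoPE construction (rotating coordinate pairs by angles proportional to position, base 10000) and uses a sinusoidal additive encoding as the absolute-position baseline; `rope`, `sinusoidal_ape`, `rope_score`, and `ape_score` are illustrative helpers, not the paper's code. With RoPE the query-key score depends only on the offset m - n, so translating both positions leaves it unchanged. With additive absolute encodings the cross terms q·p_n and p_m·k depend on absolute position.

```python
import numpy as np


def rope(x, pos, base=10000.0):
    """Rotate consecutive coordinate pairs of x by angles pos * base**(-2i/d)."""
    d = x.shape[-1]
    if d % 2:
        raise ValueError("RoPE needs an even dimension")
    freqs = base ** (-np.arange(0, d, 2) / d)
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x, dtype=float)
    out[0::2] = x[0::2] * cos - x[1::2] * sin
    out[1::2] = x[0::2] * sin + x[1::2] * cos
    return out


def sinusoidal_ape(pos, d, base=10000.0):
    """Additive absolute positional encoding with interleaved sin/cos entries."""
    freqs = base ** (-np.arange(0, d, 2) / d)
    pe = np.empty(d)
    pe[0::2] = np.sin(pos * freqs)
    pe[1::2] = np.cos(pos * freqs)
    return pe


def rope_score(q, k, m, n):
    # <R(m)q, R(n)k> = <q, R(n - m)k>: only the offset matters.
    return float(rope(q, m) @ rope(k, n))


def ape_score(q, k, m, n):
    # (q + p_m)·(k + p_n) contains q·p_n and p_m·k, which track absolute position.
    d = q.shape[-1]
    return float((q + sinusoidal_ape(m, d)) @ (k + sinusoidal_ape(n, d)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k = rng.normal(size=8), rng.normal(size=8)
    # Same offset (3); the whole sequence is translated by 500 positions.
    print("RoPE:", rope_score(q, k, 5, 2), rope_score(q, k, 505, 502))  # equal up to rounding
    print("APE: ", ape_score(q, k, 5, 2), ape_score(q, k, 505, 502))    # generally different
```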
Abstract
While empirical scaling laws for LLM reasoning are well documented, the theoretical mechanisms governing out-of-distribution (OOD) generalization remain elusive. We formalize reasoning via optimal transport, projecting discrete trajectories into a continuous metric space to quantify domain shifts using the Wasserstein-1 distance. Invoking Kantorovich duality, we bound OOD generalization via architectural Lipschitz continuity and functional approximation limits. This exposes two primary constraints. First, position-dependent attention (e.g., Absolute Positional Encoding) fails to preserve shift invariance, yielding an unbounded Lipschitz constant and expected risk, whereas shift-invariant mechanisms (e.g., Rotary Embeddings) preserve equivariance and bound the error. Second, by mapping sequential backtracking to a Dyck language, we establish a strict circuit depth lower bound for…
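To make the backtracking-to-Dyck mapping concrete, a well-formed bracket string can be read as a trace in which each opener starts a sub-goal and its matching closer backtracks out of it. This reading is an illustrative assumption, not the paper's construction. The hypothetical `dyck_depth` below recognizes a three-bracket Dyck language with an explicit stack and reports the maximum nesting depth, the quantity that grows with the amount of nested backtracking. It does not reproduce the circuit-depth lower bound itself.

```python
# Three bracket types; the choice of k = 3 is illustrative.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def dyck_depth(s):
    """Return the maximum nesting depth if s is a well-formed Dyck word, else None.

    The stack mirrors backtracking: an opener pushes a pending sub-goal, and
    the matching closer pops it, returning control to the enclosing goal.
    """
    stack = []
    max_depth = 0
    for ch in s:
        if ch in OPENERS:
            stack.append(ch)
            max_depth = max(max_depth, len(stack))
        elif ch in PAIRS:
            if not stack or stack[-1] != PAIRS[ch]:
                return None  # closer with no matching open sub-goal
            stack.pop()
        else:
            raise ValueError(f"unexpected symbol {ch!r}")
    # Unclosed sub-goals mean the trace never backtracked to the root.
    return max_depth if not stack else None


if __name__ == "__main__":
    print(dyck_depth("([]{()})"))          # 3
    print(dyck_depth("([)]"))              # None: crossed brackets
    print(dyck_depth("(" * 5 + ")" * 5))   # 5
```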
