Blink of an eye: a simple theory for feature localization in generative models
Marvin Li, Aayush Karan, Sitan Chen

TL;DR
This paper introduces a simple, unifying theory that explains sudden shifts in the behavior of generative models during generation. The theory applies to both language and diffusion models and is validated empirically.
Contribution
The work develops a theory that explains feature localization in generative models without relying on strong distributional assumptions, applicable to multiple model types.
Findings
Critical windows often coincide with failures in problem solving.
The theory applies to autoregressive and diffusion models.
Empirical validation confirms the theory's predictions.
Abstract
Large language models can exhibit unexpected behavior in the blink of an eye. In a recent computer use demo, a language model switched from coding to Googling pictures of Yellowstone, and these sudden shifts in behavior have also been observed in reasoning patterns and jailbreaks. This phenomenon is not unique to autoregressive models: in diffusion models, key features of the final output are decided in narrow "critical windows" of the generation process. In this work we develop a simple, unifying theory to explain this phenomenon using the formalism of stochastic localization samplers. We show that it emerges generically as the generation process localizes to a sub-population of the distribution it models. While critical windows have been studied at length in diffusion models, existing theory heavily relies on strong distributional assumptions and the particulars of Gaussian…
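As a rough illustration of what a "critical window" can look like, the sketch below uses a toy setting that is an assumption of this note, not the paper's construction: a one-dimensional mixture of two Gaussians at +mu and -mu, noised by an Ornstein-Uhlenbeck forward process. The function names, the threshold `eps`, and the time grid are all hypothetical choices. The window is taken to be the range of noise times at which the two noised components are neither fully distinguishable nor fully merged, measured by total variation distance.

```python
import math


def tv_between_noised_modes(mu, sigma, t):
    """Total variation distance between N(+mu, sigma^2) and N(-mu, sigma^2)
    after both are noised for time t by an OU process (toy assumption)."""
    shrink = math.exp(-t)
    mean = mu * shrink
    std = math.sqrt(shrink ** 2 * sigma ** 2 + 1.0 - shrink ** 2)
    # For equal-variance Gaussians centered at +m and -m:
    # TV = 2 * Phi(m / s) - 1 = erf(m / (s * sqrt(2))).
    return math.erf(mean / (std * math.sqrt(2.0)))


def critical_window(mu, sigma, eps=0.05, t_max=8.0, steps=8001):
    """Return (t_lo, t_hi), the first and last grid times at which the TV
    distance lies strictly between eps and 1 - eps, or None if there are none."""
    times = [t_max * i / (steps - 1) for i in range(steps)]
    inside = [
        t for t in times
        if eps < tv_between_noised_modes(mu, sigma, t) < 1.0 - eps
    ]
    return (inside[0], inside[-1]) if inside else None


if __name__ == "__main__":
    window = critical_window(mu=10.0, sigma=1.0)
    print("toy critical window (noise time):", window)
```

Generation runs in the reverse direction, so in this toy picture the choice between the two modes is effectively made while the noise time decreases through the returned interval. Outside that interval the sample is either still compatible with both modes or already committed to one.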
Taxonomy
Topics: Artificial Intelligence in Games · Language and cultural evolution · Cellular Automata and Applications
Methods: Diffusion
