Emergence of Frontier Superposition: Möbius attractor and Cascade Supervision
Hongyu Gu, Jingwen Fu

TL;DR
This paper investigates how superposition reasoning emerges in Transformers, identifying a Möbius attractor and proposing Cascade Supervision to facilitate gradient flow and reasoning depth.
Contribution
It introduces the concept of a Möbius attractor under symmetry and a novel Cascade Supervision method to enable superposition reasoning in neural networks.
Findings
The Möbius attractor reduces layer dynamics to a 1D map.
Cascade Supervision improves gradient persistence and discrimination.
Experimental results match theoretical decay predictions within 0.02.
Abstract
Superposition allows Transformers to reason in depth, carrying an entire reasoning frontier in parallel through a bounded-depth forward pass instead of unrolling serial chain-of-thought tokens. While Zhu et al. (2025) hand-crafted an equal-weight breadth-first frontier in a single residual stream for graph reachability, it remained open whether gradient descent could ever find this target amidst permutation-symmetric saddles. We close this gap on Reachability-by-Superposition over Erdős-Rényi graphs by isolating architectural and supervisional contributions. Architecturally, we identify a Möbius attractor: under -symmetry in the tree regime, layerwise dynamics reduce to a 1D Möbius map whose zero set is a codimension-one manifold of global optima containing the equal-weight superposition state. On the supervision side, we identify Cascade Supervision: a loss class…
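To make the "equal-weight breadth-first frontier" concrete, here is a minimal sketch of the target state on a random graph. It is an illustration only, not the construction from the paper or from Zhu et al. (2025): the one-hot node basis, the unit-norm scaling, and the names `sample_erdos_renyi`, `frontier_superposition`, and `is_reachable` are all assumptions introduced for this example.

```python
import numpy as np


def sample_erdos_renyi(n, p, rng):
    """Sample a directed Erdős-Rényi adjacency matrix (adj[i, j] means edge i -> j)."""
    adj = rng.random((n, n)) < p
    np.fill_diagonal(adj, False)  # no self-loops
    return adj


def frontier_superposition(adj, source, depth):
    """Equal-weight superposition over nodes reachable from `source` in <= depth steps.

    Assumption: nodes are represented by one-hot basis vectors, so the
    "residual stream" here is just a length-n vector of node weights.
    """
    n = adj.shape[0]
    reached = np.zeros(n, dtype=bool)
    reached[source] = True
    for _ in range(depth):
        # One breadth-first step: add every out-neighbour of the reached set.
        reached = reached | adj[reached].any(axis=0)
    state = reached.astype(float)
    # Assumed normalisation: every reached node gets the same weight, unit norm.
    return state / np.sqrt(state.sum())


def is_reachable(state, target):
    """Read out reachability: the target has non-zero weight in the superposition."""
    return bool(state[target] > 0)


rng = np.random.default_rng(0)
graph = sample_erdos_renyi(8, 0.2, rng)
state = frontier_superposition(graph, source=0, depth=3)
print(state, is_reachable(state, target=5))
```

Each loop iteration plays the role of one layer: the whole frontier advances in parallel, so `depth` steps replace what a serial chain of thought would spell out node by node.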
