State Stream Transformer (SST) V2: Parallel Training of Nonlinear Recurrence for Latent Space Reasoning
Thea Aviss

TL;DR
SST V2 is a transformer variant that adds a nonlinear recurrence over latent states to improve reasoning. It is trained with a two-pass parallel procedure and outperforms its baseline on reasoning benchmarks.
Contribution
SST V2's nonlinear recurrence mechanism and parallel training approach enhance reasoning capacity in transformers without increasing model size.
Findings
SST V2 scores 15.15 points higher than the baseline on GPQA-Diamond.
SST reduces GSM8K errors by 46% compared to baseline.
SST outperforms larger models on reasoning benchmarks.
Abstract
Current transformers discard their rich latent residual stream between positions, reconstructing latent reasoning context at each new position and leaving potential reasoning capacity untapped. The State Stream Transformer (SST) V2 enables parameter-efficient reasoning in continuous latent space through an FFN-driven nonlinear recurrence at each decoder layer, where latent states are streamed horizontally across the full sequence via a learned blend. This same mechanism supports continuous latent deliberation per position at inference time, dedicating additional FLOPs to exploring abstract reasoning before committing to a token. A two-pass parallel training procedure resolves the sequential dependency of the recurrence to allow compute-efficient training. Hidden state analysis shows the state stream facilitates reasoning through exploration of distinct semantic basins in continuous…
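To make the mechanism in the abstract more concrete, here is a minimal NumPy sketch of one layer's state stream. It is an illustration under stated assumptions, not the paper's implementation. The abstract does not give the blend formula, the FFN shape, or the details of the two passes, so everything below is hypothetical: the scalar blend weight `alpha`, the tanh FFN, the zero initial state, and the reading of "two-pass" as "compute states without recurrence, then blend in those states shifted by one position".

```python
import numpy as np

def ffn(x, w1, w2):
    """Two-layer feed-forward block; tanh is an illustrative choice of nonlinearity."""
    return np.tanh(x @ w1) @ w2

def sequential_stream(h, w1, w2, alpha):
    """Exact recurrence: each position blends its input with the previous position's state."""
    T, d = h.shape
    s_prev = np.zeros(d)                      # assumed zero initial state
    out = np.empty_like(h)
    for t in range(T):
        x = (1.0 - alpha) * h[t] + alpha * s_prev   # assumed form of the learned blend
        s_prev = ffn(x, w1, w2)
        out[t] = s_prev
    return out

def two_pass_stream(h, w1, w2, alpha):
    """Hypothetical parallel approximation of the recurrence above."""
    s0 = ffn(h, w1, w2)                                      # pass 1: no state, all positions at once
    prev = np.vstack([np.zeros((1, h.shape[1])), s0[:-1]])   # shift pass-1 states right by one
    return ffn((1.0 - alpha) * h + alpha * prev, w1, w2)     # pass 2: blend and re-apply the FFN

def deliberate(h_t, s_prev, w1, w2, alpha, steps):
    """Spend extra latent iterations on one position before a token would be emitted."""
    s = s_prev
    for _ in range(steps):
        s = ffn((1.0 - alpha) * h_t + alpha * s, w1, w2)
    return s
```

In this sketch the two-pass version matches the sequential one exactly at the first position and whenever `alpha` is 0. At later positions it only approximates the sequential result, because the state it blends in comes from a pass that had no recurrence. Whether the paper's procedure makes the same trade-off is not stated in the abstract.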
