Turbo Connection: Reasoning as Information Flow from Higher to Lower Layers
Mohan Tang, Sidi Lu

TL;DR
This paper introduces Turbo Connection (TurboConn), an architecture that routes multiple residual connections from higher layers back to lower layers to improve reasoning in large language models. It is applied by fine-tuning pre-trained models rather than training from scratch, and it improves accuracy on reasoning benchmarks.
Contribution
The paper proposes Turbo Connection, an architecture that lets information flow from higher layers to lower layers, overcoming the fixed-depth limitation of Transformers and boosting reasoning performance.
Findings
TurboConn improves accuracy by 0.9% to over 10% on benchmarks such as GSM8K, Parity, and multi-step arithmetic.
Dense backward connections outperform sparse alternatives.
TurboConn enables models to reach perfect accuracy without full retraining.
Abstract
Complex problems, whether in math, logic, or planning, are solved by humans through a sequence of steps where the result of one step informs the next. In this work, we adopt the perspective that the reasoning power of Transformers is fundamentally limited by a fixed maximum number of steps along any latent path of computation. To address this, we introduce Turbo Connection (TurboConn), a novel architecture that overcomes the fixed-depth constraint by routing multiple residual connections from the higher-layer hidden states of each token t to the lower layers of token t+1. Fine-tuning pre-trained LLMs with our method not only yields accuracy gains of 0.9% to over 10% on benchmarks like GSM8K, Parity, and multi-step arithmetic, but also demonstrates that the density of these backward connections is critical; our dense interaction significantly outperforms "sparse" alternatives that…
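The fixed-depth argument in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation. It assumes that a backward connection feeds the output of a higher layer at one token into the input of a lower layer at the next token, and that the connection is a plain unweighted addition. The names `longest_path`, `forward`, and `links` are hypothetical. Hidden states are scalars and `forward` has no attention, so tokens interact only through the backward links.

```python
import math


def longest_path(num_tokens, num_layers, links):
    """Most layer applications on any path through the token-by-layer grid.

    links: list of (src_layer, dst_layer) pairs. Assumed wiring: the output of
    src_layer at token t-1 is added to the input of dst_layer at token t.
    """
    depth = [[0] * num_layers for _ in range(num_tokens)]
    for t in range(num_tokens):
        for l in range(num_layers):
            best_in = 0  # the raw embedding has depth 0
            if l > 0:
                # Residual stream and causal attention both read layer l-1
                # outputs of the current and earlier tokens.
                best_in = max(depth[s][l - 1] for s in range(t + 1))
            if t > 0:
                for src, dst in links:
                    if dst == l:
                        best_in = max(best_in, depth[t - 1][src])
            depth[t][l] = best_in + 1
    return max(max(row) for row in depth)


def forward(tokens, weights, links):
    """Toy forward pass with scalar states and one residual tanh update per layer."""
    prev = None  # per-layer hidden states of the previous token
    outputs = []
    for x in tokens:
        h, states = x, []
        for l, w in enumerate(weights):
            if prev is not None:
                for src, dst in links:
                    if dst == l:
                        h = h + prev[src]  # hypothetical unweighted backward link
            h = h + math.tanh(w * h)  # stand-in for a transformer block
            states.append(h)
        prev = states
        outputs.append(h)
    return outputs


no_links = longest_path(num_tokens=4, num_layers=3, links=[])
top_to_bottom = longest_path(num_tokens=4, num_layers=3, links=[(2, 0)])
print(no_links, top_to_bottom)
```

With 4 tokens and 3 layers, the count is 3 without backward links, regardless of sequence length, and 12 with a single link from the top layer to the bottom layer, so the longest path grows with the number of tokens. A denser variant in this toy setting would pass several pairs in `links`, for example `[(2, 0), (3, 1)]` for a 4-layer stack.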
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Graph Neural Networks · Topic Modeling
