Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning
Qifan Yu, Zhenyu He, Sijie Li, Xun Zhou, Jun Zhang, Jingjing Xu, Di He

TL;DR
This paper introduces RELAY, a method that aligns looped transformer reasoning with chain-of-thought steps, improving long reasoning chains and enhancing auto-regressive model performance on complex tasks.
Contribution
It proposes a novel loop alignment training technique for Looped Transformers, enabling better length generalization and reasoning step prediction for complex problems.
Findings
Significant performance improvements in auto-regressive models.
Effective length generalization for reasoning chains.
Successful alignment of loop iterations with reasoning steps.
Abstract
Chain-of-Thought (CoT) prompting has emerged as a powerful technique for enhancing language models' reasoning capabilities. However, generating long and correct CoT trajectories is challenging. Recent studies have demonstrated that Looped Transformers possess remarkable length generalization capabilities, but their limited generality and adaptability prevent them from serving as an alternative to auto-regressive solutions. To better leverage the strengths of Looped Transformers, we propose RELAY (REasoning through Loop Alignment iterativelY). Specifically, we align the steps of CoT reasoning with loop iterations and apply intermediate supervision during the training of Looped Transformers. This additional iteration-wise supervision not only preserves the Looped Transformer's ability for length generalization but also enables it to predict CoT reasoning steps for…
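The idea of aligning loop iterations with CoT steps and supervising each iteration can be illustrated with a toy sketch. This is not the authors' implementation: the shared block here is a single tanh layer standing in for a transformer block, and all names (`looped_forward`, `loop_aligned_loss`, `w_block`, `w_head`) and the input re-injection are assumptions made for illustration. The only point it shows is that the output after loop iteration t is scored against the tokens of reasoning step t, and the per-iteration losses are averaged.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def looped_forward(x, w_block, w_head, num_loops):
    """Apply one shared block num_loops times; return logits after each iteration.

    Assumption: a tanh layer replaces the transformer block, and the input x
    is re-injected at every iteration.
    """
    h = x
    per_loop_logits = []
    for _ in range(num_loops):
        h = np.tanh(h @ w_block) + x          # same weights at every iteration
        per_loop_logits.append(h @ w_head)    # shared output head
    return per_loop_logits

def loop_aligned_loss(per_loop_logits, step_targets):
    """Mean cross-entropy where iteration t is supervised by CoT step t."""
    assert len(per_loop_logits) == len(step_targets)
    losses = []
    for logits, targets in zip(per_loop_logits, step_targets):
        logp = log_softmax(logits)
        token_nll = -logp[np.arange(len(targets)), targets]
        losses.append(token_nll.mean())
    return float(np.mean(losses))

# Hypothetical toy sizes and random data, purely for demonstration.
rng = np.random.default_rng(0)
seq_len, d_model, vocab, num_steps = 4, 8, 5, 3
x = rng.normal(size=(seq_len, d_model))
w_block = rng.normal(size=(d_model, d_model)) * 0.1
w_head = rng.normal(size=(d_model, vocab)) * 0.1
step_targets = [rng.integers(0, vocab, size=seq_len) for _ in range(num_steps)]

per_loop_logits = looped_forward(x, w_block, w_head, num_steps)
loss = loop_aligned_loss(per_loop_logits, step_targets)
print(f"iteration-wise supervised loss: {loss:.4f}")
```

Because the number of loop iterations is a runtime argument rather than a fixed depth, the same weights can be unrolled for more steps on longer problems, which is the property the abstract associates with length generalization.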
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Cognitive Science and Mapping · Online Learning and Analytics
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Softmax · Absolute Position Encodings · Dropout · Label Smoothing · ALIGN
