Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning

Qifan Yu; Zhenyu He; Sijie Li; Xun Zhou; Jun Zhang; Jingjing Xu; Di He

arXiv:2502.08482·cs.CL·February 13, 2025



TL;DR

This paper introduces RELAY, a method that aligns looped transformer reasoning with chain-of-thought steps, improving long reasoning chains and enhancing auto-regressive model performance on complex tasks.

Contribution

It proposes a novel loop alignment training technique for Looped Transformers, enabling better length generalization and reasoning step prediction for complex problems.

Findings

01. Significant performance improvements in auto-regressive models.

02. Effective length generalization for reasoning chains.

03. Successful alignment of loop iterations with reasoning steps.

Abstract

Chain-of-Thought (CoT) prompting has emerged as a powerful technique for enhancing language models' reasoning capabilities. However, generating long and correct CoT trajectories is challenging. Recent studies have demonstrated that Looped Transformers possess remarkable length generalization capabilities, but their limited generality and adaptability prevent them from serving as an alternative to auto-regressive solutions. To better leverage the strengths of Looped Transformers, we propose RELAY (REasoning through Loop Alignment iterativelY). Specifically, we align the steps of CoT reasoning with loop iterations and apply intermediate supervision during the training of Looped Transformers. This additional iteration-wise supervision not only preserves the Looped Transformer's ability for length generalization but also enables it to predict CoT reasoning steps for…
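The idea of aligning loop iterations with CoT steps can be illustrated with a small toy sketch. This is not the authors' implementation: the names `loop_aligned_loss`, `toy_block`, and `toy_readout` are hypothetical, the "model" is a plain Python function rather than a transformer, and averaging the per-iteration cross-entropy is an assumed choice of loss. The only point it shows is that one shared block is applied once per reasoning step, and the output of iteration t is supervised against the target for CoT step t.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-probability of the target class."""
    return -math.log(softmax(logits)[target])


def loop_aligned_loss(
    block: Callable[[int], int],
    readout: Callable[[int], List[float]],
    state: int,
    cot_step_targets: Sequence[int],
) -> Tuple[float, List[float]]:
    """Apply the same block once per CoT step and supervise every iteration.

    Iteration t's readout is compared against the target for CoT step t
    (assumed here to be a single token id per step, for simplicity).
    Returns the mean loss and the list of per-iteration losses.
    """
    per_step = []
    for target in cot_step_targets:
        state = block(state)  # shared weights: the same function every iteration
        per_step.append(cross_entropy(readout(state), target))
    return sum(per_step) / len(per_step), per_step


# Hypothetical toy "model": the state is a counter, and the readout
# puts a high logit on (state mod 4) in a vocabulary of size 4.
def toy_block(state: int) -> int:
    return state + 1


def toy_readout(state: int) -> List[float]:
    return [5.0 if i == state % 4 else 0.0 for i in range(4)]


# Targets that match the iteration-by-iteration outputs give a low loss;
# targets shifted by one step give a high loss at every iteration.
aligned_loss, aligned_steps = loop_aligned_loss(toy_block, toy_readout, 0, [1, 2, 3])
shifted_loss, shifted_steps = loop_aligned_loss(toy_block, toy_readout, 0, [2, 3, 0])
```

Because the loss is computed at every iteration rather than only at the final one, a mismatch between any intermediate state and its corresponding reasoning step is penalized directly, which is the role the abstract assigns to intermediate supervision.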


Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Cognitive Science and Mapping · Online Learning and Analytics

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Softmax · Absolute Position Encodings · Dropout · Label Smoothing · ALIGN