Unlocking Recursive Thinking of LLMs: Alignment via Refinement

Haoke Zhang; Xiaobo Liang; Cunxiang Wang; Juntao Li; Min Zhang

arXiv:2506.06009·cs.CL·June 9, 2025

TL;DR

This paper introduces AvR, a refinement-based training method that enhances recursive reasoning in LLMs using long-form Chain of Thought, leading to significant performance improvements with synthetic data.

Contribution

The paper proposes AvR, a novel refinement approach that improves LLM recursive thinking through criticism and iterative improvement, without requiring expert-curated data.

Findings

01 AvR outperforms preference optimization methods.

02 With only 3k synthetic samples, AvR boosts LLaMA-3-8B-Instruct performance by over 20%.

03 The method enables effective scaling at test time.

Abstract

The OpenAI o1-series models have demonstrated that leveraging long-form Chain of Thought (CoT) can substantially enhance performance. However, the recursive thinking capabilities of Large Language Models (LLMs) remain limited, particularly in the absence of expert-curated data for distillation. In this paper, we propose AvR (Alignment via Refinement), a novel method aimed at unlocking the potential of LLMs for recursive reasoning through long-form CoT. AvR introduces a refinement process that integrates criticism and improvement actions, guided by differentiable learning techniques to optimize refinement-aware rewards. As a result, the synthesized multi-round data can be organized as a long refinement thought, further enabling test-time scaling. Experimental results show that AvR significantly outperforms conventional preference optimization methods. Notably,…
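
To make the idea of a "long refinement thought" concrete, here is a minimal Python sketch of a multi-round criticism and improvement loop whose steps are joined into a single trace. It is an illustration only, not the authors' implementation: the function `refine`, the `critic` and `improver` callables, and the toy stand-ins are all hypothetical names, and the sketch does not cover training or the refinement-aware rewards.

```python
from typing import Callable, List


def refine(question: str,
           answer: str,
           critic: Callable[[str, str], str],
           improver: Callable[[str, str, str], str],
           rounds: int = 3) -> str:
    """Run `rounds` criticism/improvement steps and return one long trace.

    `critic` and `improver` are hypothetical stand-ins for model calls.
    """
    trace: List[str] = [f"Question: {question}", f"Draft: {answer}"]
    for i in range(1, rounds + 1):
        critique = critic(question, answer)              # criticism action
        answer = improver(question, answer, critique)    # improvement action
        trace.append(f"Criticism {i}: {critique}")
        trace.append(f"Improvement {i}: {answer}")
    # Joining every round gives a single long-form refinement trace.
    return "\n".join(trace)


# Toy stand-ins so the sketch runs without any model (assumptions, not AvR).
def toy_critic(question: str, answer: str) -> str:
    return "too short" if len(answer) < 12 else "looks fine"


def toy_improver(question: str, answer: str, critique: str) -> str:
    return answer + "!" if critique == "too short" else answer


if __name__ == "__main__":
    print(refine("q", "a", toy_critic, toy_improver, rounds=2))
```

In this sketch, raising `rounds` at inference time spends more computation on further criticism and improvement steps, which is one simple way to picture test-time scaling.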

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Computational and Text Analysis Methods