Rectifying LLM Thought from Lens of Optimization

Junnan Liu; Hongwei Liu; Songyang Zhang; Kai Chen

arXiv:2512.01925 · cs.CL · April 9, 2026

TL;DR

This paper introduces RePro, a post-training method that treats chain-of-thought reasoning as an optimization process and uses reinforcement learning with a process-level reward to improve reasoning quality and reduce overthinking.

Contribution

The paper proposes an optimization-based perspective on LLM reasoning and introduces RePro, a process-level reward mechanism that plugs into reinforcement learning with verifiable rewards (RLVR) to improve reasoning performance.

Findings

01 RePro improves reasoning accuracy across multiple benchmarks.

02 RePro reduces overthinking and excessively long reasoning chains.

03 RePro consistently outperforms baseline methods in diverse tasks.

Abstract

Recent advancements in large language models (LLMs) have been driven by their emergent reasoning capabilities, particularly through long chain-of-thought (CoT) prompting, which enables thorough exploration and deliberation. Despite these advances, long-CoT LLMs often exhibit suboptimal reasoning behaviors, such as overthinking and excessively protracted reasoning chains, which can impair performance. In this paper, we analyze reasoning processes through an optimization lens, framing CoT as a gradient descent procedure where each reasoning step constitutes an update toward problem resolution. Building on this perspective, we introduce RePro (Rectifying Process-level Reward), a novel approach to refine LLM reasoning during post-training. RePro defines a surrogate objective function to assess the optimization process underlying CoT, utilizing a dual scoring mechanism to quantify its…
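To make the optimization framing concrete, here is a minimal sketch, assuming each reasoning step can be assigned a scalar surrogate objective value (lower is better). The abstract is truncated before it specifies the dual scoring mechanism, so the two scores below (`net_progress` and `monotonic_fraction`), the `shaped_reward` helper, and the `weight` parameter are hypothetical illustrations and not the paper's definitions.

```python
from typing import List


def step_improvements(objective: List[float]) -> List[float]:
    """Per-step decrease of the surrogate objective (positive means progress)."""
    return [prev - cur for prev, cur in zip(objective, objective[1:])]


def net_progress(objective: List[float]) -> float:
    """Hypothetical score 1: relative decrease from first to last step, clipped to [0, 1]."""
    start, end = objective[0], objective[-1]
    if start <= 0:
        return 0.0
    return max(0.0, min(1.0, (start - end) / start))


def monotonic_fraction(objective: List[float]) -> float:
    """Hypothetical score 2: fraction of steps that do not increase the objective."""
    deltas = step_improvements(objective)
    if not deltas:
        return 1.0
    return sum(d >= 0 for d in deltas) / len(deltas)


def shaped_reward(outcome_reward: float, objective: List[float], weight: float = 0.1) -> float:
    """Add a small process-level bonus to a verifiable outcome reward (assumed combination)."""
    process = 0.5 * (net_progress(objective) + monotonic_fraction(objective))
    return outcome_reward + weight * process


# Two toy trajectories that reach the same final objective value.
direct = [4.0, 3.0, 2.0, 1.0]          # every step makes progress
wandering = [4.0, 5.0, 3.0, 4.0, 1.0]  # backtracks twice along the way

print(shaped_reward(1.0, direct), shaped_reward(1.0, wandering))
```

Both toy trajectories end at the same objective value and receive the same outcome reward, but the one that backtracks earns a smaller process bonus. This is the kind of signal a process-level reward could use to discourage overthinking, under the assumptions stated above.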

Code & Models

Repositories

open-compass/RePro (GitHub)

Videos

Rectifying LLM Thought from Lens of Optimization (SlidesLive)