APR: Penalizing Structural Redundancy in Large Reasoning Models via Anchor-based Process Rewards

Kaiyan Chang; Chenwei Zhu; Yingfeng Luo; Yifu Huo; Chenglong Wang; Xiaoqian Liu; Qiaozhi He; Tong Xiao; Zhengtao Yu; Jingbo Zhu

arXiv:2602.00760·cs.CL·February 10, 2026

TL;DR

This paper introduces Anchor-based Process Reward (APR), a reward shaping method that reduces structural redundancy in large reasoning models by penalizing repetitive self-verification after the answer has stabilized, improving efficiency while maintaining or improving accuracy.

Contribution

The paper formalizes the Reasoning Anchor, the position where the answer first stabilizes during reasoning, and proposes a reward shaping technique to mitigate overthinking in large reasoning models.

Findings

01. APR achieves a better performance-efficiency trade-off on multiple datasets.

02. Models trained with APR require fewer computational resources.

03. APR maintains or improves reasoning accuracy while reducing unnecessary verification.

Abstract

Test-Time Scaling (TTS) has significantly enhanced the capabilities of Large Reasoning Models (LRMs) but introduces a critical side effect known as Overthinking. We conduct a preliminary study to rethink this phenomenon from a fine-grained perspective. We observe that LRMs frequently conduct repetitive self-verification without revision even after obtaining the final answer during the reasoning process. We formally define the position where the answer first stabilizes as the Reasoning Anchor. By analyzing pre- and post-anchor reasoning behaviors, we uncover a structural redundancy inherent in LRMs: the meaningless repetitive verification after deriving the first complete answer, which we term the Answer-Stable Tail (AST). Motivated by this observation, we propose Anchor-based Process Reward (APR), a structure-aware reward shaping method that localizes the reasoning anchor and…
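To make the Reasoning Anchor and Answer-Stable Tail concrete, here is a minimal Python sketch under stated assumptions. It assumes a reasoning trace has already been split into steps and that each step has been mapped to the answer it states (or `None` if it states none). The function names, the step-level representation, and the toy penalty are all hypothetical illustrations; the abstract above is truncated before the actual reward is described, so this is not the paper's APR formula.

```python
from typing import Optional, Sequence


def find_reasoning_anchor(step_answers: Sequence[Optional[str]]) -> Optional[int]:
    """Return the index of the first step from which the stated answer
    never changes again, or None if no step states an answer.

    Assumption: step_answers[i] is the answer stated at step i, or None.
    """
    # The final answer is the last answer stated anywhere in the trace.
    final = next((a for a in reversed(step_answers) if a is not None), None)
    if final is None:
        return None

    anchor = None
    # Walk backwards while every stated answer agrees with the final one.
    for i in range(len(step_answers) - 1, -1, -1):
        answer = step_answers[i]
        if answer is None:
            continue  # a step without an answer does not break stability
        if answer != final:
            break  # the answer was still changing at this step
        anchor = i
    return anchor


def answer_stable_tail_length(step_answers: Sequence[Optional[str]]) -> int:
    """Number of steps that come after the anchor (the answer-stable tail)."""
    anchor = find_reasoning_anchor(step_answers)
    if anchor is None:
        return 0
    return len(step_answers) - 1 - anchor


def toy_shaped_reward(
    step_answers: Sequence[Optional[str]],
    correct: bool,
    penalty_weight: float = 0.5,
) -> float:
    """Hypothetical penalty, NOT the paper's reward: an outcome reward
    minus a term proportional to the fraction of steps in the tail."""
    base = 1.0 if correct else 0.0
    if not step_answers:
        return base
    tail_fraction = answer_stable_tail_length(step_answers) / len(step_answers)
    return base - penalty_weight * tail_fraction


if __name__ == "__main__":
    # The answer changes from "12" to "15" at step 2 and is then re-verified.
    trace = ["12", None, "15", "15", None, "15"]
    print(find_reasoning_anchor(trace))
    print(answer_stable_tail_length(trace))
    print(toy_shaped_reward(trace, correct=True))
```

In this sketch the anchor is found by scanning backwards from the end of the trace, so a step that briefly states the final answer before the model changes its mind again is not mistaken for the anchor. How the paper itself localizes the anchor in free-form text is not specified in the visible part of the abstract.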

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Machine Learning in Healthcare