Beyond Alignment: Expanding Reasoning Capacity via Manifold-Reshaping Policy Optimization

Dayu Wang, Jiaye Yang, Weikang Li, Jiahui Liang, and Yang Li

arXiv:2602.02545 · cs.LG · February 4, 2026

TL;DR

This paper introduces Manifold-Reshaping Policy Optimization (MRPO), a geometric approach that expands the reasoning capabilities of large language models by restructuring their inference space, leading to state-of-the-art performance on mathematical tasks.

Contribution

The paper presents MRPO, a geometric framework that restructures the inference space of LLMs, with the aim of expanding reasoning capacity rather than only aligning capabilities already latent in the pre-trained model.

Findings

01. A model trained with MRPO outperforms larger models on mathematical reasoning tasks.

02. MRPO expands the reasoning capacity boundary of LLMs.

03. The approach achieves state-of-the-art results with a 4B-parameter model.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has demonstrated remarkable success in enhancing the reasoning capabilities of Large Language Models (LLMs). However, recent studies question whether RL genuinely expands reasoning capacity or merely aligns existing latent capabilities, arguing that exploration remains confined within the pre-trained model's low-rank bias manifold. In this work, we challenge this accessibility boundary hypothesis by demonstrating that the latent reasoning space can be fundamentally expanded through targeted geometric interventions. We propose Manifold-Reshaping Policy Optimization (MRPO), a geometric framework designed to fundamentally restructure the inference space of LLMs. MRPO operates in two stages: first, we employ Spectral Orthogonal Exploration (SOE) to eject the policy initialization into the null space of the bias manifold; second, we…
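The abstract describes the first stage only at a high level: moving the policy initialization into the null space of a low-rank "bias manifold". As a rough illustration of that geometric idea, and not the authors' SOE procedure (which the visible text does not specify), the sketch below removes from a vector every component lying in a given low-rank subspace, leaving a result orthogonal to it. The names `orthonormalize` and `project_out`, and the toy `bias_directions`, are hypothetical and chosen for this example only.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(basis: List[Vector], tol: float = 1e-12) -> List[Vector]:
    """Gram-Schmidt: return an orthonormal basis for span(basis).

    Directions that are (numerically) linearly dependent are dropped.
    """
    ortho: List[Vector] = []
    for b in basis:
        w = list(b)
        for q in ortho:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > tol:
            ortho.append([wi / norm for wi in w])
    return ortho


def project_out(v: Vector, basis: List[Vector]) -> Vector:
    """Remove from v its component inside span(basis).

    The result lies in the orthogonal complement of the subspace,
    i.e. the null space of the matrix whose rows are the basis vectors.
    """
    w = list(v)
    for q in orthonormalize(basis):
        c = dot(w, q)
        w = [wi - c * qi for wi, qi in zip(w, q)]
    return w


# Toy example (assumed data): a rank-2 "bias" subspace in R^3.
bias_directions = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
start = [3.0, 4.0, 5.0]
ejected = project_out(start, bias_directions)
```

Here `ejected` keeps only the part of `start` that the two bias directions cannot express. In a real model the subspace would have to be estimated from the network itself, for instance from the leading singular directions of some weight or activation matrix; that choice is an assumption of this sketch, not something the abstract states.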

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)