Latent Poincaré Shaping for Agentic Reinforcement Learning

Hanchen Xia; Baoyou Chen; Zelin Zang; Yutang Ge; Guojiang Zhao; Siyu Zhu

arXiv:2602.09375 · cs.LG · March 12, 2026

Open Access

TL;DR

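LaPha trains AlphaZero-like LLM agents in a Poincaré (hyperbolic) latent space. It uses geodesic-distance potentials to assign dense process rewards and a lightweight value head to guide search, improving accuracy on mathematical problem-solving benchmarks with minimal overhead.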

Contribution

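The paper presents LaPha, a hyperbolic training method for LLM agents. It embeds search nodes in a Poincaré ball, where negative curvature gives capacity that grows exponentially with radius, and derives dense process rewards from geodesic distance to rule-verified correct answers, improving accuracy in mathematical reasoning.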

Findings

01

Large accuracy gains on math benchmarks: Qwen2.5-Math-1.5B improves from 66.0% to 88.2% on MATH-500.

02

Search can be visualized as a tree rooted at the prompt, growing outward from the origin toward the boundary of the Poincaré ball.

03

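Self-guided test-time scaling from a lightweight value head on the shared latent space, with almost no additional overhead (LaPha-7B: 60.0% on AIME'24, 53.3% on AIME'25).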

Abstract

We propose LaPha, a method for training AlphaZero-like LLM agents in a Poincaré latent space. Under LaPha, the search process can be visualized as a tree rooted at the prompt and growing outward from the origin toward the boundary of the Poincaré ball, where negative curvature provides exponentially increasing capacity with radius. Using hyperbolic geodesic distance to rule-verified correctness, we define a node potential and assign dense process rewards by potential differences. We further attach a lightweight value head on the same shared latent space, enabling self-guided test-time scaling with almost no additional overhead. On MATH-500, LaPha improves Qwen2.5-Math-1.5B from 66.0% to 88.2%. With value-head-guided search, LaPha-1.5B reaches 56.7% accuracy on AIME'24, and LaPha-7B further achieves 60.0% on AIME'24 and 53.3% on AIME'25.
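The reward construction described in the abstract can be illustrated with a small, hypothetical Python sketch. This is not the authors' implementation. It assumes a unit Poincaré ball of curvature -1 and maps Euclidean vectors into the ball with the exponential map at the origin (`expmap0`). It takes the node potential to be the negative geodesic distance to an embedding `z_target` of a rule-verified correct answer, and computes the dense reward with the standard potential-based shaping form F = γΦ(child) - Φ(parent). The names `expmap0`, `potential`, `shaped_reward`, and `z_target`, and the default γ = 1, are illustrative assumptions. The paper's exact potential, embedding, and discount may differ.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]
MAX_NORM = 1.0 - 1e-5  # assumption: clip radius so points stay strictly inside the ball


def _sqnorm(x: Vec) -> float:
    """Squared Euclidean norm."""
    return sum(xi * xi for xi in x)


def expmap0(v: Vec) -> List[float]:
    """Exponential map at the origin of the unit Poincare ball (curvature -1).

    Sends a Euclidean vector v to the ball point of radius tanh(|v|) in the
    direction of v, i.e. at geodesic distance 2*|v| from the origin
    (before clipping).
    """
    n = math.sqrt(_sqnorm(v))
    if n == 0.0:
        return [0.0 for _ in v]
    r = min(math.tanh(n), MAX_NORM)
    return [r * vi / n for vi in v]


def poincare_distance(u: Vec, v: Vec, eps: float = 1e-12) -> float:
    """Geodesic distance in the unit Poincare ball:
    arcosh(1 + 2|u - v|^2 / ((1 - |u|^2)(1 - |v|^2)))."""
    diff = sum((a - b) ** 2 for a, b in zip(u, v))
    denom = max((1.0 - _sqnorm(u)) * (1.0 - _sqnorm(v)), eps)
    return math.acosh(1.0 + 2.0 * diff / denom)


def potential(z: Vec, z_target: Vec) -> float:
    """Hypothetical node potential: higher (less negative) when the node
    lies closer to the embedding of a rule-verified correct answer."""
    return -poincare_distance(z, z_target)


def shaped_reward(z_parent: Vec, z_child: Vec, z_target: Vec,
                  gamma: float = 1.0) -> float:
    """Dense process reward from a potential difference,
    F = gamma * Phi(child) - Phi(parent) (potential-based shaping form,
    Ng, Harada & Russell, 1999)."""
    return gamma * potential(z_child, z_target) - potential(z_parent, z_target)


if __name__ == "__main__":
    root = expmap0([0.0, 0.0])      # the prompt sits at the origin
    target = expmap0([2.0, 0.0])    # hypothetical verified-correct anchor
    good = expmap0([1.0, 0.0])      # step that moves toward the anchor
    bad = expmap0([-1.0, 0.0])      # step that moves away from it
    print(round(shaped_reward(root, good, target), 6))  # 2.0
    print(round(shaped_reward(root, bad, target), 6))   # -2.0
```

With γ = 1, the shaped rewards along any root-to-leaf path telescope to Φ(leaf) - Φ(root). The dense per-step rewards therefore sum to the overall change in distance to the verified answer. Because `expmap0` places a vector v at geodesic distance 2‖v‖ from the origin, the step toward the anchor earns +2 and its mirror image earns -2.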
