Latent-GRPO: Group Relative Policy Optimization for Latent Reasoning

Jingcheng Deng; Zihao Wei; Liang Pang; Junhong Wu; Shicheng Xu; Zenghao Duan; Huawei Shen

arXiv:2604.27998·cs.LG·May 1, 2026

TL;DR

Latent-GRPO is a reinforcement learning method for latent reasoning that addresses the instability of RL in latent space, yielding more efficient and accurate reasoning with shorter chains across various benchmarks.

Contribution

The paper proposes Latent-GRPO, an algorithm that targets the coupled bottlenecks of applying GRPO to latent reasoning, improving stability and performance over existing methods.

Findings

01. Latent-GRPO improves Pass@1 by 7.86 points on low-difficulty tasks.

02. It surpasses explicit GRPO by 4.27 points on high-difficulty tasks.

03. It achieves stronger Pass@$k$ performance with shorter reasoning chains.
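
The findings above are stated in Pass@1 and Pass@$k$. As a reading aid, here is a minimal sketch of the commonly used unbiased Pass@$k$ estimator, 1 - C(n-c, k) / C(n, k), where n samples are drawn per problem and c of them are correct. This page does not say which estimator the paper uses, so treat this as an assumption; the function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate Pass@k for one problem from n samples, c of which are correct.

    Assumption: this is the widely used unbiased estimator, not necessarily
    the exact evaluation protocol of the Latent-GRPO paper.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k incorrect samples: every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no correct sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With k = 1 the estimator reduces to the fraction of correct samples, c / n, which is the usual meaning of Pass@1.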

Abstract

Latent reasoning offers a more efficient alternative to explicit reasoning by compressing intermediate reasoning into continuous representations and substantially shortening reasoning chains. However, existing latent reasoning methods mainly focus on supervised learning, and reinforcement learning in latent space remains highly unstable. We study this problem through the lens of Group Relative Policy Optimization (GRPO), and show that directly adapting GRPO to latent reasoning is fundamentally non-trivial: latent reasoning changes both the probability density and the sampling mechanism, causing three coupled bottlenecks: absence of intrinsic latent manifolds, where unconstrained exploration pushes rollouts off the valid latent manifold; exploration-optimization misalignment, where trajectory-level rewards can induce incorrect token-level updates; and latent mixture non-closure, where…
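
For context on the group-relative part of the name, the following is a minimal sketch of how standard (explicit) GRPO turns the rewards of a group of rollouts for the same prompt into advantages: each reward is normalized by the group mean and standard deviation. This illustrates the baseline that the abstract says is hard to adapt, not the Latent-GRPO algorithm itself; the function name and the `eps` value are assumptions.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize per-rollout rewards within one group (standard GRPO style).

    Assumption: population standard deviation plus a small eps for stability;
    implementations differ on these details.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)
    # One scalar advantage per rollout. In standard GRPO this scalar is
    # shared by every step of that rollout, which is the trajectory-level
    # to token-level coupling the abstract refers to.
    return [(r - mean) / (std + eps) for r in rewards]


# Example: four rollouts for one prompt with binary correctness rewards.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct rollouts receive a positive advantage and incorrect ones a negative advantage; when all rollouts in a group earn the same reward, every advantage is zero and the group contributes no learning signal.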


Code & Models

Repositories

djc-go-solo/Latent-GRPO (GitHub)
