Hybrid Latent Reasoning via Reinforcement Learning

Zhenrui Yue; Bowen Jin; Huimin Zeng; Honglei Zhuang; Zhen Qin; Jinsung Yoon; Lanyu Shang; Jiawei Han; Dong Wang

arXiv:2505.18454·cs.CL·October 24, 2025

TL;DR

This paper proposes a reinforcement learning-based hybrid latent reasoning method for large language models. The method integrates continuous and discrete reasoning without relying on chain-of-thought traces for training, improving both performance and interpretability.

Contribution

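The paper introduces hybrid reasoning policy optimization (HRPO), an RL-based approach that combines prior hidden states with token embeddings for latent reasoning in LLMs, addressing the conflict between continuous latent reasoning and discrete autoregressive generation.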

Findings

01. HRPO outperforms prior methods on diverse benchmarks.

02. Models trained with HRPO remain interpretable and exhibit cross-lingual patterns.

03. HRPO maintains generative capabilities while enhancing reasoning performance.

Abstract

Recent advances in large language models (LLMs) have introduced latent reasoning as a promising alternative to autoregressive reasoning. By performing internal computation with hidden states from previous steps, latent reasoning benefits from more informative features rather than sampling a discrete chain-of-thought (CoT) path. Yet latent reasoning approaches are often incompatible with LLMs, as their continuous paradigm conflicts with the discrete nature of autoregressive generation. Moreover, these methods rely on CoT traces for training and thus fail to exploit the inherent reasoning patterns of LLMs. In this work, we explore latent reasoning by leveraging the intrinsic capabilities of LLMs via reinforcement learning (RL). To this end, we introduce hybrid reasoning policy optimization (HRPO), an RL-based hybrid latent reasoning approach that (1) integrates prior hidden states into…

Code & Models

Repositories

yueeeeeeee/hrpo
jax · Official

Videos

Hybrid Latent Reasoning via Reinforcement Learning · slideslive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Reinforcement Learning in Robotics