Hybrid Latent Reasoning with Decoupled Policy Optimization

Tao Cheng; Shi-Zhe Chen; Hao Zhang; Yixin Qin; Jinwen Luo; Zheng Wei

arXiv:2604.20328·cs.CV·April 23, 2026

TL;DR

HyLaR is a hybrid reasoning framework that interleaves discrete text generation with continuous visual latent representations. After a cold-start supervised fine-tuning (SFT) stage, it is trained with DePO (Decoupled Policy Optimization) to improve multimodal perception and understanding.

Contribution

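The paper presents HyLaR (Hybrid Latent Reasoning), a framework that interleaves discrete text generation with continuous visual latent representations, together with DePO (Decoupled Policy Optimization), which is introduced to optimize the resulting hybrid discrete-continuous action space. The authors report that HyLaR surpasses existing methods on multimodal reasoning.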

Findings

01  HyLaR outperforms standard MLLMs on perception and understanding benchmarks.

02  DePO effectively optimizes hybrid discrete-continuous action spaces.

03  Extensive experiments validate HyLaR's superior reasoning capabilities.

Abstract

Chain-of-Thought (CoT) reasoning significantly elevates the complex problem-solving capabilities of multimodal large language models (MLLMs). However, adapting CoT to vision typically discretizes signals to fit LLM inputs, causing early semantic collapse and discarding fine-grained details. While external tools can mitigate this, they introduce a rigid bottleneck, confining reasoning to predefined operations. Although recent latent reasoning paradigms internalize visual states to overcome these limitations, optimizing the resulting hybrid discrete-continuous action space remains challenging. In this work, we propose HyLaR (Hybrid Latent Reasoning), a framework that seamlessly interleaves discrete text generation with continuous visual latent representations. Specifically, following an initial cold-start supervised fine-tuning (SFT), we introduce DePO (Decoupled Policy Optimization) to…
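
The abstract describes interleaving discrete text generation with continuous visual latent representations. As a rough illustration of what such a hybrid action space can look like at decoding time, the toy sketch below alternates between continuous steps (the hidden state is fed back as the next input and no token is emitted) and discrete steps (a token is chosen and its embedding is fed back). It is not the authors' implementation: the random weights, the `<latent>` and `</latent>` markers, the fixed latent span length, and the function `hybrid_decode` are all assumptions made for illustration, and the sketch does not cover SFT or DePO.

```python
import math
import random

# Hypothetical toy, NOT the HyLaR implementation. Vocabulary, dimensions,
# span markers, and weights are illustrative assumptions.
VOCAB = ["<latent>", "</latent>", "the", "cat", "is", "left", "<eos>"]
DIM = 4


def make_matrix(rows, cols, rng):
    """Random weight matrix with entries in [-1, 1]."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def step(state, inp, w_state, w_in):
    """One recurrent update from the previous state and the current input."""
    a = matvec(w_state, state)
    b = matvec(w_in, inp)
    return [math.tanh(x + y) for x, y in zip(a, b)]


def hybrid_decode(latent_len=3, max_text=5, seed=0):
    """Return a trace of ("text", token) and ("latent", vector) actions."""
    rng = random.Random(seed)
    embed = make_matrix(len(VOCAB), DIM, rng)   # token embeddings
    w_state = make_matrix(DIM, DIM, rng)
    w_in = make_matrix(DIM, DIM, rng)
    w_out = make_matrix(len(VOCAB), DIM, rng)   # hidden state -> token logits

    state = [0.0] * DIM
    trace = [("text", "<latent>")]              # open the latent span
    inp = embed[VOCAB.index("<latent>")]

    # Continuous phase: the hidden state itself is the action and the next input.
    for _ in range(latent_len):
        state = step(state, inp, w_state, w_in)
        trace.append(("latent", state))
        inp = state

    trace.append(("text", "</latent>"))         # close the latent span
    inp = embed[VOCAB.index("</latent>")]

    # Discrete phase: greedy choice over ordinary tokens (markers are excluded).
    for _ in range(max_text):
        state = step(state, inp, w_state, w_in)
        logits = matvec(w_out, state)
        tok = max(range(2, len(VOCAB)), key=lambda i: logits[i])
        trace.append(("text", VOCAB[tok]))
        if VOCAB[tok] == "<eos>":
            break
        inp = embed[tok]
    return trace


if __name__ == "__main__":
    for kind, value in hybrid_decode():
        print(kind, value)
```

The trace makes the mixed action space explicit: some entries are discrete tokens and others are real-valued vectors, which is the property the abstract identifies as making optimization challenging.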

Code & Models

Repositories

EthenCheng/HyLaR (GitHub)

Models

TencentBAC/HyLaR-Qwen2.5-VL-7B (Hugging Face model, 105 downloads, 3 likes)