Beyond Imitation: Reinforcement Learning for Active Latent Planning

Zhi Zheng; Wee Sun Lee

arXiv:2601.21598·cs.AI·January 30, 2026

Open Access

TL;DR

This paper introduces ATP-Latent, a novel active latent planning method that uses a VAE and reinforcement learning to improve reasoning accuracy and efficiency in large language models by optimizing latent token representations.

Contribution

It proposes an active latent planning framework that combines VAE-based supervision of latent tokens with an RL-guided policy to enhance reasoning in LLMs, addressing the limitations of passive imitation.

Findings

01. +4.1% accuracy on benchmarks

02. -3.3% tokens used compared to baselines

03. Improved latent reasoning policy effectiveness

Abstract

Aiming at efficient and dense chain-of-thought (CoT) reasoning, latent reasoning methods fine-tune Large Language Models (LLMs) to substitute discrete language tokens with continuous latent tokens. These methods consume fewer tokens compared to the conventional language CoT reasoning and have the potential to plan in a dense latent space. However, current latent tokens are generally supervised based on imitating language labels. Considering that there can be multiple equivalent but diverse CoT labels for a question, passively imitating an arbitrary one may lead to inferior latent token representations and latent reasoning policies, undermining the potential planning ability and resulting in clear gaps between training and testing. In this work, we emphasize the importance of active planning over the representation space of latent tokens in achieving the optimal latent reasoning policy.…
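To make the contrast in the abstract concrete, the following toy sketch compares a language CoT loop, where each step is collapsed to a discrete token before being fed back, with a latent loop, where the continuous hidden state itself is fed back as the next input. It is only an illustration of the general idea of continuous latent tokens, not ATP-Latent and not the authors' implementation: the recurrent `forward` function, the random `WEIGHTS` and `EMBEDDINGS`, and the names `language_cot` and `latent_cot` are all hypothetical stand-ins for a real LLM.

```python
import math
import random

DIM, VOCAB = 4, 5
rng = random.Random(0)
# Hypothetical toy parameters; a real LLM would use learned weights.
WEIGHTS = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]
EMBEDDINGS = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB)]


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def forward(hidden, step_input):
    """Toy stand-in for one model step: mix state and input, then squash."""
    mixed = [h + x for h, x in zip(hidden, step_input)]
    return [math.tanh(v) for v in matvec(WEIGHTS, mixed)]


def language_cot(first_input, steps):
    """Discrete reasoning: pick one token per step and feed its embedding back."""
    hidden, step_input, fed = [0.0] * DIM, first_input, []
    for _ in range(steps):
        hidden = forward(hidden, step_input)
        logits = matvec(EMBEDDINGS, hidden)
        token_id = max(range(VOCAB), key=lambda i: logits[i])
        step_input = EMBEDDINGS[token_id]  # information outside the token is dropped
        fed.append(step_input)
    return hidden, fed


def latent_cot(first_input, steps):
    """Latent reasoning: feed the continuous hidden state back unchanged."""
    hidden, step_input, fed = [0.0] * DIM, first_input, []
    for _ in range(steps):
        hidden = forward(hidden, step_input)
        step_input = hidden  # the "latent token" is the hidden vector itself
        fed.append(step_input)
    return hidden, fed


question = EMBEDDINGS[0]  # pretend the question is a single token
_, discrete_inputs = language_cot(question, steps=3)
final_hidden, latent_inputs = latent_cot(question, steps=3)
```

In the discrete loop every fed-back vector is one of the `VOCAB` embedding rows, while in the latent loop it can be any point in the hidden space. That is why the way latent tokens are supervised matters: there is no vocabulary to constrain what each latent vector represents.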

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Machine Learning in Healthcare