Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs

Rujiao Long; Yang Li; Xingyao Zhang; Weixun Wang; Tianqianjin Lin; Xi Zhao; Yuchi Xu; Wenbo Su; Junchi Yan; Bo Zheng

arXiv:2512.17206 · cs.CV · December 22, 2025

Open Access

TL;DR

Reasoning Palette is a latent-modulation framework for large (vision-)language models. It infers a latent reasoning context, decodes a sampled latent into token prefixes, and prepends them to the prompt, which improves reasoning diversity, controllability, and exploration.

Contribution

The paper proposes a VAE-based latent-variable approach for modulating reasoning strategies in (V)LMs, enabling interpretable control and better exploration during inference and RL training.

Findings

1. Improves reasoning diversity and controllability.

2. Enhances exploration efficiency in RL training.

3. Achieves consistent performance gains on reasoning benchmarks.

Abstract

Exploration capacity shapes both inference-time performance and reinforcement learning (RL) training for large (vision-) language models, as stochastic sampling often yields redundant reasoning paths with little high-level diversity. This paper proposes Reasoning Palette, a novel latent-modulation framework that endows the model with a stochastic latent variable for strategic contextualization, guiding its internal planning prior to token generation. This latent context is inferred from the mean-pooled embedding of a question-answer pair via a variational autoencoder (VAE), where each sampled latent potentially encodes a distinct reasoning context. During inference, a sampled latent is decoded into learnable token prefixes and prepended to the input prompt, modulating the model's internal reasoning trajectory. In this way, the model performs internal sampling over reasoning strategies…
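
The pipeline described in the abstract (mean-pool a question-answer pair, infer a latent with a VAE, decode a sampled latent into prefix embeddings, prepend them to the prompt) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The dimensions, the single linear layers, the randomly initialised weights, the function names, and the choice to sample from a standard normal prior at inference are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes are illustrative assumptions, not values from the paper.
D_MODEL = 16   # embedding width of a hypothetical language model
D_LATENT = 4   # size of the latent reasoning context z
N_PREFIX = 3   # number of prefix embeddings decoded from z

# Randomly initialised stand-ins for trained VAE weights (hypothetical).
W_mu = rng.normal(scale=0.1, size=(D_MODEL, D_LATENT))
W_logvar = rng.normal(scale=0.1, size=(D_MODEL, D_LATENT))
W_dec = rng.normal(scale=0.1, size=(D_LATENT, N_PREFIX * D_MODEL))


def encode(qa_token_embeddings):
    """Mean-pool a question-answer pair and map it to Gaussian parameters."""
    pooled = qa_token_embeddings.mean(axis=0)   # shape (D_MODEL,)
    return pooled @ W_mu, pooled @ W_logvar     # mean, log-variance


def sample_latent(mu, logvar):
    """Reparameterisation trick: z = mu + sigma * eps."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def decode_prefix(z):
    """Decode a latent into N_PREFIX prefix embeddings."""
    return (z @ W_dec).reshape(N_PREFIX, D_MODEL)


def prepend_prefix(prompt_embeddings, z):
    """Prepend the decoded prefix to the prompt's token embeddings."""
    return np.concatenate([decode_prefix(z), prompt_embeddings], axis=0)


# Posterior path: infer a latent from a (random stand-in) question-answer pair.
qa = rng.normal(size=(10, D_MODEL))
mu, logvar = encode(qa)
z_posterior = sample_latent(mu, logvar)

# Inference path (assumed): draw z from N(0, I); each draw gives a new prefix.
prompt = rng.normal(size=(7, D_MODEL))
inputs_a = prepend_prefix(prompt, rng.standard_normal(D_LATENT))
inputs_b = prepend_prefix(prompt, rng.standard_normal(D_LATENT))
```

In this sketch the prompt embeddings are left untouched and only the prepended rows change between draws, which is the sense in which different latents could steer the same prompt toward different reasoning contexts.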

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Reinforcement Learning in Robotics