ReLaX: Reasoning with Latent Exploration for Large Reasoning Models
Shimin Zhang, Xianwei Chen, Yufan Shen, Ziyuan Ye, Jibin Wu

TL;DR
ReLaX is a framework that analyzes the latent (hidden-state) dynamics of large reasoning models through Koopman operator theory and uses that analysis to regulate exploration during RLVR training, which the authors report improves reasoning performance across multiple benchmarks.
Contribution
The paper presents ReLaX, a new method that uses latent dynamics and a spectral dispersion metric to better regulate exploration in large reasoning models.
Findings
ReLaX outperforms existing token-level exploration methods.
Latent dynamics analysis improves reasoning performance.
Dynamic Spectral Dispersion correlates with exploration quality.
Abstract
Reinforcement Learning with Verifiable Rewards (RLVR) has recently demonstrated remarkable potential in enhancing the reasoning capability of Large Reasoning Models (LRMs). However, RLVR often drives the policy toward over-determinism, resulting in ineffective exploration and premature policy convergence. While promoting token-level diversity has shown promise in mitigating entropy collapse, we argue that the latent dynamics underlying token generation encode a far richer computational structure for steering policy optimization toward a more effective exploration-exploitation tradeoff. To enable tractable analysis and intervention of the latent dynamics of LRMs, we leverage Koopman operator theory to obtain a linearized representation of their hidden state dynamics. This enables us to introduce Dynamic Spectral Dispersion (DSD), a new metric to quantify the heterogeneity of the model's…
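To make the Koopman step in the abstract concrete, here is a minimal sketch of one common way to obtain a linearized representation of a hidden-state trajectory: Dynamic Mode Decomposition (DMD), a standard finite-dimensional approximation of the Koopman operator. This is an illustration under stated assumptions, not the paper's estimator, which this excerpt does not specify. The function names `fit_koopman_dmd` and `eigenvalue_dispersion` are hypothetical, and because the abstract is cut off before DSD is defined, the dispersion statistic below (variance of the operator's eigenvalues in the complex plane) is only a stand-in for the general idea of measuring spectral heterogeneity, not the paper's DSD.

```python
import numpy as np


def fit_koopman_dmd(hidden_states: np.ndarray, rank: int) -> np.ndarray:
    """Fit a reduced linear operator K_r such that h[t+1] ~ K h[t] (DMD).

    hidden_states: array of shape (T, d), one hidden-state vector per step.
    rank: number of SVD modes to keep (assumed <= min(d, T - 1)).
    Returns the (rank, rank) reduced operator.
    """
    X = hidden_states[:-1].T  # states at steps 0..T-2, shape (d, T-1)
    Y = hidden_states[1:].T   # states at steps 1..T-1, shape (d, T-1)

    # Truncated SVD of X gives a low-rank basis for the trajectory.
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U_r, s_r, Vh_r = U[:, :rank], s[:rank], Vh[:rank]

    # K_r = U_r^T Y V_r S_r^{-1}; dividing by s_r scales column j by 1/s_r[j].
    return U_r.T @ Y @ Vh_r.T / s_r


def eigenvalue_dispersion(K: np.ndarray) -> float:
    """Hypothetical stand-in statistic: variance of K's eigenvalues.

    NOT the paper's DSD definition, which is not given in this excerpt.
    """
    eigvals = np.linalg.eigvals(K)
    return float(np.mean(np.abs(eigvals - eigvals.mean()) ** 2))


# Synthetic trajectory from a known linear system, used only for illustration.
A_true = np.diag([0.9, 0.5, -0.7])
states = [np.ones(3)]
for _ in range(9):
    states.append(A_true @ states[-1])
trajectory = np.stack(states)  # shape (10, 3)

K_reduced = fit_koopman_dmd(trajectory, rank=3)
dispersion = eigenvalue_dispersion(K_reduced)
```

On a real model, `hidden_states` would be the per-token activations of some chosen layer, with a hidden dimension far larger than the sequence length, so `rank` would need to be well below both. Which layer to use, how to pick the rank, and how to turn the spectrum into a training signal are design choices the excerpt leaves open.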
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)
