Restoring Exploration after Post-Training: Latent Exploration Decoding for Large Reasoning Models

Wenhui Tan; Fiorenzo Parascandolo; Enver Sangineto; Jianzhong Ju; Zhenbo Luo; Qian Cao; Rita Cucchiara; Ruihua Song; Jian Luan

arXiv:2602.01698 · cs.CL · May 12, 2026

TL;DR

This paper introduces Latent Exploration Decoding (LED), a method that restores effective exploration in large reasoning models after post-training, improving both accuracy and reinforcement learning performance.

Contribution

LED is a parameter-free, training-free decoding strategy that leverages the entropy of intermediate-layer posteriors to restore exploration in large reasoning models after post-training.
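The contribution rests on comparing the entropy of next-token posteriors read out at different depths. Below is a minimal sketch of that measurement, assuming a logit-lens-style readout in which every layer's hidden state is projected through the same unembedding matrix (the model's final normalization layer is omitted for brevity). `layer_entropies` and the toy tensors are hypothetical illustrations, not code from the paper.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy in nats over the last axis; 0 * log 0 is treated as 0."""
    safe = np.where(probs > 0, probs, 1.0)
    return -(probs * np.log(safe)).sum(axis=-1)


def layer_entropies(hidden_states: np.ndarray, unembed: np.ndarray) -> np.ndarray:
    """Entropy of the next-token posterior read out at every layer.

    hidden_states: (num_layers, hidden_dim) hidden vectors for one position.
    unembed: (hidden_dim, vocab_size) output projection shared by all layers
             (hypothetical logit-lens readout; final norm omitted).
    """
    logits = hidden_states @ unembed  # (num_layers, vocab_size)
    return entropy(softmax(logits))


rng = np.random.default_rng(0)
h = rng.normal(size=8)        # toy hidden state
W = rng.normal(size=(8, 32))  # toy unembedding over a 32-token vocabulary
# Row 1 is the same direction scaled up, i.e. a sharper, lower-entropy readout,
# a toy stand-in for a post-trained final layer.
print(layer_entropies(np.stack([h, 20.0 * h]), W))
```

Scaling a hidden state up acts like lowering the softmax temperature, so the second readout always has lower entropy than the first; in a real model the per-layer hidden states would come from the network itself rather than from rescaling.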

Findings

01

LED improves pass@1 accuracy by 0.61 percentage points.

02

LED enhances pass@16 accuracy by 1.03 percentage points.

03

Integrating LED into reinforcement learning accelerates reward improvement.
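pass@k is the probability that at least one of k sampled answers is correct. The findings do not state how it was computed; a common choice, assumed here, is the unbiased estimator of Chen et al. (2021): from n samples of which c are correct, pass@k = 1 - C(n-c, k) / C(n, k). The helper name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c are correct.

    Equals the probability that a uniformly random size-k subset of the
    n samples contains at least one correct answer.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, 4 correct answers out of 16 samples give `pass_at_k(16, 4, 1) == 0.25` and `pass_at_k(16, 4, 16) == 1.0`; pass@1 always reduces to c / n.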

Abstract

Large Reasoning Models (LRMs) have recently achieved strong mathematical and code reasoning performance through Reinforcement Learning (RL) post-training. However, we show that modern reasoning post-training induces an unintended exploration collapse: temperature-based sampling no longer increases pass@$n$ accuracy. Empirically, the final-layer posterior of post-trained LRMs exhibits sharply reduced entropy, while the entropy of intermediate layers remains relatively high. Motivated by this entropy asymmetry, we propose Latent Exploration Decoding (LED), a depth-conditioned decoding strategy. LED aggregates intermediate posteriors via a cumulative sum and selects depth configurations with maximal entropy as exploration candidates. Without additional training or parameters, LED consistently improves pass@1 and pass@16 accuracy by 0.61 and 1.03 percentage points across multiple reasoning…
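The abstract says only that LED aggregates intermediate posteriors via a cumulative sum and picks maximal-entropy depth configurations. The sketch below is one reading of that description, not the authors' implementation (see the AlbertTan404/LED repository for that). It assumes per-layer next-token logits are already available (for example from a logit-lens readout), that the cumulative sum runs from the final layer toward shallower layers and is renormalized by averaging, and that a single maximal-entropy configuration is sampled from, whereas the paper may keep several candidates. `led_sample_step` and `min_layer` are hypothetical names.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy in nats over the last axis; 0 * log 0 is treated as 0."""
    safe = np.where(probs > 0, probs, 1.0)
    return -(probs * np.log(safe)).sum(axis=-1)


def led_sample_step(layer_logits: np.ndarray, rng: np.random.Generator,
                    min_layer: int = 0) -> tuple[int, int]:
    """One decoding step of a hypothetical LED-style sampler.

    layer_logits: (num_layers, vocab_size) next-token logits from every layer,
                  ordered shallow -> deep (last row = final layer).
    min_layer: shallowest layer allowed to contribute (hypothetical knob).
    Returns (sampled token id, first layer of the chosen depth configuration).
    """
    num_layers, vocab_size = layer_logits.shape
    probs = softmax(layer_logits)  # per-layer posteriors
    # Cumulative sum starting at the final layer: row j sums layers
    # (num_layers - 1 - j) .. (num_layers - 1).
    rev_cumsum = np.cumsum(probs[::-1], axis=0)
    counts = np.arange(1, num_layers + 1)[:, None]
    candidates = rev_cumsum / counts  # averaging keeps each row a distribution
    start_layers = num_layers - 1 - np.arange(num_layers)
    ent = entropy(candidates)
    ent[start_layers < min_layer] = -np.inf  # exclude disallowed depths
    best = int(np.argmax(ent))  # maximal-entropy depth configuration
    token = int(rng.choice(vocab_size, p=candidates[best]))
    return token, int(start_layers[best])


# Toy example: two flat intermediate layers and a sharply peaked final layer.
logits = np.array([[0.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 0.0],
                   [10.0, 0.0, 0.0, 0.0]])
token, start = led_sample_step(logits, np.random.default_rng(0))
```

In the toy example, mixing in more flat intermediate posteriors moves the aggregate toward uniform, so the configuration spanning all three layers (start layer 0) has the highest entropy; raising `min_layer` to 1 or 2 restricts the choice to the deeper configurations.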

Code & Models

Repositories

AlbertTan404/LED (GitHub)

Models

specimba/nexus-os-v2 (Hugging Face model)
