Maximum Entropy Exploration Without the Rollouts

Jacob Adamczyk; Adam Kamoski; Rahul V. Kulkarni

arXiv:2603.12325·cs.LG·March 16, 2026

Open Access

TL;DR

This paper introduces EVE, an eigenvector-based algorithm for maximum entropy exploration in reinforcement learning. It avoids costly on-policy rollouts by combining a spectral characterization of the objective with iterative updates.

Contribution

The paper proposes a spectral characterization of maximum entropy exploration, which leads to an efficient eigenvector-based algorithm that does not require explicitly estimating state visitation frequencies.

Findings

1. EVE efficiently computes high-entropy policies without rollouts.

2. EVE achieves competitive exploration performance in grid-world environments.

3. The approach converges monotonically and is theoretically justified.

Abstract

Efficient exploration remains a central challenge in reinforcement learning, serving as a useful pretraining objective for data collection, particularly when an external reward function is unavailable. A principled formulation of the exploration problem is to find policies that maximize the entropy of their induced steady-state visitation distribution, thereby encouraging uniform long-run coverage of the state space. Many existing exploration approaches require estimating state visitation frequencies through repeated on-policy rollouts, which can be computationally expensive. In this work, we instead consider an intrinsic average-reward formulation in which the reward is derived from the visitation distribution itself, so that the optimal policy maximizes steady-state entropy. An entropy-regularized version of this objective admits a spectral characterization: the relevant stationary…
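To make the objective in the abstract concrete, here is a minimal sketch of the quantity being maximized: the entropy of the steady-state distribution of the Markov chain that a fixed policy induces. This is not the EVE algorithm and is not taken from the paper. The function names and the two 3-state transition matrices are hypothetical, chosen only for illustration, and the stationary distribution is found by plain power iteration on a known transition matrix.

```python
import math


def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Approximate d satisfying d = d P for a row-stochastic matrix P.

    Uses power iteration from the uniform distribution; assumes the chain
    is ergodic so that the iteration converges to a unique d.
    """
    n = len(P)
    d = [1.0 / n] * n
    for _ in range(iters):
        new = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, d)) < tol:
            return new
        d = new
    return d


def entropy(d):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in d if p > 0.0)


# Hypothetical policy-induced chains on 3 states (illustrative only).
# P_uniform moves to either other state with equal probability.
P_uniform = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]
# P_sticky tends to stay in state 0, so coverage is less even.
P_sticky = [
    [0.9, 0.05, 0.05],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]

H_uniform = entropy(stationary_distribution(P_uniform))
H_sticky = entropy(stationary_distribution(P_sticky))
```

In this toy setup the first chain visits all states equally often in the long run, so its steady-state entropy reaches the maximum of log 3, while the second chain spends most of its time in state 0 and scores lower. A maximum entropy exploration method would prefer the policy behind the first chain.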

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Age of Information Optimization