Maximum Entropy Exploration Without the Rollouts
Jacob Adamczyk, Adam Kamoski, Rahul V. Kulkarni

TL;DR
This paper introduces EVE, an eigenvector-based algorithm for maximum entropy exploration in reinforcement learning. It avoids costly rollouts by using spectral methods and iterative updates.
Contribution
It proposes a spectral characterization of maximum entropy exploration, leading to an efficient eigenvector-based algorithm that does not require explicit state visitation estimation.
Findings
EVE efficiently computes high-entropy policies without rollouts.
EVE achieves competitive exploration performance in grid-world environments.
The approach converges monotonically and is theoretically justified.
Abstract
Efficient exploration remains a central challenge in reinforcement learning, serving as a useful pretraining objective for data collection, particularly when an external reward function is unavailable. A principled formulation of the exploration problem is to find policies that maximize the entropy of their induced steady-state visitation distribution, thereby encouraging uniform long-run coverage of the state space. Many existing exploration approaches require estimating state visitation frequencies through repeated on-policy rollouts, which can be computationally expensive. In this work, we instead consider an intrinsic average-reward formulation in which the reward is derived from the visitation distribution itself, so that the optimal policy maximizes steady-state entropy. An entropy-regularized version of this objective admits a spectral characterization: the relevant stationary…
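To make the objective in the abstract concrete, here is a minimal sketch of how the steady-state visitation distribution of a fixed policy, and its entropy, can be computed from a known transition matrix without sampling any rollouts. This is not the paper's EVE algorithm: it only evaluates the entropy objective for a given Markov chain, using plain power iteration on the left eigenvector. The function names, the tolerance, and the two 3-state transition matrices are hypothetical and chosen for illustration only.

```python
import math


def stationary_distribution(P, tol=1e-12, max_iters=100_000):
    """Approximate the stationary distribution d (d = d P) of a
    row-stochastic matrix P by power iteration.

    Assumes the chain is irreducible and aperiodic, so the iteration
    converges to the unique left eigenvector with eigenvalue 1.
    """
    n = len(P)
    d = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iters):
        # One step of d <- d P (vector times matrix).
        new = [sum(d[s] * P[s][t] for s in range(n)) for t in range(n)]
        if max(abs(a - b) for a, b in zip(new, d)) < tol:
            return new
        d = new
    return d


def entropy(d):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in d if p > 0.0)


# Hypothetical chain induced by a policy that lingers in states 0 and 1.
P_biased = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.5, 0.5],
]

# Hypothetical doubly stochastic chain; its stationary distribution is uniform.
P_symmetric = [
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.5],
]

d_biased = stationary_distribution(P_biased)
d_symmetric = stationary_distribution(P_symmetric)

H_biased = entropy(d_biased)
H_symmetric = entropy(d_symmetric)
```

In this toy setting the second chain attains the maximum possible entropy for three states, log 3, while the first concentrates visitation on two states and scores lower. A maximum entropy exploration method searches over policies for the chain with the highest such entropy.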
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Age of Information Optimization
