Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning
Wesley A. Suttle, Aamodh Suresh, Carlos Nieto-Granda

TL;DR
This paper proposes behavioral entropy (BE) as an exploration objective for dataset generation in offline reinforcement learning and shows that BE-guided exploration produces diverse datasets that improve downstream task performance.
Contribution
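It extends behavioral entropy to continuous domains, develops k-nearest neighbor estimators for it with theoretical guarantees, derives reward functions that let standard RL methods learn BE-maximizing policies, and shows its advantages over existing entropy-based methods in MuJoCo environments.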
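To make concrete what is being extended, the sketch below computes discrete behavioral entropy. This summary does not state the formula, so it assumes the Prelec-weighting form used in the discrete BE literature: w(p) = exp(-beta * (-ln p)^alpha) with alpha, beta > 0, and BE = -sum_i w(p_i) ln w(p_i). The function names `prelec_weight`, `behavioral_entropy`, and `shannon_entropy` are illustrative, not the authors' code. With alpha = beta = 1 the weighting is the identity, so BE reduces to Shannon entropy.

```python
import math
from typing import Iterable


def prelec_weight(p: float, alpha: float, beta: float) -> float:
    """Assumed Prelec weighting w(p) = exp(-beta * (-ln p) ** alpha), for p in (0, 1]."""
    return math.exp(-beta * (-math.log(p)) ** alpha)


def behavioral_entropy(probs: Iterable[float], alpha: float = 1.0, beta: float = 1.0) -> float:
    """Discrete BE = -sum_i w(p_i) * ln w(p_i), assuming the Prelec form above.

    Zero-probability outcomes are skipped: their contribution
    beta * s**alpha * exp(-beta * s**alpha), with s = -ln p, vanishes as p -> 0.
    """
    total = 0.0
    for p in probs:
        if p <= 0.0:
            continue
        w = prelec_weight(p, alpha, beta)
        # ln w(p) = -beta * (-ln p) ** alpha, so -w * ln w = w * beta * (-ln p) ** alpha
        total += w * beta * (-math.log(p)) ** alpha
    return total


def shannon_entropy(probs: Iterable[float]) -> float:
    """Shannon entropy in nats, for comparison."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)
```

For example, `behavioral_entropy([0.7, 0.2, 0.1], alpha=1.0, beta=1.0)` matches `shannon_entropy([0.7, 0.2, 0.1])` up to floating-point error, while other (alpha, beta) settings weight rare and common outcomes differently and give a different value.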
Findings
BE-generated datasets outperform Shannon-entropy-generated datasets on downstream offline RL tasks.
BE-guided exploration leads to more diverse state coverage.
BE also outperforms Rényi entropy, SMM, and RND on most evaluated tasks.
Abstract
Entropy-based objectives are widely used to perform state space exploration in reinforcement learning (RL) and dataset generation for offline RL. Behavioral entropy (BE), a rigorous generalization of classical entropies that incorporates cognitive and perceptual biases of agents, was recently proposed for discrete settings and shown to be a promising metric for robotic exploration problems. In this work, we propose using BE as a principled exploration objective for systematically generating datasets that provide diverse state space coverage in complex, continuous, potentially high-dimensional domains. To achieve this, we extend the notion of BE to continuous settings, derive tractable k-nearest neighbor estimators, provide theoretical guarantees for these estimators, and develop practical reward functions that can be used with standard RL methods to learn BE-maximizing policies. Using…
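The abstract mentions k-nearest neighbor estimators and reward functions but does not give their form. As a reference point only, the sketch below implements the classical Kozachenko-Leonenko k-NN estimator of differential Shannon entropy and a particle-style intrinsic reward r_i = ln(c + rho_k(i)), where rho_k(i) is the distance from state i to its k-th nearest neighbor in a batch. This is a swapped-in Shannon baseline, not the paper's BE estimator; the names `kl_shannon_entropy`, `knn_particle_rewards`, and `sample_gaussian_states` and the defaults k = 3, c = 1 are hypothetical.

```python
import heapq
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def kth_nn_distances(points: List[Point], k: int) -> List[float]:
    """Euclidean distance from each point to its k-th nearest neighbor (brute force, O(N^2))."""
    if not 1 <= k < len(points):
        raise ValueError("need 1 <= k < number of points")
    out = []
    for i, x in enumerate(points):
        others = (math.dist(x, y) for j, y in enumerate(points) if j != i)
        out.append(heapq.nsmallest(k, others)[-1])
    return out


def knn_particle_rewards(points: List[Point], k: int = 3, c: float = 1.0) -> List[float]:
    """Hypothetical intrinsic reward r_i = ln(c + rho_k(i)): larger for states in sparse regions."""
    return [math.log(c + rho) for rho in kth_nn_distances(points, k)]


def digamma_int(n: int) -> float:
    """Digamma at a positive integer: psi(n) = -euler_gamma + sum_{m=1}^{n-1} 1/m."""
    return -0.5772156649015329 + sum(1.0 / m for m in range(1, n))


def kl_shannon_entropy(points: List[Point], k: int = 3) -> float:
    """Kozachenko-Leonenko estimate of differential Shannon entropy, in nats.

    H ~= psi(N) - psi(k) + ln V_d + (d / N) * sum_i ln rho_k(i),
    with V_d = pi^(d/2) / Gamma(d/2 + 1) the volume of the unit d-ball.
    """
    n, d = len(points), len(points[0])
    log_unit_ball = 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d + 1.0)
    rho = kth_nn_distances(points, k)
    # Guard against duplicate points, which would make ln(0) undefined.
    mean_log_rho = sum(math.log(max(r, 1e-300)) for r in rho) / n
    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * mean_log_rho


def sample_gaussian_states(n: int, d: int, seed: int = 0) -> List[List[float]]:
    """Hypothetical stand-in for a batch of visited states: i.i.d. standard-normal vectors."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
```

On one-dimensional standard-normal samples, `kl_shannon_entropy` should approach the analytic value 0.5 ln(2*pi*e) as N grows. The brute-force neighbor search is O(N^2) and only suited to small batches; in practice, neighbors are typically computed within a sampled minibatch of state embeddings.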
Taxonomy
Topics: Reinforcement Learning in Robotics
