Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning

Wesley A. Suttle; Aamodh Suresh; Carlos Nieto-Granda

arXiv:2502.04141 · cs.LG · February 7, 2025

TL;DR

This paper introduces behavioral entropy as a new exploration metric for dataset generation in offline reinforcement learning, demonstrating its effectiveness in producing diverse datasets that improve downstream task performance.

Contribution

The paper extends behavioral entropy to continuous domains, develops estimators with theoretical guarantees, and reports that it outperforms existing entropy-based methods in MuJoCo environments.

Findings

1. BE-based datasets outperform Shannon entropy datasets in offline RL tasks.

2. BE-guided exploration leads to more diverse state coverage.

3. BE outperforms Rényi, SMM, and RND in most evaluated tasks.

Abstract

Entropy-based objectives are widely used to perform state space exploration in reinforcement learning (RL) and dataset generation for offline RL. Behavioral entropy (BE), a rigorous generalization of classical entropies that incorporates cognitive and perceptual biases of agents, was recently proposed for discrete settings and shown to be a promising metric for robotic exploration problems. In this work, we propose using BE as a principled exploration objective for systematically generating datasets that provide diverse state space coverage in complex, continuous, potentially high-dimensional domains. To achieve this, we extend the notion of BE to continuous settings, derive tractable $k$-nearest neighbor estimators, provide theoretical guarantees for these estimators, and develop practical reward functions that can be used with standard RL methods to learn BE-maximizing policies. Using…
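The abstract does not spell out the BE estimators themselves, so the following is only a minimal sketch of the general idea behind $k$-nearest neighbor entropy estimation. It implements the classical Kozachenko-Leonenko estimator of differential Shannon entropy, which is an assumption chosen for illustration and not the paper's BE estimator. The function names `digamma_int` and `knn_entropy` are hypothetical, and the code uses only the Python standard library with a brute-force neighbor search, so it suits small sample sets only.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def knn_entropy(points, k=3):
    """Kozachenko-Leonenko k-NN estimate of differential Shannon entropy (nats).

    Illustrative stand-in only: this is the classical Shannon estimator,
    not the behavioral entropy estimator derived in the paper.
    `points` is a list of equal-length coordinate tuples with no duplicates.
    """
    n = len(points)
    d = len(points[0])
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    # Log-volume of the d-dimensional unit ball: pi^(d/2) / Gamma(d/2 + 1).
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    log_r_sum = 0.0
    for i, x in enumerate(points):
        # Brute-force distances from x to every other point, sorted ascending.
        dists = sorted(math.dist(x, y) for j, y in enumerate(points) if j != i)
        # Distance to the k-th nearest neighbor of x.
        log_r_sum += math.log(dists[k - 1])
    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * log_r_sum / n


# Example: states drawn from a 1-D standard normal, whose true entropy
# is 0.5 * log(2 * pi * e) nats.
rng = random.Random(0)
samples = [(rng.gauss(0.0, 1.0),) for _ in range(400)]
estimate = knn_entropy(samples, k=3)
true_value = 0.5 * math.log(2 * math.pi * math.e)
print(f"estimate={estimate:.3f}  true={true_value:.3f}")
```

The estimate grows when visited states are farther from their neighbors, which is why nearest-neighbor distances are a natural ingredient for exploration rewards. For larger datasets, a spatial index such as a k-d tree would replace the brute-force search.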

Videos

Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics