ELEMENT: Episodic and Lifelong Exploration via Maximum Entropy

Hongming Li; Shujian Yu; Bin Liu; Jose C. Principe

arXiv:2412.03800·cs.LG·December 6, 2024

TL;DR

ELEMENT introduces a multiscale, intrinsically motivated RL framework that enhances exploration without extrinsic rewards, using episodic entropy maximization and efficient kNN-based entropy estimation to improve transfer and scalability.

Contribution

The paper proposes a novel multiscale entropy optimization, an episodic maximum entropy approach, and a kNN-based entropy estimation method for improved lifelong exploration in RL.
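To make the idea of kNN-based entropy estimation concrete, here is a minimal sketch in plain Python. It is not the estimator from the paper, whose details are not given on this page. It is a simplified proxy in the spirit of k-nearest-neighbour (Kozachenko-Leonenko style) estimators: a set of visited states scores higher when each state sits far from its k-th nearest neighbour. The function name `knn_entropy_proxy`, the choice `k=3`, the `1 +` term inside the logarithm, and the toy 2-D data are all assumptions made for illustration.

```python
import math
import random


def knn_entropy_proxy(states, k=3):
    """Mean of log(1 + distance to the k-th nearest neighbour).

    Hypothetical helper for illustration only. Larger values mean the
    states are more spread out. This is a proxy, not an exact
    differential entropy estimate.
    """
    n = len(states)
    if n <= k:
        raise ValueError("need more than k states")
    total = 0.0
    for i, s in enumerate(states):
        # Brute-force distances from state i to every other state.
        dists = sorted(math.dist(s, t) for j, t in enumerate(states) if j != i)
        # The 1 + term keeps the log finite when states coincide
        # (an assumption of this sketch, not part of the exact estimator).
        total += math.log(1.0 + dists[k - 1])
    return total / n


# Toy data: a tight cluster versus states spread over a 10 x 10 square.
random.seed(0)
clustered = [(random.gauss(0, 0.1), random.gauss(0, 0.1)) for _ in range(200)]
spread = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(200)]

print("clustered:", knn_entropy_proxy(clustered))
print("spread:   ", knn_entropy_proxy(spread))
```

On this toy data the spread-out set scores higher than the clustered one, which is the behaviour an entropy-style exploration reward needs. The brute-force neighbour search is quadratic in the number of states, which hints at why cost becomes a concern at the scale of millions of observations mentioned in the abstract.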

Findings

1. Outperforms state-of-the-art intrinsic rewards in exploration tasks.
2. Effective in task-agnostic pre-training and offline RL data collection.
3. Reduces computational costs significantly compared to previous methods.

Abstract

This paper proposes Episodic and Lifelong Exploration via Maximum ENTropy (ELEMENT), a novel, multiscale, intrinsically motivated reinforcement learning (RL) framework that explores environments without any extrinsic reward and effectively transfers the learned skills to downstream tasks. We advance the state of the art in three ways. First, we propose a multiscale entropy optimization to address the fact that previous maximum state entropy methods, when used for lifelong exploration with millions of state observations, suffer from vanishing rewards and become computationally very expensive across iterations. We therefore add an episodic maximum entropy objective over each episode to speed up the search further. Second, we propose a novel intrinsic reward for episodic entropy maximization, named average episodic state entropy, which provides the optimal solution for a theoretical…
