Accelerating Reinforcement Learning with Value-Conditional State Entropy Exploration

Dongyoung Kim; Jinwoo Shin; Pieter Abbeel; Younggyo Seo

arXiv:2305.19476 · cs.LG · August 12, 2024 · 1 citation

TL;DR

This paper introduces a novel exploration method for reinforcement learning that maximizes value-conditional state entropy, effectively balancing exploration between high- and low-value states to accelerate learning across diverse benchmarks.

Contribution

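The paper proposes a new exploration technique that maximizes value-conditional state entropy. It addresses a limitation of standard state entropy maximization in supervised RL settings (those with a task reward), where uniform-coverage objectives can bias exploration toward low-value states.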
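For reference, conditional entropy in standard information theory decomposes as follows, writing S for the state and V for its value estimate. Treating the paper's value-conditional state entropy as exactly this quantity is an assumption of this note, since the abstract is truncated before the method's details.

```latex
\mathcal{H}(S \mid V) = \mathcal{H}(S, V) - \mathcal{H}(V)
                      = \int p(v)\, \mathcal{H}(S \mid V = v)\, dv
```

Read this way, the second form gives the intuition: the objective averages the state entropy within each value level, so spreading visits inside a narrow high-value region still counts toward it, instead of being outweighed by coverage of a much larger low-value region.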

Findings

1. Significantly accelerates RL algorithms on multiple benchmarks.
2. Effectively balances exploration between high- and low-value states.
3. Demonstrates robustness across MiniGrid, DeepMind Control Suite, and Meta-World.

Abstract

A promising technique for exploration is to maximize the entropy of visited state distribution, i.e., state entropy, by encouraging uniform coverage of visited state space. While it has been effective for an unsupervised setup, it tends to struggle in a supervised setup with a task reward, where an agent prefers to visit high-value states to exploit the task reward. Such a preference can cause an imbalance between the distributions of high-value states and low-value states, which biases exploration towards low-value state regions as a result of the state entropy increasing when the distribution becomes more uniform. This issue is exacerbated when high-value states are narrowly distributed within the state space, making it difficult for the agent to complete the tasks. In this paper, we present a novel exploration technique that maximizes the value-conditional state entropy, which…
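The imbalance described in the abstract can be seen with a particle-based (k-nearest-neighbour) state-entropy bonus, a common way to estimate state entropy in exploration methods. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the `log(1 + d)` bonus form, `k = 3`, and the toy data are all hypothetical. It shows that states in a sparse low-value region earn a larger bonus than states in a narrow high-value cluster.

```python
import math


def kth_neighbor_distance(points, i, k):
    """Euclidean distance from points[i] to its k-th nearest neighbour (itself excluded)."""
    dists = sorted(
        math.dist(points[i], points[j]) for j in range(len(points)) if j != i
    )
    return dists[k - 1]


def state_entropy_bonus(states, k=3):
    """Per-state bonus from a k-NN (particle-based) state-entropy estimate.

    A larger k-th neighbour distance means a sparser region and a larger bonus,
    so maximizing the summed bonus pushes visits toward uniform coverage.
    The log(1 + d) form is an assumed choice that stays finite for duplicates.
    """
    return [math.log(1.0 + kth_neighbor_distance(states, i, k)) for i in range(len(states))]


# Hypothetical toy batch: a narrow high-value cluster near the origin
# and a sparse grid of low-value states far away from it.
high_value_states = [(0.1 * i, 0.1 * j) for i in range(4) for j in range(4)]
low_value_states = [(10 + 2.0 * i, 10 + 2.0 * j) for i in range(3) for j in range(3)]
states = high_value_states + low_value_states

bonus = state_entropy_bonus(states, k=3)
n_high = len(high_value_states)
mean_high = sum(bonus[:n_high]) / n_high
mean_low = sum(bonus[n_high:]) / len(low_value_states)
print(f"mean bonus, high-value cluster: {mean_high:.3f}")
print(f"mean bonus, low-value region:   {mean_low:.3f}")
```

Running it prints a clearly larger mean bonus for the low-value region, so an agent that maximizes this bonus on its own is pulled away from the high-value cluster, which is the failure mode the abstract attributes to plain state entropy in the supervised setting.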

Videos

Accelerating Reinforcement Learning with Value-Conditional State Entropy Exploration (SlidesLive)