Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning

Mingqi Yuan; Bo Li; Xin Jin; Wenjun Zeng

arXiv:2301.10886 · cs.LG · October 13, 2023 · 1 citation

TL;DR

AIRS is an adaptive method that dynamically selects intrinsic rewards to improve exploration in deep reinforcement learning, leading to better performance across diverse tasks.

Contribution

The paper introduces AIRS, a novel adaptive intrinsic reward shaping approach that selects shaping functions based on estimated returns, enhancing exploration efficiency.

Findings

01 AIRS outperforms the benchmark schemes on tasks from MiniGrid, Procgen, and DeepMind Control Suite.

02 The intrinsic reward toolkit provides efficient and reliable implementations of diverse intrinsic reward approaches.

03 AIRS achieves superior performance with a simple architecture.

Abstract

We present AIRS: Automatic Intrinsic Reward Shaping, which adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL). More specifically, AIRS selects a shaping function from a predefined set based on the estimated task return in real time, providing reliable exploration incentives and alleviating the biased objective problem. Moreover, we develop an intrinsic reward toolkit to provide efficient and reliable implementations of diverse intrinsic reward approaches. We test AIRS on various tasks from MiniGrid, Procgen, and DeepMind Control Suite. Extensive simulations demonstrate that AIRS can outperform the benchmarking schemes and achieve superior performance with a simple architecture.
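The selection step described in the abstract (choosing one shaping function from a predefined set according to the estimated task return) can be illustrated with a small bandit-style sketch. The abstract does not state the selection rule, so the UCB1 rule below is an assumption chosen only because it is a common way to trade off trying each option against exploiting the best one. The class name `ShapingSelector`, the candidate names (`none`, `count_bonus`, `curiosity`), and the toy mean returns are all hypothetical and do not come from the paper or its toolkit.

```python
import math
import random


class ShapingSelector:
    """UCB1-style selector over a fixed set of candidate shaping functions.

    Hypothetical illustration only; not the authors' implementation.
    """

    def __init__(self, names, c=1.0):
        self.names = list(names)
        self.c = c  # exploration coefficient (assumed value)
        self.counts = {n: 0 for n in self.names}
        self.values = {n: 0.0 for n in self.names}  # running mean of task return
        self.total = 0

    def select(self):
        # Try every candidate once before applying the UCB score.
        for n in self.names:
            if self.counts[n] == 0:
                return n

        def score(n):
            bonus = self.c * math.sqrt(math.log(self.total) / self.counts[n])
            return self.values[n] + bonus

        return max(self.names, key=score)

    def update(self, name, task_return):
        # Incremental mean of the task return observed under this candidate.
        self.counts[name] += 1
        self.total += 1
        self.values[name] += (task_return - self.values[name]) / self.counts[name]


def run_demo(rounds=200, seed=0):
    # Toy stand-in for training: each candidate yields a noisy task return
    # around a made-up mean. A real agent would run a policy update here.
    rng = random.Random(seed)
    mean_return = {"none": 0.2, "count_bonus": 0.5, "curiosity": 0.8}
    selector = ShapingSelector(mean_return)
    for _ in range(rounds):
        name = selector.select()
        selector.update(name, mean_return[name] + rng.gauss(0.0, 0.05))
    return selector


if __name__ == "__main__":
    s = run_demo()
    print(s.counts)
```

In this toy setting the selector concentrates its choices on whichever candidate yields the highest observed task return, which is the behaviour the abstract attributes to return-based selection. In an actual training loop, `update` would be called with the return estimated from the most recent rollouts rather than a simulated number.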

Code & Models

Repositories

openai/procgen

Videos

Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research

Methods: Test