Potential-Based Intrinsic Motivation: Preserving Optimality With Complex, Non-Markovian Shaping Rewards

Grant C. Forbes; Leonardo Villalobos-Arias; Jianxun Wang; Arnav Jhala; David L. Roberts

arXiv:2410.12197·cs.LG·October 17, 2024

Open Access

TL;DR

This paper extends potential-based reward shaping to complex intrinsic motivation functions, ensuring optimality preservation and improving learning efficiency in sparse, complex environments.

Contribution

It introduces a generalized potential-based reward shaping method applicable to complex, trainable intrinsic motivation functions, with proofs and experimental validation.

Findings

01. PBIM and GRM prevent suboptimal policy convergence
02. Methods speed up training in complex environments
03. GRM encompasses all potential-based reward shaping functions

Abstract

Recently there has been a proliferation of intrinsic motivation (IM) reward-shaping methods to learn in complex and sparse-reward environments. These methods can often inadvertently change the set of optimal policies in an environment, leading to suboptimal behavior. Previous work on mitigating the risks of reward shaping, particularly through potential-based reward shaping (PBRS), has not been applicable to many IM methods, as they are often complex, trainable functions themselves, and therefore dependent on a wider set of variables than the traditional reward functions that PBRS was developed for. We present an extension to PBRS that we prove preserves the set of optimal policies under a more general set of functions than has been previously proven. We also present Potential-Based Intrinsic Motivation (PBIM) and Generalized Reward Matching (GRM), methods for converting IM…
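
For background on the PBRS idea the abstract builds on: in its classical form, shaping adds a term F(s, s') = γΦ(s') − Φ(s) to the environment reward, where Φ is a potential function over states. The sketch below is a minimal illustration of that classical, state-only case, not of the paper's PBIM or GRM methods. The function names, the example potential, and the integer state space are all hypothetical choices made for illustration. It shows why such a term cannot change which policies are optimal: its discounted sum along a trajectory telescopes to a quantity that depends only on the first and last states.

```python
import random


def shaping_reward(phi, s, s_next, gamma):
    """Classical potential-based shaping term: F(s, s') = gamma * phi(s') - phi(s)."""
    return gamma * phi(s_next) - phi(s)


def discounted_shaping_return(phi, states, gamma):
    """Sum of gamma^t * F(s_t, s_{t+1}) along a trajectory of states."""
    return sum(
        (gamma ** t) * shaping_reward(phi, s, s_next, gamma)
        for t, (s, s_next) in enumerate(zip(states, states[1:]))
    )


def telescoped_value(phi, states, gamma):
    """Closed form of the same sum: gamma^T * phi(s_T) - phi(s_0)."""
    horizon = len(states) - 1
    return (gamma ** horizon) * phi(states[-1]) - phi(states[0])


def example_phi(s):
    """Hypothetical potential on integer states (illustration only)."""
    return -abs(10 - s)


rng = random.Random(0)
gamma = 0.9
trajectory = [0] + [rng.randint(0, 10) for _ in range(20)]

# The step-by-step sum and the closed form agree up to floating-point error.
lhs = discounted_shaping_return(example_phi, trajectory, gamma)
rhs = telescoped_value(example_phi, trajectory, gamma)
```

Because the intermediate states cancel, two trajectories of equal length that share a start and an end state collect the same total shaping reward under this classical form. The abstract's point is that many IM rewards depend on more than the current state, so this simple argument does not directly cover them.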
