Potential-Based Intrinsic Motivation: Preserving Optimality With Complex, Non-Markovian Shaping Rewards
Grant C. Forbes, Leonardo Villalobos-Arias, Jianxun Wang, Arnav Jhala, David L. Roberts

TL;DR
This paper extends potential-based reward shaping to complex intrinsic motivation functions, ensuring optimality preservation and improving learning efficiency in sparse, complex environments.
Contribution
It introduces a generalized potential-based reward shaping method applicable to complex, trainable intrinsic motivation functions, with proofs and experimental validation.
Findings
PBIM and GRM prevent suboptimal policy convergence
Methods speed up training in complex environments
GRM encompasses all potential-based reward shaping functions
Abstract
Recently there has been a proliferation of intrinsic motivation (IM) reward-shaping methods to learn in complex and sparse-reward environments. These methods can often inadvertently change the set of optimal policies in an environment, leading to suboptimal behavior. Previous work on mitigating the risks of reward shaping, particularly through potential-based reward shaping (PBRS), has not been applicable to many IM methods, as they are often complex, trainable functions themselves, and therefore dependent on a wider set of variables than the traditional reward functions that PBRS was developed for. We present an extension to PBRS that we prove preserves the set of optimal policies under a more general set of functions than has been previously proven. We also present Potential-Based Intrinsic Motivation (PBIM) and Generalized Reward Matching (GRM), methods for converting IM…
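For readers unfamiliar with PBRS, the sketch below illustrates the classical state-only form that the abstract builds on, F(s, s') = γΦ(s') − Φ(s). It is not the paper's PBIM or GRM method. The potential `phi`, the integer states, and the random trajectory are hypothetical and chosen only for illustration. The sketch checks that the discounted sum of shaping terms along a trajectory telescopes to γ^T Φ(s_T) − Φ(s_0), a quantity that depends only on the endpoints, which is the intuition behind why this kind of shaping leaves the set of optimal policies unchanged.

```python
import random


def shaping_reward(phi, s, s_next, gamma):
    """Classical PBRS term: F(s, s') = gamma * phi(s') - phi(s)."""
    return gamma * phi(s_next) - phi(s)


def discounted_shaping_return(phi, states, gamma):
    """Discounted sum of shaping terms over consecutive state pairs."""
    return sum(
        (gamma ** t) * shaping_reward(phi, s, s_next, gamma)
        for t, (s, s_next) in enumerate(zip(states, states[1:]))
    )


def telescoped_value(phi, states, gamma):
    """Closed form of the same sum: gamma^T * phi(s_T) - phi(s_0)."""
    horizon = len(states) - 1
    return (gamma ** horizon) * phi(states[-1]) - phi(states[0])


def phi(s):
    # Hypothetical potential: negative distance to an assumed goal state 10.
    return -abs(10 - s)


# Hypothetical trajectory over integer states, for illustration only.
rng = random.Random(0)
gamma = 0.9
trajectory = [3] + [rng.randint(0, 10) for _ in range(5)]

shaped = discounted_shaping_return(phi, trajectory, gamma)
closed_form = telescoped_value(phi, trajectory, gamma)
```

Under these assumptions, `shaped` and `closed_form` agree up to floating-point error for any trajectory. Intermediate states contribute nothing to the total shaping return, so the shaping cannot favor one path over another beyond its start and end potentials.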
Taxonomy
Topics: Innovation Diffusion and Forecasting
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Sparse Evolutionary Training
