Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization
Sreejith Balakrishnan, Quoc Phong Nguyen, Bryan Kian Hsiang Low, Harold Soh

TL;DR
This paper introduces BO-IRL, a Bayesian optimization-based framework that efficiently explores the reward function space in inverse reinforcement learning, identifying multiple solutions with fewer costly policy evaluations.
Contribution
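The paper proposes a kernel for BO-IRL that projects the parameters of policy-invariant reward functions to a single point in a latent space. This lets the search explore the reward function space efficiently and capture correlations between reward functions.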
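The paper's own kernel is not reproduced here. As a hedged illustration of the underlying idea (map reward parameters that induce the same optimal policy to one latent point, then apply an ordinary kernel there), the sketch below assumes a tabular, infinite-horizon discounted MDP with state rewards r(s) = theta[s]. In that setting, positive scaling and constant shifts of theta leave the optimal policy unchanged. The `project` and `invariant_rbf` functions are hypothetical, handle only those two transformations (not, for example, potential-based shaping), and are not the paper's projection.

```python
import numpy as np


def project(theta):
    """Hypothetical projection of a reward vector to a canonical latent point.

    Assumes a tabular, infinite-horizon discounted MDP with r(s) = theta[s],
    where theta and a * theta + b (a > 0) share the same optimal policy.
    Centering removes the shift b; dividing by the norm removes the scale a.
    """
    theta = np.asarray(theta, dtype=float)
    centered = theta - theta.mean()
    norm = np.linalg.norm(centered)
    if norm == 0.0:
        # A constant reward makes every policy optimal; map it to the origin.
        return centered
    return centered / norm


def invariant_rbf(theta1, theta2, lengthscale=0.5):
    """Squared-exponential kernel evaluated on projected points.

    Composing a valid kernel with a feature map keeps it positive
    semi-definite, so this can be used directly as a GP covariance.
    """
    d = project(theta1) - project(theta2)
    return float(np.exp(-0.5 * np.dot(d, d) / lengthscale**2))


if __name__ == "__main__":
    theta = [1.0, 0.0, 2.0]
    scaled_shifted = [3.0 * t + 5.0 for t in theta]  # same optimal policy
    different = [2.0, 0.0, 1.0]                     # different optimal policy
    print(invariant_rbf(theta, scaled_shifted))     # close to 1.0
    print(invariant_rbf(theta, different))          # noticeably smaller
```

Because equivalent rewards collapse to the same latent point, a Gaussian process using this kernel treats an observation at theta as an observation at every positively scaled or shifted copy, which is the kind of sample sharing the contribution describes.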
Findings
BO-IRL discovers multiple reward functions that are consistent with the expert demonstrations.
It reduces the number of expensive policy optimizations required.
It succeeds on both synthetic and real-world environments.
Abstract
The problem of inverse reinforcement learning (IRL) is relevant to a variety of tasks including value alignment and robot learning from demonstration. Despite significant algorithmic contributions in recent years, IRL remains an ill-posed problem at its core; multiple reward functions coincide with the observed behavior and the actual reward function is not identifiable without prior knowledge or supplementary information. This paper presents an IRL framework called Bayesian optimization-IRL (BO-IRL) which identifies multiple solutions that are consistent with the expert demonstrations by efficiently exploring the reward function space. BO-IRL achieves this by utilizing Bayesian Optimization along with our newly proposed kernel that (a) projects the parameters of policy invariant reward functions to a single point in a latent space and (b) ensures nearby points in the latent space…
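A generic Bayesian optimization loop of the kind BO-IRL builds on might look like the sketch below. A Gaussian process surrogate with a squared-exponential kernel models an expensive objective over reward parameters, and expected improvement picks the next parameters to evaluate. Everything here is an assumption for illustration: the `bayes_opt` helper, the search box [-1, 1]^d, maximizing the acquisition over random candidates, and the quadratic `toy_cost`, which stands in for a costly evaluation (in IRL, one that would require solving for a policy under the candidate reward). The sketch returns a single best point and does not reproduce BO-IRL's search for multiple solutions or its kernel.

```python
import math

import numpy as np


def rbf(A, B, lengthscale=0.3):
    # Squared-exponential kernel between row sets A (n x d) and B (m x d).
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)


def gp_posterior(X, y, Xq, noise=1e-6):
    # Standard zero-mean GP regression: posterior mean and std at Xq.
    K = rbf(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf(X, Xq)
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    # Prior variance is 1 because rbf(x, x) == 1.
    var = np.clip(1.0 - (v**2).sum(0), 1e-12, None)
    return mu, np.sqrt(var)


def expected_improvement(mu, sigma, best):
    # Expected improvement for minimization below the incumbent `best`.
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf


def bayes_opt(objective, dim, n_init=5, n_iter=20, n_candidates=2000, seed=0):
    """Hypothetical helper: minimize an expensive objective over [-1, 1]^dim."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n_init, dim))
    y = np.array([objective(x) for x in X])
    for _ in range(n_iter):
        # Standardize observations so the unit-variance GP prior is reasonable.
        yn = (y - y.mean()) / (y.std() + 1e-9)
        cand = rng.uniform(-1.0, 1.0, size=(n_candidates, dim))
        mu, sigma = gp_posterior(X, yn, cand)
        x_next = cand[np.argmax(expected_improvement(mu, sigma, yn.min()))]
        # Only this call is "expensive"; everything else works on the surrogate.
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next))
    i = int(np.argmin(y))
    return X[i], y[i]


def toy_cost(theta):
    # Hypothetical stand-in for a costly evaluation of reward parameters.
    return float(np.sum((theta - np.array([0.3, -0.2])) ** 2))


if __name__ == "__main__":
    best_theta, best_cost = bayes_opt(toy_cost, dim=2)
    print(best_theta, best_cost)
```

Swapping `rbf` for a policy-invariant kernel is where a contribution like BO-IRL's would enter. Maximizing expected improvement over random candidates keeps the example short; practical implementations usually optimize the acquisition function with a gradient-based or multi-start optimizer.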
Taxonomy
Topics: Reinforcement Learning in Robotics · Energy Efficiency and Management · Industrial Vision Systems and Defect Detection
