Walking the Values in Bayesian Inverse Reinforcement Learning

Ondrej Bajgar; Alessandro Abate; Konstantinos Gatsis; Michael A. Osborne

arXiv:2407.10971·cs.LG·July 16, 2024

TL;DR

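This paper introduces ValueWalk, a Bayesian IRL method that samples primarily in the space of Q-values rather than in the space of rewards. This avoids re-solving the forward planning problem (rewards to Q-values) at every step of the sampler, which lowers the computational cost of estimating the posterior over rewards from demonstrations.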

Contribution

The paper proposes a new sampling approach in Bayesian IRL focusing on Q-values, improving efficiency and scalability over traditional reward-based methods.

Findings

01. ValueWalk reduces computational costs in Bayesian IRL.

02. It enables efficient posterior sampling using Hamiltonian Monte Carlo.

03. It demonstrates improved performance on several benchmark tasks.

Abstract

The goal of Bayesian inverse reinforcement learning (IRL) is recovering a posterior distribution over reward functions using a set of demonstrations from an expert optimizing for a reward unknown to the learner. The resulting posterior over rewards can then be used to synthesize an apprentice policy that performs well on the same or a similar task. A key challenge in Bayesian IRL is bridging the computational gap between the hypothesis space of possible rewards and the likelihood, often defined in terms of Q-values: vanilla Bayesian IRL needs to solve the costly forward planning problem (going from rewards to the Q-values) at every step of the algorithm, which may need to be done thousands of times. We propose to solve this by a simple change: instead of focusing on primarily sampling in the space of rewards, we can focus on primarily working in the space of Q-values, since the…
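The asymmetry the abstract describes can be illustrated on a small tabular MDP. The following is a minimal sketch, not the paper's implementation: it assumes a known transition tensor `P[s, a, s']`, a discount factor `gamma`, and a Boltzmann-rational expert likelihood with a hypothetical temperature parameter `beta` (a common choice in Bayesian IRL, assumed here rather than taken from this page). All function names are illustrative. Going from rewards to Q-values requires an iterative solver such as value iteration, whereas going from Q-values back to rewards is a single application of the Bellman optimality equation, r(s, a) = Q(s, a) - gamma * E[max_a' Q(s', a')].

```python
import numpy as np


def q_from_reward(R, P, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Forward planning: value iteration from rewards R[s, a] to optimal Q[s, a].

    P[s, a, s2] holds transition probabilities. This loop is the expensive
    step that a reward-space sampler would repeat for every proposal.
    """
    Q = np.zeros_like(R, dtype=float)
    for _ in range(max_iter):
        V = Q.max(axis=1)               # greedy state values, shape (S,)
        Q_new = R + gamma * (P @ V)     # Bellman optimality backup, shape (S, A)
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q


def reward_from_q(Q, P, gamma=0.9):
    """Inverse Bellman step: recover R[s, a] from Q[s, a] in one pass, no iteration."""
    return Q - gamma * (P @ Q.max(axis=1))


def boltzmann_log_likelihood(Q, demos, beta=1.0):
    """Log-likelihood of (state, action) demonstrations under a softmax policy.

    Assumes the expert picks action a in state s with probability
    proportional to exp(beta * Q[s, a]).
    """
    logits = beta * Q
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_pi = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(sum(log_pi[s, a] for s, a in demos))


# Illustrative random MDP with 5 states and 3 actions.
rng = np.random.default_rng(0)
S, A = 5, 3
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)       # normalize each row into a distribution
R = rng.normal(size=(S, A))

Q = q_from_reward(R, P)                  # many Bellman backups
R_back = reward_from_q(Q, P)             # one Bellman backup
print("max reward recovery error:", np.max(np.abs(R - R_back)))
print("demo log-likelihood:", boltzmann_log_likelihood(Q, [(0, 1), (2, 0)]))
```

Because the likelihood here depends only on `Q`, a sampler that proposes Q-values directly can evaluate it without planning and then call `reward_from_q` once to evaluate a prior defined over rewards. Whether this matches the paper's exact construction cannot be confirmed from the truncated abstract above.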

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Sparse Evolutionary Training · Focus