Walking the Values in Bayesian Inverse Reinforcement Learning
Ondrej Bajgar, Alessandro Abate, Konstantinos Gatsis, Michael A. Osborne

TL;DR
This paper introduces ValueWalk, a novel Bayesian IRL method that efficiently samples in Q-value space rather than reward space, reducing computational costs and enabling better posterior estimation from demonstrations.
Contribution
The paper proposes a new sampling approach in Bayesian IRL focusing on Q-values, improving efficiency and scalability over traditional reward-based methods.
Findings
ValueWalk reduces computational costs in Bayesian IRL.
It enables efficient posterior sampling using Hamiltonian Monte Carlo.
It demonstrates improved performance on several benchmark tasks.
Abstract
The goal of Bayesian inverse reinforcement learning (IRL) is recovering a posterior distribution over reward functions using a set of demonstrations from an expert optimizing for a reward unknown to the learner. The resulting posterior over rewards can then be used to synthesize an apprentice policy that performs well on the same or a similar task. A key challenge in Bayesian IRL is bridging the computational gap between the hypothesis space of possible rewards and the likelihood, often defined in terms of Q values: vanilla Bayesian IRL needs to solve the costly forward planning problem - going from rewards to the Q values - at every step of the algorithm, which may need to be done thousands of times. We propose to solve this by a simple change: instead of focusing on primarily sampling in the space of rewards, we can focus on primarily working in the space of Q-values, since the…
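The asymmetry described in the abstract can be illustrated with a minimal sketch. It assumes a small tabular MDP with known transition probabilities and the standard Bellman optimality equation; it is not the authors' implementation. The function names (q_from_reward, reward_from_q, boltzmann_log_likelihood), the array layout, and the Boltzmann-rational expert model with temperature beta are all assumptions made for illustration. Going from rewards to Q values needs an iterative solver, while going from Q values back to rewards is a single Bellman backup.

```python
import numpy as np


def q_from_reward(R, P, gamma, tol=1e-10, max_iters=10_000):
    """Forward planning (the costly direction): value iteration on Q.

    R: rewards, shape (S, A). P: transitions, shape (S, A, S).
    Iterates Q <- R + gamma * E_{s'}[max_a' Q(s', a')] until convergence.
    """
    Q = np.zeros_like(R, dtype=float)
    for _ in range(max_iters):
        Q_new = R + gamma * (P @ Q.max(axis=1))
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q


def reward_from_q(Q, P, gamma):
    """Inverse direction (cheap): one Bellman backup, no iteration.

    Rearranges the optimality equation to
    R(s, a) = Q(s, a) - gamma * E_{s'}[max_a' Q(s', a')].
    """
    return Q - gamma * (P @ Q.max(axis=1))


def boltzmann_log_likelihood(Q, demos, beta=1.0):
    """Log-likelihood of (state, action) pairs under a softmax policy in Q.

    The Boltzmann-rational expert is an assumed, commonly used model.
    """
    logits = beta * Q
    # Numerically stable log-softmax over actions.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_pi = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(sum(log_pi[s, a] for s, a in demos))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A, gamma = 5, 3, 0.9
    P = rng.dirichlet(np.ones(S), size=(S, A))  # random transitions (S, A, S)
    R = rng.normal(size=(S, A))                 # random rewards (S, A)

    Q = q_from_reward(R, P, gamma)          # many sweeps over the MDP
    R_back = reward_from_q(Q, P, gamma)     # a single sweep
    print("max reward reconstruction error:", np.max(np.abs(R_back - R)))
    print("demo log-likelihood:", boltzmann_log_likelihood(Q, [(0, 1), (2, 0)]))
```

Under these assumptions, a sampler that proposes Q values can evaluate the likelihood directly and recover the implied reward with one backup per proposal, instead of rerunning the iterative solver at every step.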
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Sparse Evolutionary Training · Focus
