Reward Learning with Intractable Normalizing Functions
Joshua Hoegerman, Dylan P. Losey

TL;DR
This paper introduces Double MH, a Monte Carlo method for Bayesian reward learning in robots that copes with intractable normalizing functions and improves the accuracy of reward inference from human demonstrations and corrections.
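To make the normalizing-function problem concrete, assume the Boltzmann-rational observation model that is common in Bayesian reward learning. This is an illustrative assumption, and the paper's exact model and notation may differ. With reward parameter θ, rationality coefficient β, observed trajectory ξ, and trajectory space Ξ:

$$
P(\xi \mid \theta) = \frac{\exp\big(\beta R_\theta(\xi)\big)}{Z(\theta)}, \qquad Z(\theta) = \int_{\Xi} \exp\big(\beta R_\theta(\xi')\big)\, d\xi'
$$

$$
P(\theta \mid \xi) = \frac{P(\xi \mid \theta)\, P(\theta)}{\int_{\Theta} P(\xi \mid \theta')\, P(\theta')\, d\theta'}
$$

The integral over rewards cancels in a Metropolis-Hastings acceptance ratio for a proposal θ' ~ q(· | θ), but the ratio of trajectory normalizers does not:

$$
\alpha(\theta \to \theta') = \min\left\{1,\ \frac{\exp\big(\beta R_{\theta'}(\xi)\big)\, P(\theta')\, q(\theta \mid \theta')}{\exp\big(\beta R_{\theta}(\xi)\big)\, P(\theta)\, q(\theta' \mid \theta)} \cdot \frac{Z(\theta)}{Z(\theta')}\right\}
$$

Sampling-based methods such as Double MH are designed to avoid evaluating Z(θ)/Z(θ') directly.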
Contribution
The paper proposes Double MH, a Monte Carlo approach adapted from recent work in the statistics community, to infer human reward functions in continuous spaces, and extends it to further settings.
Findings
Double MH outperforms existing normalizer approximations in simulations
It infers human rewards more accurately than those approximations
It applies to both demonstrations and corrections
Abstract
Robots can learn to imitate humans by inferring what the human is optimizing for. One common framework for this is Bayesian reward learning, where the robot treats the human's demonstrations and corrections as observations of their underlying reward function. Unfortunately, this inference is doubly intractable: the robot must reason over all the trajectories the person could have provided and all the rewards the person could have in mind. Prior work uses existing robotic tools to approximate the resulting normalizer. In this paper, we group previous approaches into three fundamental classes and analyze the theoretical pros and cons of each. We then leverage recent research from the statistics community to introduce Double MH reward learning, a Monte Carlo method for asymptotically learning the human's reward in continuous spaces. We extend Double MH to conditionally independent…
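The core Double MH idea can be sketched on a deliberately tiny problem. Everything here is a hypothetical assumption, not taken from the paper: a "trajectory" is a single real number ξ, the reward is R_θ(ξ) = θξ - ξ² (chosen so that Z(θ) depends on θ), β = 1, and the prior on θ is a zero-mean Gaussian. The sketch follows the generic double Metropolis-Hastings recipe from the statistics literature: for each proposed θ', a short inner MH chain started at each observed demonstration produces auxiliary trajectories approximately distributed as P(· | θ'), and their rewards stand in for the unknown ratio Z(θ)/Z(θ'). The function names (`reward`, `inner_mh`, `double_mh`) and all step sizes are illustrative choices, not the authors' implementation.

```python
import math
import random


def reward(theta: float, xi: float) -> float:
    """Hypothetical toy reward R_theta(xi) = theta * xi - xi**2.

    Its normalizer Z(theta) = integral of exp(beta * R_theta(xi)) over xi depends
    on theta, so it does not cancel in an ordinary MH ratio over theta.
    """
    return theta * xi - xi ** 2


def log_prior(theta: float, prior_var: float) -> float:
    # Zero-mean Gaussian prior on theta, up to an additive constant.
    return -theta ** 2 / (2.0 * prior_var)


def accept(log_ratio: float, rng: random.Random) -> bool:
    # Metropolis test; min(0, .) keeps math.exp from overflowing.
    return rng.random() < math.exp(min(0.0, log_ratio))


def inner_mh(xi0, theta, beta, steps, step_size, rng):
    """Short random-walk MH chain targeting P(xi | theta), started at xi0.

    Only the unnormalized density exp(beta * R_theta(xi)) is evaluated.
    """
    xi = xi0
    for _ in range(steps):
        cand = xi + rng.gauss(0.0, step_size)
        if accept(beta * (reward(theta, cand) - reward(theta, xi)), rng):
            xi = cand
    return xi


def double_mh(demos, beta=1.0, prior_var=10.0, iters=5_000, burn_in=1_000,
              outer_step=0.5, inner_steps=15, inner_step=0.7, seed=0):
    """Return post-burn-in samples of theta given observed demonstrations."""
    rng = random.Random(seed)
    theta = 0.0
    samples = []
    for t in range(iters):
        prop = theta + rng.gauss(0.0, outer_step)  # symmetric proposal, q terms cancel
        # Auxiliary demos, approximately drawn from P(. | prop) by inner MH chains
        # that start at the observed demos (the second, nested level of MH).
        aux = [inner_mh(xi, prop, beta, inner_steps, inner_step, rng) for xi in demos]
        log_a = (log_prior(prop, prior_var) - log_prior(theta, prior_var)
                 + beta * sum(reward(prop, xi) - reward(theta, xi) for xi in demos)
                 # These terms replace the intractable ratio Z(theta) / Z(prop).
                 + beta * sum(reward(theta, y) - reward(prop, y) for y in aux))
        if accept(log_a, rng):
            theta = prop
        if t >= burn_in:
            samples.append(theta)
    return samples


if __name__ == "__main__":
    demos = [0.9, 1.1, 1.0, 0.8, 1.2]  # made-up demonstrations
    samples = double_mh(demos)
    print(f"posterior mean of theta ~ {sum(samples) / len(samples):.2f}")
```

This toy reward was picked because its likelihood is Gaussian in ξ, so the exact posterior mean, β Σξᵢ / (nβ/2 + 1/σ₀²), is available as a check; for the five made-up demonstrations it is 5 / 2.6 ≈ 1.92. Starting each inner chain at an observed demonstration is what lets a few inner steps suffice, at the cost of a small bias relative to drawing the auxiliary trajectories exactly.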
Taxonomy
Topics: Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference · Bayesian Modeling and Causal Inference
