Reward Learning with Intractable Normalizing Functions

Joshua Hoegerman; Dylan P. Losey

arXiv:2305.09606 · cs.RO · October 20, 2023 · 1 citation

TL;DR

This paper introduces Double MH, a Monte Carlo method for Bayesian reward learning in robots. It handles the intractable normalizing function directly and infers rewards from human demonstrations and corrections more accurately than existing approximations.

Contribution

The paper proposes Double MH, a Monte Carlo approach drawn from recent research in the statistics community, to infer human reward functions in continuous spaces, and extends it to additional settings.

Findings

01 Double MH outperforms existing approximations in simulations

02 The method achieves higher accuracy in inferring human rewards

03 Applicable to both demonstrations and corrections

Abstract

Robots can learn to imitate humans by inferring what the human is optimizing for. One common framework for this is Bayesian reward learning, where the robot treats the human's demonstrations and corrections as observations of their underlying reward function. Unfortunately, this inference is doubly-intractable: the robot must reason over all the trajectories the person could have provided and all the rewards the person could have in mind. Prior work uses existing robotic tools to approximate this normalizer. In this paper, we group previous approaches into three fundamental classes and analyze the theoretical pros and cons of their approach. We then leverage recent research from the statistics community to introduce Double MH reward learning, a Monte Carlo method for asymptotically learning the human's reward in continuous spaces. We extend Double MH to conditionally independent…
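To make the doubly-intractable structure concrete, here is a minimal sketch of a double Metropolis-Hastings sampler of the kind found in the statistics literature, applied to a toy reward learning problem. It is not the authors' implementation. Everything specific is an assumption made for illustration: a one-dimensional "trajectory" `xi` in [-1, 1], a scalar reward parameter `theta` in [-1, 1] with a uniform prior, the reward `-(xi - theta)**2`, the rationality constant `beta`, and the hypothetical function names `log_f`, `inner_mh`, and `double_mh`. The inner chain draws an auxiliary trajectory under the proposed reward, which lets the outer acceptance ratio avoid evaluating the normalizer over trajectories.

```python
import numpy as np

def log_f(xi, theta, beta=20.0):
    """Unnormalized log-likelihood beta * R(xi, theta) (assumed toy reward).
    Trajectories outside [-1, 1] get zero probability."""
    if abs(xi) > 1.0:
        return -np.inf
    return -beta * (xi - theta) ** 2

def log_prior(theta):
    """Uniform prior over theta in [-1, 1] (an assumption of this sketch)."""
    return 0.0 if abs(theta) <= 1.0 else -np.inf

def inner_mh(xi_start, theta, rng, steps=20, step_size=0.3):
    """Inner chain: a few MH steps over trajectories under reward theta,
    started from the observed demonstration."""
    xi = xi_start
    for _ in range(steps):
        cand = xi + step_size * rng.normal()
        if np.log(rng.uniform()) < log_f(cand, theta) - log_f(xi, theta):
            xi = cand
    return xi

def double_mh(demo, n_samples=4000, step_size=0.3, seed=0):
    """Outer chain over theta. The auxiliary trajectory stands in for the
    intractable normalizer in the acceptance ratio."""
    rng = np.random.default_rng(seed)
    theta = 0.0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        theta_new = theta + step_size * rng.normal()  # symmetric proposal
        if np.isfinite(log_prior(theta_new)):
            aux = inner_mh(demo, theta_new, rng)
            log_ratio = (log_prior(theta_new) - log_prior(theta)
                         + log_f(demo, theta_new) - log_f(demo, theta)
                         + log_f(aux, theta) - log_f(aux, theta_new))
            if np.log(rng.uniform()) < log_ratio:
                theta = theta_new
        samples[i] = theta
    return samples

if __name__ == "__main__":
    posterior = double_mh(demo=0.5)
    print("posterior mean of theta:", posterior.mean())
```

With a single demonstration at `xi = 0.5`, the samples should concentrate around reward parameters near the demonstration. Because the inner chain runs for only a fixed number of steps, the auxiliary draw is approximate, so the number of inner steps trades accuracy against compute.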

Code & Models

Repositories

vt-collab/reward-learning-with-intractable-normalizers
Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference · Bayesian Modeling and Causal Inference