Supervised Reward Inference

Will Schwarzer, Jordan Schneider, Philip S. Thomas, and Scott Niekum

arXiv:2502.18447 · cs.LG · February 26, 2025

TL;DR

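This paper frames reward inference as a supervised learning problem, so that rewards can be inferred from diverse human behavior, including suboptimal actions and actions meant to communicate a goal rather than achieve it. It shows that the approach is asymptotically Bayes-optimal under mild assumptions and evaluates it on simulated robotic manipulation tasks.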

Contribution

It proposes supervised learning as a unified framework for inferring reward functions from any class of behavior, and shows that the approach is asymptotically Bayes-optimal under mild assumptions.
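
One standard way to read the optimality claim is sketched below. It assumes a squared-error training loss, which this summary does not state, and the symbols are illustrative rather than the paper's notation: R is the reward (or its parameters), D is the observed behavior, and f is the learned predictor. Under that assumption, the minimizer of the expected loss over all predictors is the posterior mean of the reward given the behavior:

```latex
f^\star \in \arg\min_{f} \; \mathbb{E}_{(R, D)}\!\left[ \, \lVert f(D) - R \rVert^2 \, \right]
\quad \Longrightarrow \quad
f^\star(D) = \mathbb{E}\left[ R \mid D \right]
```

A sufficiently expressive model trained on enough (D, R) pairs would then approach this Bayes estimate, whatever process generated D.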

Findings

01. Efficient reward inference from a wide variety of arbitrarily suboptimal demonstrations

02. The method is asymptotically Bayes-optimal under mild assumptions

03. Effective on simulated robotic manipulation tasks

Abstract

Existing approaches to reward inference from behavior typically assume that humans provide demonstrations according to specific models of behavior. However, humans often indicate their goals through a wide range of behaviors, from actions that are suboptimal due to poor planning or execution to behaviors which are intended to communicate goals rather than achieve them. We propose that supervised learning offers a unified framework to infer reward functions from any class of behavior, and show that such an approach is asymptotically Bayes-optimal under mild assumptions. Experiments on simulated robotic manipulation tasks show that our method can efficiently infer rewards from a wide variety of arbitrarily suboptimal demonstrations.
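
The idea in the abstract can be illustrated with a minimal toy sketch. It is not the paper's architecture, data, or experiments. It assumes a linear-Gaussian setup chosen only because the Bayes estimate has a closed form: reward parameters `theta` are drawn from a standard normal prior, a hypothetical demonstrator produces behavior features `b = A theta + noise`, and a least-squares regressor is fit from behavior to reward. The names `sample_dataset`, `fit_reward_regressor`, `posterior_mean_map`, `A`, and `sigma` are all invented for illustration.

```python
import numpy as np

def sample_dataset(n, A, sigma, rng):
    """Draw n (behavior, reward) training pairs from the toy generative model."""
    m, d = A.shape
    theta = rng.normal(size=(n, d))          # reward parameters from an N(0, I) prior (assumed)
    noise = sigma * rng.normal(size=(n, m))  # execution noise in the demonstrator
    behavior = theta @ A.T + noise           # hypothetical demonstrator: b = A theta + noise
    return behavior, theta

def fit_reward_regressor(behavior, theta):
    """Supervised step: least-squares map from behavior features to reward parameters."""
    W, *_ = np.linalg.lstsq(behavior, theta, rcond=None)
    return W  # shape (m, d); predict with behavior @ W

def posterior_mean_map(A, sigma):
    """Closed-form Bayes posterior-mean map for this linear-Gaussian toy.

    E[theta | b] = (A^T A + sigma^2 I)^{-1} A^T b, written here for row-vector inputs.
    """
    d = A.shape[1]
    return A @ np.linalg.inv(A.T @ A + sigma**2 * np.eye(d))

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))   # arbitrary behavior map, never shown to the learner
sigma = 0.5
behavior, theta = sample_dataset(100_000, A, sigma, rng)
W_learned = fit_reward_regressor(behavior, theta)

# Compare the learned map with the Bayes-optimal one; the gap shrinks as n grows.
gap = np.max(np.abs(W_learned - posterior_mean_map(A, sigma)))
print(f"max coefficient gap between learned and Bayes map: {gap:.4f}")
```

The learner only ever sees sampled (behavior, reward) pairs. It is never given `A` or `sigma`, so nothing in the fitting step depends on the demonstrator being optimal or following a particular behavior model. A realistic version would presumably replace the linear regressor with a more expressive model and the toy demonstrator with actual or simulated human behavior.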

Taxonomy

Topics: Stock Market Forecasting Methods