Inverse Reward Design

Dylan Hadfield-Menell; Smitha Milli; Pieter Abbeel; Stuart Russell; Anca Dragan

arXiv:1711.02827·cs.AI·October 8, 2020·63 cites

Open Access · 1 Repo

TL;DR

This paper introduces inverse reward design (IRD), a method to infer true objectives from designed rewards, helping autonomous agents avoid undesired behaviors caused by reward misspecification.

Contribution

It proposes IRD as a new approach to interpret reward functions in context, with approximate solutions for risk-averse planning in unseen scenarios.

Findings

01 IRD helps reduce negative side effects of reward misspecification

02 The approach mitigates reward hacking in autonomous agents

03 Empirical results demonstrate improved robustness in test environments

Abstract

Autonomous agents optimize the reward function we give them. What they don't know is how hard it is for us to design a reward function that actually captures what we want. When designing the reward, we might think of some specific training scenarios, and make sure that the reward will lead to the right behavior in those scenarios. Inevitably, agents encounter new scenarios (e.g., new types of terrain) where optimizing that same reward may lead to undesired behavior. Our insight is that reward functions are merely observations about what the designer actually wants, and that they should be interpreted in the context in which they were designed. We introduce inverse reward design (IRD) as the problem of inferring the true objective based on the designed reward and the training MDP. We introduce approximate methods for solving IRD problems, and use their solution to plan risk-averse…
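
The idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. It assumes rewards that are linear in trajectory features, a small finite set of candidate weight vectors standing in for both possible proxies and possible true rewards, a uniform prior, and a designer who tends to pick proxies whose training behavior scores well under the true reward. All names (`ird_posterior`, `risk_averse_choice`, `min_mass`, the terrain features) are hypothetical, and the worst-case rule over high-probability weights is just one simple way to be risk-averse.

```python
import math

def dot(w, phi):
    """Linear reward: weights times feature counts."""
    return sum(wi * fi for wi, fi in zip(w, phi))

def best_trajectory(w, trajectories):
    """Trajectory (given as a feature tuple) that maximizes reward under w."""
    return max(trajectories, key=lambda phi: dot(w, phi))

def ird_posterior(proxy, candidates, train_trajs, beta=1.0):
    """Posterior over candidate true weights given the designed proxy.

    Assumed likelihood: P(proxy | w_true) is proportional to
    exp(beta * w_true . phi_proxy), normalized over the behavior every
    candidate proxy would have induced in the training environment.
    """
    induced = [best_trajectory(w, train_trajs) for w in candidates]
    phi_proxy = best_trajectory(proxy, train_trajs)
    unnorm = []
    for w_true in candidates:
        numer = math.exp(beta * dot(w_true, phi_proxy))
        denom = sum(math.exp(beta * dot(w_true, phi)) for phi in induced)
        unnorm.append(numer / denom)  # uniform prior, so no extra factor
    total = sum(unnorm)
    return [u / total for u in unnorm]

def risk_averse_choice(candidates, posterior, test_trajs, min_mass=0.05):
    """Pick the trajectory with the best worst-case reward over plausible weights."""
    plausible = [w for w, p in zip(candidates, posterior) if p >= min_mass]
    return max(test_trajs, key=lambda phi: min(dot(w, phi) for w in plausible))

# Features are cell counts: (dirt, grass, lava). Training contains no lava.
train_trajs = [(4, 0, 0), (1, 2, 0)]
test_trajs = [(1, 0, 1), (4, 0, 0)]  # lava shortcut vs. long dirt path

# Hypothetical candidates: dirt or grass is cheaper; lava is harmless or terrible.
candidates = [(-1, -2, 0), (-1, -2, -10), (-2, -1, 0), (-2, -1, -10)]
proxy = (-1, -2, 0)  # the designer never considered lava, so its weight is 0

posterior = ird_posterior(proxy, candidates, train_trajs)
naive = best_trajectory(proxy, test_trajs)
cautious = risk_averse_choice(candidates, posterior, test_trajs)
```

Because lava never appears in training, the two hypotheses that differ only in the lava weight receive equal posterior mass: the proxy says nothing about lava. Optimizing the proxy literally takes the lava shortcut, while the worst-case planner in this sketch takes the dirt path.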

Code & Models

Repositories

pliam1105/RBAIRD · tf

Taxonomy

Topics: Advanced Multi-Objective Optimization Algorithms · AI-based Problem Solving and Planning · Manufacturing Process and Optimization