Goal Misgeneralization in Deep Reinforcement Learning

Lauro Langosco; Jack Koch; Lee Sharkey; Jacob Pfau; Laurent Orseau,; David Krueger

arXiv:2105.14111·cs.LG·January 11, 2023·22 cites

Goal Misgeneralization in Deep Reinforcement Learning

Lauro Langosco, Jack Koch, Lee Sharkey, Jacob Pfau, Laurent Orseau,, David Krueger

PDF

Open Access 4 Repos

TL;DR

This paper investigates goal misgeneralization in deep reinforcement learning, highlighting how agents can perform well in capabilities but still pursue incorrect goals out-of-distribution, with empirical evidence and analysis of causes.

Contribution

It formally distinguishes goal from capability generalization failures, provides the first empirical demonstrations of goal misgeneralization, and analyzes its underlying causes.

Findings

01

Empirical demonstration of goal misgeneralization in RL agents.

02

Formal distinction between capability and goal generalization failures.

03

Partial characterization of causes of goal misgeneralization.

Abstract

We study goal misgeneralization, a type of out-of-distribution generalization failure in reinforcement learning (RL). Goal misgeneralization failures occur when an RL agent retains its capabilities out-of-distribution yet pursues the wrong goal. For instance, an agent might continue to competently avoid obstacles, but navigate to the wrong place. In contrast, previous works have typically focused on capability generalization failures, where an agent fails to do anything sensible at test time. We formalize this distinction between capability and goal generalization, provide the first empirical demonstrations of goal misgeneralization, and present a partial characterization of its causes.

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsReinforcement Learning in Robotics · Experimental Behavioral Economics Studies