A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation

Runzhe Yang; Xingyuan Sun; Karthik Narasimhan

arXiv:1908.08342·cs.LG·November 7, 2019·108 cites

TL;DR

This paper presents a generalized algorithm for multi-objective reinforcement learning that enables quick adaptation to new preferences by learning a single policy representation, demonstrated across multiple domains.

Contribution

The authors introduce a generalized Bellman equation for MORL, allowing a single model to adapt efficiently to various preferences and infer preferences with minimal samples.

Findings

01. Effective adaptation to new preferences with few samples

02. Successful application across four different domains

03. Single parametric policy representation for all preferences

Abstract

We introduce a new algorithm for multi-objective reinforcement learning (MORL) with linear preferences, with the goal of enabling few-shot adaptation to new tasks. In MORL, the aim is to learn policies over multiple competing objectives whose relative importance (preferences) is unknown to the agent. While this alleviates dependence on scalar reward design, the expected return of a policy can change significantly with varying preferences, making it challenging to learn a single model to produce optimal policies under different preference conditions. We propose a generalized version of the Bellman equation to learn a single parametric representation for optimal policies over the space of all possible preferences. After an initial learning phase, our agent can execute the optimal policy under any given preference, or automatically infer an underlying preference with very few samples.…
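
To make the role of linear preferences concrete, the following is a minimal sketch, not the paper's algorithm. It assumes each policy has a vector of expected returns (one entry per objective) and that a preference is a weight vector applied as a weighted sum. The policy names, the return values, and the helpers `scalarize` and `best_policy` are all hypothetical and chosen only for illustration. The sketch shows why a policy that is optimal under one preference can be suboptimal under another, which is the difficulty the abstract describes.

```python
# Illustrative sketch only: all names and numbers are hypothetical,
# not taken from the paper or its experiments.

def scalarize(preference, returns):
    """Linear scalarization: weighted sum of per-objective returns."""
    if len(preference) != len(returns):
        raise ValueError("preference and returns must have the same length")
    return sum(w * r for w, r in zip(preference, returns))


def best_policy(preference, policy_returns):
    """Return the name of the policy with the highest scalarized return."""
    return max(
        policy_returns,
        key=lambda name: scalarize(preference, policy_returns[name]),
    )


# Hypothetical expected returns of three policies over two competing objectives.
POLICY_RETURNS = {
    "fast": (10.0, 1.0),
    "balanced": (6.0, 6.0),
    "safe": (1.0, 10.0),
}

if __name__ == "__main__":
    # Sweep a few preferences and report which policy each one favors.
    for preference in [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]:
        print(preference, best_policy(preference, POLICY_RETURNS))
```

In this toy setting the favored policy changes as the weights shift from the first objective to the second, so no single fixed policy is optimal for every preference. A single model that covers all preferences would therefore need to condition on the preference vector itself.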

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Elevator Systems and Control