A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation
Runzhe Yang, Xingyuan Sun, Karthik Narasimhan

TL;DR
This paper presents a generalized algorithm for multi-objective reinforcement learning that learns a single policy representation covering all linear preferences, enabling quick adaptation to new preferences. It is demonstrated across four domains.
Contribution
The authors introduce a generalized Bellman equation for MORL, allowing a single model to adapt efficiently to various preferences and infer preferences with minimal samples.
Findings
Effective adaptation to new preferences with few samples
Successful application across four different domains
Single parametric policy representation for all preferences
Abstract
We introduce a new algorithm for multi-objective reinforcement learning (MORL) with linear preferences, with the goal of enabling few-shot adaptation to new tasks. In MORL, the aim is to learn policies over multiple competing objectives whose relative importance (preferences) is unknown to the agent. While this alleviates dependence on scalar reward design, the expected return of a policy can change significantly with varying preferences, making it challenging to learn a single model to produce optimal policies under different preference conditions. We propose a generalized version of the Bellman equation to learn a single parametric representation for optimal policies over the space of all possible preferences. After an initial learning phase, our agent can execute the optimal policy under any given preference, or automatically infer an underlying preference with very few samples.…
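To make "linear preferences" concrete, here is a minimal Python sketch of linear scalarization, not the authors' algorithm. It assumes each action has a vector of per-objective value estimates and that a preference is a weight vector over objectives. The function names (`scalarize`, `greedy_action`) and the numbers in `Q` are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def scalarize(values: Sequence[float], preference: Sequence[float]) -> float:
    """Linear scalarization: dot product of preference weights and per-objective values."""
    if len(values) != len(preference):
        raise ValueError("values and preference must have the same length")
    return sum(w * v for w, v in zip(preference, values))


def greedy_action(q_values: Sequence[Sequence[float]], preference: Sequence[float]) -> int:
    """Return the index of the action whose scalarized value is highest."""
    scores = [scalarize(q, preference) for q in q_values]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical vector-valued estimates: 3 actions, 2 objectives (illustrative numbers only).
Q = [
    [1.0, 0.0],  # action 0 favors objective 0
    [0.0, 1.0],  # action 1 favors objective 1
    [0.6, 0.6],  # action 2 is a compromise
]

# The same value estimates yield different greedy actions under different preferences.
for preference in ([0.9, 0.1], [0.5, 0.5], [0.1, 0.9]):
    print(preference, "->", greedy_action(Q, preference))
```

The best action changes with the preference even though the value estimates stay fixed. This illustrates why, as the abstract notes, expected return can vary significantly across preferences.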
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Elevator Systems and Control
