Scaling Pareto-Efficient Decision Making Via Offline Multi-Objective RL
Baiting Zhu, Meihua Dang, Aditya Grover

TL;DR
This paper introduces a new offline multi-objective reinforcement learning framework with a large dataset and a novel preference-conditioned policy, enabling Pareto-efficient decision making without prior preference knowledge.
Contribution
It presents D4MORL, a large dataset for offline MORL, and PEDA, a new preference-conditioned policy algorithm for Pareto-efficient decision making.
Findings
PEDA closely approximates the behavioral policies that generated the offline data.
PEDA effectively approximates the Pareto front.
The dataset enables robust offline MORL evaluation.
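The claim that PEDA approximates the Pareto front can be made concrete with a small sketch. It assumes two objectives, both maximized, and assumes each policy is summarized by its vector of average returns. The code keeps only the non-dominated return vectors and measures the area they dominate above a reference point (the 2-D hypervolume). Hypervolume is a common metric for comparing Pareto-front approximations in MORL. The function names and the reference point are hypothetical and illustrative; this is not the paper's evaluation code.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (return on objective 1, return on objective 2)


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep non-dominated points, assuming both objectives are maximized."""
    front: List[Point] = []
    for p in points:
        # p is dominated if some q is >= on both objectives and > on at least one
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for q in points
        )
        if not dominated and p not in front:  # also drops exact duplicates
            front.append(p)
    return sorted(front)


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the Pareto front and bounded below by `ref` (hypothetical ref)."""
    # Ignore points that do not strictly improve on the reference point
    front = [p for p in pareto_front(points) if p[0] > ref[0] and p[1] > ref[1]]
    # On a 2-D front, descending first objective means ascending second objective
    front.sort(key=lambda p: p[0], reverse=True)
    volume, prev_y = 0.0, ref[1]
    for x, y in front:
        # Add the horizontal strip between the previous and current y levels
        volume += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return volume


returns = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (1.0, 1.0)]
print(pareto_front(returns))                # (1.0, 1.0) is dominated by (2.0, 2.0)
print(hypervolume_2d(returns, (0.0, 0.0)))
```

A larger hypervolume means the learned policies cover more of the trade-off space. In practice it is usually reported together with a spread or sparsity measure, because hypervolume alone does not show how evenly the front is covered.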
Abstract
The goal of multi-objective reinforcement learning (MORL) is to learn policies that simultaneously optimize multiple competing objectives. In practice, an agent's preferences over the objectives may not be known a priori, and hence, we require policies that can generalize to arbitrary preferences at test time. In this work, we propose a new data-driven setup for offline MORL, where we wish to learn a preference-agnostic policy agent using only a finite dataset of offline demonstrations of other agents and their preferences. The key contributions of this work are two-fold. First, we introduce D4MORL, (D)atasets for MORL that are specifically designed for offline settings. It contains 1.8 million annotated demonstrations obtained by rolling out reference policies that optimize for randomly sampled preferences on 6 MuJoCo environments with 2-3 objectives each. Second, we propose…
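The setup relies on preferences that are randomly sampled and on policies that must serve any preference at test time. A minimal sketch of both ideas follows. It assumes preferences are weight vectors on the probability simplex, drawn uniformly with a Dirichlet(1, ..., 1) distribution, and that a preference turns a vector reward into a scalar through a weighted sum (linear scalarization). This excerpt does not state the paper's sampling distribution or utility function, so both are assumptions. The helper names are hypothetical.

```python
import numpy as np


def sample_preferences(num_objectives: int, n: int, rng: np.random.Generator) -> np.ndarray:
    """Draw n preference vectors uniformly from the simplex (assumed distribution)."""
    # Dirichlet with all concentrations equal to 1 is uniform over the simplex
    return rng.dirichlet(np.ones(num_objectives), size=n)


def scalarize(reward_vec: np.ndarray, preference: np.ndarray) -> float:
    """Linear scalarization: the preference-weighted sum of per-objective rewards."""
    return float(np.dot(preference, reward_vec))


def best_for_preference(return_vectors: np.ndarray, preference: np.ndarray) -> int:
    """Index of the candidate whose vector return is best under this preference."""
    # Each row is one candidate's per-objective return; score all rows at once
    return int(np.argmax(return_vectors @ preference))


rng = np.random.default_rng(0)
prefs = sample_preferences(num_objectives=2, n=3, rng=rng)  # each row sums to 1
candidates = np.array([[10.0, 0.0], [6.0, 6.0], [0.0, 10.0]])
print(best_for_preference(candidates, np.array([0.5, 0.5])))  # balanced trade-off
print(best_for_preference(candidates, np.array([1.0, 0.0])))  # objective 1 only
```

The last function shows why a fixed preference is not enough. Different weightings pick different points on the trade-off curve, so a single policy that generalizes across preferences has to act well for every weighting it might receive.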
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Multi-Objective Optimization Algorithms · Reinforcement Learning in Robotics
