Scaling Pareto-Efficient Decision Making Via Offline Multi-Objective RL

Baiting Zhu; Meihua Dang; Aditya Grover

arXiv:2305.00567 · cs.LG · May 2, 2023 · 1 citation

TL;DR

This paper introduces an offline multi-objective reinforcement learning (MORL) setup, consisting of a large demonstration dataset and a preference-conditioned policy, that enables Pareto-efficient decision making when preferences are not known in advance.

Contribution

It presents D4MORL, a large dataset for offline MORL, and PEDA, a new preference-conditioned policy algorithm for Pareto-efficient decision making.

Findings

01. PEDA closely approximates the behavioral policy.
02. PEDA effectively approximates the Pareto front.
03. The dataset enables robust offline MORL evaluation.

Abstract

The goal of multi-objective reinforcement learning (MORL) is to learn policies that simultaneously optimize multiple competing objectives. In practice, an agent's preferences over the objectives may not be known a priori, and hence, we require policies that can generalize to arbitrary preferences at test time. In this work, we propose a new data-driven setup for offline MORL, where we wish to learn a preference-agnostic policy agent using only a finite dataset of offline demonstrations of other agents and their preferences. The key contributions of this work are two-fold. First, we introduce D4MORL, (D)atasets for MORL that are specifically designed for offline settings. It contains 1.8 million annotated demonstrations obtained by rolling out reference policies that optimize for randomly sampled preferences on 6 MuJoCo environments with 2-3 objectives each. Second, we propose…
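To make the terms "competing objectives", "preferences", and "Pareto-efficient" concrete, here is a minimal Python sketch. It is not taken from the paper or its repository. It assumes that every objective is maximized and that a preference is a non-negative weight vector applied as a weighted sum (linear scalarization), which is a common MORL convention but is not stated in the abstract. The function names and the toy return vectors are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def scalarize(returns: Sequence[float], preference: Sequence[float]) -> float:
    """Weighted sum of per-objective returns (assumed linear scalarization).
    The preference weights are normalized to sum to 1."""
    total = sum(preference)
    return sum(w / total * r for w, r in zip(preference, returns))


def best_for(front: List[Point], preference: Sequence[float]) -> Point:
    """Pick the point on the front that scores highest under one preference."""
    return max(front, key=lambda p: scalarize(p, preference))


# Hypothetical two-objective returns of four candidate policies.
candidates: List[Point] = [(1.0, 5.0), (3.0, 3.0), (5.0, 1.0), (2.0, 2.0)]

front = pareto_front(candidates)          # (2.0, 2.0) is dominated by (3.0, 3.0)
favor_first = best_for(front, (0.8, 0.2))
favor_second = best_for(front, (0.2, 0.8))
```

Under these assumptions, different preferences select different points on the same front, which is why a policy that must serve arbitrary test-time preferences needs to cover the whole front rather than a single trade-off.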

Code & Models

Repositories

baitingzbt/peda (PyTorch, official)

Videos

Scaling Pareto-Efficient Decision Making via Offline Multi-Objective RL · SlidesLive

Taxonomy

Topics: Machine Learning and Data Classification · Advanced Multi-Objective Optimization Algorithms · Reinforcement Learning in Robotics

Methods: Test