Expected Scalarised Returns Dominance: A New Solution Concept for Multi-Objective Decision Making

Conor F. Hayes; Timothy Verstraeten; Diederik M. Roijers; Enda Howley; Patrick Mannion

arXiv:2106.01048 · cs.LG · July 6, 2022

TL;DR

This paper introduces the ESR dominance criterion and the ESR set as solution concepts for multi-objective reinforcement learning, so that a set of optimal policies can be learned when user preferences are unknown. It also presents a new distributional RL algorithm to learn these sets.

Contribution

The paper proposes ESR dominance and the ESR set as novel solution concepts for maximising expected utility in multi-objective RL when user preferences are unknown.

Findings

01

Introduces ESR dominance as a new criterion for policy optimality.

02

Defines the ESR set as a set of policies that are ESR dominant.

03

Develops a MOT-DRL algorithm to learn the ESR set in practice.

Abstract

In many real-world scenarios, the utility of a user is derived from the single execution of a policy. In this case, to apply multi-objective reinforcement learning, the expected utility of the returns must be optimised. Various scenarios exist where a user's preferences over objectives (also known as the utility function) are unknown or difficult to specify. In such scenarios, a set of optimal policies must be learned. However, settings where the expected utility must be maximised have been largely overlooked by the multi-objective reinforcement learning community and, as a consequence, a set of optimal solutions has yet to be defined. In this paper we address this challenge by proposing first-order stochastic dominance as a criterion to build solution sets to maximise expected utility. We also propose a new dominance criterion, known as expected scalarised returns (ESR) dominance, that…
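To make the first-order stochastic dominance criterion mentioned in the abstract more concrete, the sketch below compares two policies through samples of their multi-objective returns. It is only an illustration, not the authors' algorithm. It assumes dominance is tested by comparing empirical joint CDFs, F_x(v) <= F_y(v) at every point v with strict inequality somewhere. The function names `empirical_cdf` and `fsd_dominates` and the sample data are hypothetical.

```python
import itertools
import numpy as np


def empirical_cdf(samples, point):
    """Fraction of sampled return vectors that are <= point in every objective."""
    return float(np.mean(np.all(samples <= point, axis=1)))


def fsd_dominates(x, y, tol=1e-12):
    """Check whether sample set x dominates sample set y (assumed CDF-based test).

    x dominates y if the empirical joint CDF of x is never above that of y
    and is strictly below it at some point. Empirical CDFs are step
    functions, so it is enough to check the grid built from the pooled
    per-objective sample values.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled = np.vstack([x, y])
    axes = [np.unique(pooled[:, d]) for d in range(pooled.shape[1])]
    strictly_better = False
    for v in itertools.product(*axes):
        point = np.array(v)
        fx = empirical_cdf(x, point)
        fy = empirical_cdf(y, point)
        if fx > fy + tol:
            return False  # x puts more mass on low returns than y here
        if fx < fy - tol:
            strictly_better = True
    return strictly_better


# Hypothetical two-objective return samples for three policies.
policy_a = [[2, 3], [4, 5]]
policy_b = [[1, 2], [3, 4]]    # policy_a shifted down by 1 in each objective
policy_c = [[0, 10], [10, 0]]  # a risky policy that trades one objective for the other

print(fsd_dominates(policy_a, policy_b))  # True
print(fsd_dominates(policy_b, policy_a))  # False
print(fsd_dominates(policy_a, policy_c))  # False: neither dominates the other
print(fsd_dominates(policy_c, policy_a))  # False
```

Under this reading, policies such as `policy_a` and `policy_c`, where neither dominates the other, would both remain candidates in a solution set. The grid check grows exponentially with the number of objectives, so this form is only practical for small examples.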
