Expected Scalarised Returns Dominance: A New Solution Concept for Multi-Objective Decision Making
Conor F. Hayes, Timothy Verstraeten, Diederik M. Roijers, Enda Howley, Patrick Mannion

TL;DR
This paper introduces the ESR dominance criterion and the ESR set as solution concepts for multi-objective reinforcement learning. Together they define a set of optimal policies for settings where the user's preferences are unknown, and a new distributional RL algorithm is proposed to learn this set.
Contribution
It proposes ESR dominance and ESR set as novel solution concepts for maximizing expected utility in multi-objective RL with unknown preferences.
Findings
Introduces ESR dominance as a new criterion for policy optimality.
Defines the ESR set as a set of policies that are ESR dominant.
Develops a MOT-DRL algorithm to learn the ESR set in practice.
Abstract
In many real-world scenarios, the utility of a user is derived from the single execution of a policy. In this case, to apply multi-objective reinforcement learning, the expected utility of the returns must be optimised. Various scenarios exist where a user's preferences over objectives (also known as the utility function) are unknown or difficult to specify. In such scenarios, a set of optimal policies must be learned. However, settings where the expected utility must be maximised have been largely overlooked by the multi-objective reinforcement learning community and, as a consequence, a set of optimal solutions has yet to be defined. In this paper we address this challenge by proposing first-order stochastic dominance as a criterion to build solution sets to maximise expected utility. We also propose a new dominance criterion, known as expected scalarised returns (ESR) dominance, that…
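To make the first-order stochastic dominance criterion mentioned in the abstract concrete, here is a minimal Python sketch of the standard textbook check on empirical samples: A dominates B if A's CDF is never above B's and is strictly below it somewhere. It handles scalar returns only. The paper's multi-objective setting compares distributions over return vectors, which this sketch does not attempt. The function names `empirical_cdf` and `fsd_dominates` are illustrative assumptions, not taken from the paper.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return a function t -> fraction of samples that are <= t."""
    ordered = sorted(samples)
    n = len(ordered)
    return lambda t: bisect_right(ordered, t) / n


def fsd_dominates(returns_a, returns_b):
    """True if sample set A first-order stochastically dominates sample set B.

    Hypothetical helper for illustration: scalar returns, empirical CDFs.
    """
    cdf_a = empirical_cdf(returns_a)
    cdf_b = empirical_cdf(returns_b)
    # Empirical CDFs only change at sample points, so checking those suffices.
    points = sorted(set(returns_a) | set(returns_b))
    never_worse = all(cdf_a(t) <= cdf_b(t) for t in points)
    strictly_better = any(cdf_a(t) < cdf_b(t) for t in points)
    return never_worse and strictly_better


if __name__ == "__main__":
    shifted_up = [2, 3, 4]
    baseline = [1, 2, 3]
    risky = [0, 10]
    safe = [4, 5]
    print(fsd_dominates(shifted_up, baseline))  # A is B shifted up by one
    print(fsd_dominates(risky, safe), fsd_dominates(safe, risky))
```

When one distribution dominates another in this sense, every decision maker with a monotonically increasing utility function weakly prefers it in expectation, which is why the criterion suits settings where the utility function is unknown. Pairs such as `risky` and `safe` are incomparable: neither dominates, so both would remain in a solution set.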
