TL;DR
This paper introduces reinforcement learning algorithms designed for lexicographic multi-objective problems, enabling agents to prioritize multiple rewards in a strict order, with proven convergence and practical evaluation.
Contribution
It presents a new family of algorithms for lexicographic multi-objective reinforcement learning with convergence guarantees and demonstrates their effectiveness in safety-constrained scenarios.
Findings
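The proposed action-value and policy gradient algorithms provably converge to lexicographically optimal policies.
The algorithms can be used to impose safety constraints on an agent's behaviour.
In the safety setting, their performance is compared empirically with that of other constrained reinforcement learning algorithms.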
Abstract
In this work we introduce reinforcement learning techniques for solving lexicographic multi-objective problems. These are problems that involve multiple reward signals, and where the goal is to learn a policy that maximises the first reward signal, and subject to this constraint also maximises the second reward signal, and so on. We present a family of both action-value and policy gradient algorithms that can be used to solve such problems, and prove that they converge to policies that are lexicographically optimal. We evaluate the scalability and performance of these algorithms empirically, demonstrating their practical applicability. As a more specific application, we show how our algorithms can be used to impose safety constraints on the behaviour of an agent, and compare their performance in this context with that of other constrained reinforcement learning algorithms.
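To make the lexicographic ordering concrete, the following is a minimal sketch of greedy action selection over several action-value estimates, one per reward signal, listed in priority order. It is an illustration of the ordering described in the abstract and not the paper's algorithm: the function name `lexicographic_argmax`, the `tolerance` slack parameter, and the example values are all assumptions introduced here.

```python
import numpy as np


def lexicographic_argmax(q_values, tolerance=1e-6):
    """Pick an action that is greedy for each objective in priority order.

    q_values: array-like of shape (num_objectives, num_actions), where row 0
        holds the action values for the highest-priority reward signal.
    tolerance: hypothetical slack; actions within this distance of the best
        value for an objective are treated as tied and passed on to the next.
    """
    q_values = np.asarray(q_values, dtype=float)
    candidates = np.arange(q_values.shape[1])  # start with every action
    for q in q_values:
        best = q[candidates].max()
        # Keep only the actions that are (near-)optimal for this objective.
        candidates = candidates[q[candidates] >= best - tolerance]
    return int(candidates[0])


# Illustrative values only: actions 0 and 1 tie on the first objective,
# so the second objective breaks the tie in favour of action 1.
q = [[1.0, 1.0, 0.5],
     [0.2, 0.9, 5.0]]
chosen = lexicographic_argmax(q)
```

Action 2 has the highest value for the second objective but is never selected, because it is suboptimal for the first: lower-priority rewards only matter among actions that are already optimal for every higher-priority reward.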
