C-MORL: Multi-Objective Reinforcement Learning through Efficient Discovery of Pareto Front
Ruohong Liu, Yuxin Pan, Linjie Xu, Lei Song, Jiang Bian, Pengcheng You, Yize Chen

TL;DR
C-MORL introduces a two-stage algorithm for efficiently discovering the Pareto front in multi-objective reinforcement learning, improving performance and scalability over previous methods, especially on tasks with many objectives.
Contribution
The paper presents C-MORL, a novel two-stage Pareto front discovery method that combines parallel policy training with constrained optimization, addressing scalability and efficiency issues in MORL.
Findings
Outperforms recent MORL methods in hypervolume and utility metrics.
Effective on tasks with up to nine objectives.
Achieves more consistent and superior Pareto front coverage.
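The hypervolume metric mentioned above can be illustrated with a small two-objective example. The sketch below is a generic illustration, not the paper's evaluation code: it assumes both objectives are maximized, the function names `pareto_front` and `hypervolume_2d` are hypothetical, and the reference point is an arbitrary choice made for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, assuming both objectives are maximized."""
    front = []
    for p in points:
        # p is dominated if another point is at least as good in both objectives.
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    # Remove duplicates and sort by the first objective.
    return sorted(set(front))


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the front and bounded below by the reference point."""
    front = [p for p in pareto_front(points) if p[0] > ref[0] and p[1] > ref[1]]
    # Sweep from the largest first objective down; the second objective rises.
    front.sort(key=lambda p: p[0], reverse=True)
    area = 0.0
    prev_y = ref[1]
    for x, y in front:
        area += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return area


if __name__ == "__main__":
    returns = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (1.0, 1.0)]
    print(pareto_front(returns))                # [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
    print(hypervolume_2d(returns, (0.0, 0.0)))  # 6.0
```

A larger hypervolume means the discovered policies' returns dominate more of the objective space relative to the reference point. The sweep shown here only works for two objectives; tasks with more objectives need a general hypervolume algorithm.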
Abstract
Multi-objective reinforcement learning (MORL) excels at handling rapidly changing preferences in tasks that involve multiple criteria, even for unseen preferences. However, previous dominating MORL methods typically generate a fixed policy set or preference-conditioned policy through multiple training iterations exclusively for sampled preference vectors, and cannot ensure the efficient discovery of the Pareto front. Furthermore, integrating preferences into the input of policy or value functions presents scalability challenges, in particular as the dimensions of the state and preference spaces grow, which can complicate the learning process and hinder the algorithm's performance on more complex tasks. To address these issues, we propose a two-stage Pareto front discovery algorithm called Constrained MORL (C-MORL), which serves as a seamless bridge between constrained policy optimization…
Taxonomy
Topics: Reinforcement Learning in Robotics · Autonomous Vehicle Technology and Safety
Methods: Sparse Evolutionary Training
