In Search for Architectures and Loss Functions in Multi-Objective Reinforcement Learning
Mikhail Terekhov, Caglar Gulcehre

TL;DR
This paper introduces new architectures and loss functions for multi-objective reinforcement learning, demonstrating improved ability to capture Pareto fronts and robustness across various environments.
Contribution
It proposes MOPPO, which extends PPO to MORL, and MOA2C, a simple baseline used in ablations, together with a comprehensive empirical evaluation and an analysis of architectural impacts.
Findings
MOPPO effectively captures the Pareto front.
MOPPO outperforms existing approaches like PCN and Envelope Q-learning.
Architectural choices significantly influence MORL performance.
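To make "capturing the Pareto front" concrete, here is a minimal sketch of how the non-dominated set can be extracted from a collection of per-policy return vectors. It assumes every objective is maximized. The function name `pareto_front` and the sample points are hypothetical illustrations, not taken from the paper.

```python
import numpy as np

def pareto_front(returns):
    """Return the rows of `returns` that no other row dominates (maximization)."""
    returns = np.asarray(returns, dtype=float)
    keep = []
    for i, r in enumerate(returns):
        # r is dominated if some row is >= r in every objective
        # and strictly > r in at least one objective.
        at_least_as_good = np.all(returns >= r, axis=1)
        strictly_better = np.any(returns > r, axis=1)
        if not np.any(at_least_as_good & strictly_better):
            keep.append(i)
    return returns[keep]

# Hypothetical two-objective returns for five policies.
points = [[1.0, 5.0], [2.0, 4.0], [1.5, 3.0], [3.0, 1.0], [0.5, 0.5]]
front = pareto_front(points)
```

Here `[1.5, 3.0]` and `[0.5, 0.5]` are dropped because `[2.0, 4.0]` is at least as good in both objectives and strictly better in at least one.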
Abstract
Multi-objective reinforcement learning (MORL) is essential for addressing the intricacies of real-world RL problems, which often require trade-offs between multiple utility functions. However, MORL is challenging due to unstable learning dynamics with deep learning-based function approximators. The research path most taken has been to explore different value-based loss functions for MORL to overcome this issue. Our work empirically explores model-free policy learning loss functions and the impact of different architectural choices. We introduce two different approaches: Multi-objective Proximal Policy Optimization (MOPPO), which extends PPO to MORL, and Multi-objective Advantage Actor Critic (MOA2C), which acts as a simple baseline in our ablations. Our proposed approach is straightforward to implement, requiring only small modifications at the level of function approximator. We conduct…
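The abstract does not spell out the MOPPO loss, so the following is only a sketch of one common way to extend PPO's clipped surrogate to several objectives: the critic produces one advantage per objective, and a preference weight vector scalarizes them linearly before clipping. Linear scalarization, the function name `scalarized_ppo_loss`, and all inputs are assumptions made for illustration. This should not be read as the authors' formulation.

```python
import numpy as np

def scalarized_ppo_loss(new_logp, old_logp, advantages, weights, clip_eps=0.2):
    """PPO clipped surrogate loss on linearly scalarized advantages.

    new_logp, old_logp: shape (batch,), log-probabilities of the taken actions
    advantages: shape (batch, n_objectives), one advantage per objective
    weights: shape (n_objectives,), preference weights (assumed to sum to 1)
    """
    new_logp = np.asarray(new_logp, dtype=float)
    old_logp = np.asarray(old_logp, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Assumption: collapse the vector advantage with a weighted sum.
    scalar_adv = advantages @ weights
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * scalar_adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * scalar_adv
    # Negated because optimizers minimize while PPO maximizes the surrogate.
    return -np.mean(np.minimum(unclipped, clipped))

# Hypothetical batch of two transitions with two objectives.
loss = scalarized_ppo_loss(
    new_logp=[-1.0, -2.0],
    old_logp=[-1.0, -2.0],
    advantages=[[1.0, 0.0], [0.0, 2.0]],
    weights=[0.5, 0.5],
)
```

In a preference-conditioned setup, the same weight vector would typically also be fed to the policy and value networks as an input, which is one way to read "small modifications at the level of function approximator".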
Taxonomy
Methods: Q-Learning · Entropy Regularization · Proximal Policy Optimization
