In Search for Architectures and Loss Functions in Multi-Objective Reinforcement Learning

Mikhail Terekhov; Caglar Gulcehre

arXiv:2407.16807·cs.LG·July 25, 2024

TL;DR

This paper introduces new architectures and loss functions for multi-objective reinforcement learning, demonstrating improved ability to capture Pareto fronts and robustness across various environments.

Contribution

The paper proposes MOPPO, which extends PPO to MORL, and MOA2C, a simple baseline used in the ablations, together with an empirical evaluation and an analysis of how architectural choices affect performance.
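To make the idea of extending PPO to several objectives concrete, here is a minimal sketch. It assumes linear scalarization of a vector-valued advantage with a preference weight vector, which is one common way to handle multiple objectives and is not necessarily the authors' exact formulation. The clipped surrogate is the standard PPO objective; the helper names `scalarize` and `clipped_surrogate`, the weights, and the sample numbers are all hypothetical.

```python
import math
from typing import Sequence


def scalarize(advantage_vec: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse a per-objective advantage vector into one scalar.

    Assumption: linear scalarization with preference weights. The paper's
    own treatment of preferences may differ.
    """
    return sum(w * a for w, a in zip(weights, advantage_vec))


def clipped_surrogate(
    logp_new: float, logp_old: float, advantage: float, clip_eps: float = 0.2
) -> float:
    """Standard PPO clipped surrogate objective for a single sample."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


# Hypothetical sample: two objectives, preference 70% / 30%.
weights = (0.7, 0.3)
advantage_vec = (2.0, -1.0)
advantage = scalarize(advantage_vec, weights)

# The new policy is 1.5x more likely to take this action than the old one.
objective = clipped_surrogate(math.log(1.5), math.log(1.0), advantage)
```

With a positive scalarized advantage and a probability ratio above `1 + clip_eps`, the clipped term is the one that is kept, so the update gains nothing from pushing the ratio further. Changing `weights` changes which trade-off the same sample rewards.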

Findings

01 MOPPO effectively captures the Pareto front.

02 MOPPO outperforms existing approaches such as PCN and Envelope Q-learning.

03 Architectural choices significantly influence MORL performance.
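For readers unfamiliar with the term, the Pareto front mentioned in the findings is the set of outcomes that no other outcome dominates. The sketch below illustrates that definition on a handful of made-up two-objective return vectors, assuming every objective is to be maximized. The function names and the data are illustrative only and do not come from the paper.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and
    strictly better on at least one (all objectives maximized)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up per-policy returns on two objectives.
returns = [(1.0, 5.0), (2.0, 4.0), (1.5, 3.0), (3.0, 1.0), (2.0, 2.0)]
front = pareto_front(returns)
```

Here `(1.5, 3.0)` and `(2.0, 2.0)` drop out because `(2.0, 4.0)` is at least as good on both objectives, while the remaining three points each represent a different trade-off.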

Abstract

Multi-objective reinforcement learning (MORL) is essential for addressing the intricacies of real-world RL problems, which often require trade-offs between multiple utility functions. However, MORL is challenging due to unstable learning dynamics with deep learning-based function approximators. The research path most taken has been to explore different value-based loss functions for MORL to overcome this issue. Our work empirically explores model-free policy learning loss functions and the impact of different architectural choices. We introduce two different approaches: Multi-objective Proximal Policy Optimization (MOPPO), which extends PPO to MORL, and Multi-objective Advantage Actor Critic (MOA2C), which acts as a simple baseline in our ablations. Our proposed approach is straightforward to implement, requiring only small modifications at the level of the function approximator. We conduct…

Taxonomy

Methods: Q-Learning · Entropy Regularization · Proximal Policy Optimization