Limitations of Scalarisation in MORL: A Comparative Study in Discrete Environments
Muhammad Sa'ood Shah, Asad Jeewa

TL;DR
This paper critically examines the limitations of scalarisation functions in Multi-Objective Reinforcement Learning within discrete environments, highlighting their environment-dependent performance and proposing multi-policy algorithms as more robust alternatives.
Contribution
It provides a comparative analysis of scalarisation-based and multi-policy MORL algorithms, demonstrating the latter's advantages in complex, uncertain environments.
Findings
Scalarisation functions often fail to accurately approximate the Pareto front.
Performance of scalarisation depends heavily on environment and Pareto front shape.
Multi-policy algorithms like Pareto Q-Learning offer more robust decision-making.
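The first two findings can be illustrated on a toy example. The sketch below is not taken from the paper: the return vectors are hypothetical, and the helper names (`dominates`, `pareto_front`, `reachable_by_linear`) are introduced only for illustration. It assumes both objectives are maximised, and shows a Pareto-optimal point that lies in a concave region of the front and is therefore never selected by a weighted sum, whatever the weights.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


def linear_scalarise(values, weights):
    """Weighted sum of the objective values."""
    return sum(w * v for w, v in zip(weights, values))


def reachable_by_linear(points, steps=100):
    """Sweep two-objective weight vectors and collect every point that wins at least once."""
    found = set()
    for i in range(steps + 1):
        weights = (i / steps, 1 - i / steps)
        found.add(max(points, key=lambda p: linear_scalarise(p, weights)))
    return found


# Hypothetical two-objective returns; (4, 5) is Pareto-optimal but sits
# below the line joining (1, 10) and (10, 1), i.e. in a concave region.
returns = [(1.0, 10.0), (4.0, 5.0), (10.0, 1.0), (3.0, 3.0)]

front = pareto_front(returns)
linear_hits = reachable_by_linear(returns)
missed = [p for p in front if p not in linear_hits]
print("Pareto front:", front)
print("Missed by linear scalarisation:", missed)
```

In this toy case the weight sweep only ever returns the two extreme points, which is one concrete way in which performance depends on the shape of the Pareto front.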
Abstract
Scalarisation functions are widely employed in MORL algorithms to enable intelligent decision-making. However, these functions often struggle to approximate the Pareto front accurately, which makes them a poor fit for complex, uncertain environments. This study examines selected Multi-Objective Reinforcement Learning (MORL) algorithms across MORL environments with discrete action and observation spaces. We aim to further investigate the limitations of scalarisation approaches for decision-making in multi-objective settings. Specifically, we use an outer-loop multi-policy methodology to assess the performance of a seminal single-policy MORL algorithm, MO Q-Learning, implemented with linear and Chebyshev scalarisation functions. In addition, we explore a pioneering inner-loop multi-policy algorithm, Pareto Q-Learning, which offers a more robust alternative. Our…
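As a rough illustration of the outer-loop idea with Chebyshev scalarisation, the sketch below sweeps weight vectors over a fixed set of hypothetical return vectors and, for each weight, picks the vector with the smallest weighted Chebyshev distance to a utopian reference point. This is a static toy example, not the paper's MO Q-Learning implementation: the return vectors, the utopian point `utopia`, and the function names are all assumptions made for illustration.

```python
def chebyshev_scalarise(values, weights, utopia):
    """Weighted Chebyshev distance to the utopian point (lower is better)."""
    return max(w * abs(v - z) for w, v, z in zip(weights, values, utopia))


def outer_loop_chebyshev(points, utopia, steps=10):
    """Outer loop: one selection per weight vector, collected into a set."""
    found = set()
    for i in range(steps + 1):
        weights = (i / steps, 1 - i / steps)
        best = min(points, key=lambda p: chebyshev_scalarise(p, weights, utopia))
        found.add(best)
    return found


# Hypothetical two-objective returns (both maximised) and a utopian point
# chosen slightly above the best value seen on each objective.
returns = [(1.0, 10.0), (4.0, 5.0), (10.0, 1.0), (3.0, 3.0)]
utopia = (10.1, 10.1)

selected = outer_loop_chebyshev(returns, utopia)
print(sorted(selected))
```

On these hypothetical values the sweep also selects (4, 5), a point that a weighted sum cannot reach because it lies in a concave region. This holds for this fixed set of vectors only and says nothing about how Chebyshev scalarisation behaves inside a learning loop, which is the question the study examines.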
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms · Adaptive Dynamic Programming Control
