PA2D-MORL: Pareto Ascent Directional Decomposition based Multi-Objective Reinforcement Learning

Tianmeng Hu; Biao Luo

arXiv:2603.19579 · cs.AI · March 23, 2026 · AAAI

TL;DR

PA2D-MORL is a multi-objective reinforcement learning method that approximates the Pareto policy set by combining Pareto ascent directions, evolutionary strategies, and adaptive fine-tuning, and it outperforms state-of-the-art algorithms on robot control tasks.

Contribution

The paper presents a new MORL method combining Pareto ascent directions, evolutionary policy optimization, and adaptive fine-tuning for better Pareto frontier approximation.

Findings

01. Outperforms state-of-the-art algorithms in robot control tasks.
02. Achieves higher-quality and more stable Pareto frontiers.
03. Enhances the density and spread of Pareto solutions.
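
To make the term "Pareto frontier" in the findings concrete, here is a minimal sketch of non-dominated filtering over return vectors. It assumes every objective is maximized; the helper names and the sample returns are hypothetical and are not taken from the paper, and this page does not state which quality metrics the authors report.

```python
def dominates(a, b):
    """True if return vector a Pareto-dominates b (all objectives maximized)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective returns of five policies (illustrative values only).
returns = [(1.0, 5.0), (2.0, 4.0), (1.5, 3.0), (3.0, 1.0), (2.0, 2.0)]
front = pareto_front(returns)
print(front)
```

A denser and more widely spread frontier means more surviving points that cover a larger range of trade-offs between the objectives.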

Abstract

Multi-objective reinforcement learning (MORL) provides an effective solution for decision-making problems involving conflicting objectives. However, achieving high-quality approximations of the Pareto policy set remains challenging, especially in complex tasks with continuous or high-dimensional state-action spaces. In this paper, we propose the Pareto Ascent Directional Decomposition based Multi-Objective Reinforcement Learning (PA2D-MORL) method, which constructs an efficient scheme for multi-objective problem decomposition and policy improvement, leading to a superior approximation of the Pareto policy set. The proposed method leverages the Pareto ascent direction to select the scalarization weights and compute the multi-objective policy gradient, which determines the policy optimization direction and ensures joint improvement on all objectives. Meanwhile, multiple policies are selectively…
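
The idea of a direction that improves all objectives at once can be illustrated for the two-objective case. The sketch below is not the paper's algorithm: it assumes the common min-norm construction, in which the shortest vector in the convex hull of the per-objective gradients serves as a common ascent direction and its coefficients serve as scalarization weights. The function name and the example gradients are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def pareto_ascent_direction(g1, g2):
    """Return (weights, direction) from two per-objective policy gradients.

    Assumption: the direction is the min-norm point alpha*g1 + (1-alpha)*g2
    with alpha in [0, 1]. When that point is nonzero, its inner product with
    both g1 and g2 is positive, so a small step improves both objectives.
    """
    diff = [a - b for a, b in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        alpha = 0.5  # identical gradients: any weighting gives the same direction
    else:
        # Closed-form minimizer of ||alpha*g1 + (1-alpha)*g2||^2, clipped to [0, 1].
        alpha = dot(g2, [b - a for a, b in zip(g1, g2)]) / denom
        alpha = min(1.0, max(0.0, alpha))
    direction = [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]
    return (alpha, 1.0 - alpha), direction


# Hypothetical gradients of two conflicting objectives.
g1 = [1.0, 0.0]
g2 = [0.0, 1.0]
weights, d = pareto_ascent_direction(g1, g2)
print(weights, d, dot(d, g1), dot(d, g2))
```

If the returned direction is the zero vector, no common ascent direction exists and the current policy is Pareto stationary under this construction.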

Taxonomy

Topics: Advanced Multi-Objective Optimization Algorithms · Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control