Scalable Multi-Objective and Meta Reinforcement Learning via Gradient Estimation

Zhenshuo Zhang; Minxuan Duan; Youran Ye; Hongyang R. Zhang

arXiv:2511.12779 · cs.LG · February 24, 2026

TL;DR

This paper introduces PolicyGradEx, a scalable method for multi-objective reinforcement learning that efficiently clusters related objectives using gradient estimation, leading to improved performance and faster training in complex environments.

Contribution

We propose a novel two-stage meta-learning approach with gradient-based clustering to optimize multiple objectives efficiently in RL, validated by empirical results.

Findings

01. Outperforms state-of-the-art baselines by 16% on average.

02. Achieves up to 26x faster training speed.

03. Gradient-similarity-based grouping improves results by 19%.

Abstract

We study the problem of efficiently estimating policies that simultaneously optimize multiple objectives in reinforcement learning (RL). Given $n$ objectives (or tasks), we seek the optimal partition of these objectives into $k \ll n$ groups, where each group comprises related objectives that can be trained together. This problem arises in applications such as robotics, control, and preference optimization in language models, where learning a single policy for all $n$ objectives is suboptimal as $n$ grows. We introduce a two-stage procedure -- meta-training followed by fine-tuning -- to address this problem. We first learn a meta-policy for all objectives using multitask learning. Then, we adapt the meta-policy to multiple randomly sampled subsets of objectives. The adaptation step leverages a first-order approximation property of well-trained policy networks, which is empirically…
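
The first-order idea mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the paper's estimator (the abstract is truncated before the details); it only shows the generic Taylor approximation $L_i(\theta_0 + \delta) \approx L_i(\theta_0) + \nabla L_i(\theta_0)^\top \delta$, which lets one estimate every objective's loss after adapting to a subset without retraining. The quadratic losses, the single gradient step, and all names (`loss`, `grad`, `first_order_estimates`, `centers`, `lr`) are assumptions made for illustration.

```python
import random

def loss(theta, center):
    """Toy quadratic loss for one objective: 0.5 * ||theta - center||^2."""
    return 0.5 * sum((t - c) ** 2 for t, c in zip(theta, center))

def grad(theta, center):
    """Gradient of the toy loss with respect to theta."""
    return [t - c for t, c in zip(theta, center)]

def first_order_estimates(theta0, centers, subset, lr=0.01):
    """Estimate each objective's loss after adapting theta0 to `subset`.

    Assumption: adaptation is one gradient step on the subset's average loss.
    Only gradients at theta0 are used; no loss is re-evaluated after the step.
    """
    grads = [grad(theta0, c) for c in centers]
    dim = len(theta0)
    delta = [-lr * sum(grads[j][k] for j in subset) / len(subset)
             for k in range(dim)]
    estimates = [
        loss(theta0, c) + sum(g_k * d_k for g_k, d_k in zip(g, delta))
        for g, c in zip(grads, centers)
    ]
    return estimates, delta

random.seed(0)
n, dim = 6, 4
# Hypothetical objectives, each defined by a random target point.
centers = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
# Stand-in for the meta-policy: the minimizer of the average toy loss.
theta0 = [sum(c[k] for c in centers) / n for k in range(dim)]
# One randomly sampled subset of objectives to adapt to.
subset = random.sample(range(n), 3)

estimates, delta = first_order_estimates(theta0, centers, subset)

# Ground truth for comparison: actually apply the step and recompute losses.
adapted = [t + d for t, d in zip(theta0, delta)]
exact = [loss(adapted, c) for c in centers]
max_gap = max(abs(e - x) for e, x in zip(estimates, exact))
print(f"largest estimation gap: {max_gap:.2e}")
```

For these quadratic losses the gap between the estimate and the exact value is the second-order term $\tfrac{1}{2}\lVert\delta\rVert^2$, so it shrinks quadratically with the step size. Repeating the estimate over many sampled subsets would give a table of per-objective scores that a clustering step could consume; how the paper builds and uses such scores is not stated in the text above.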

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification