Scalable Multi-Objective and Meta Reinforcement Learning via Gradient Estimation
Zhenshuo Zhang, Minxuan Duan, Youran Ye, Hongyang R. Zhang

TL;DR
This paper introduces PolicyGradEx, a scalable method for multi-objective reinforcement learning that efficiently clusters related objectives using gradient estimation, leading to improved performance and faster training in complex environments.
Contribution
We propose a novel two-stage meta-learning approach with gradient-based clustering to optimize multiple objectives efficiently in RL, validated by empirical results.
Findings
Outperforms state-of-the-art baselines by 16% on average.
Achieves up to 26x faster training speed.
Gradient-similarity-based grouping improves results by 19%.
Abstract
We study the problem of efficiently estimating policies that simultaneously optimize multiple objectives in reinforcement learning (RL). Given n objectives (or tasks), we seek the optimal partition of these objectives into groups, where each group comprises related objectives that can be trained together. This problem arises in applications such as robotics, control, and preference optimization in language models, where learning a single policy for all n objectives is suboptimal as n grows. We introduce a two-stage procedure -- meta-training followed by fine-tuning -- to address this problem. We first learn a meta-policy for all objectives using multitask learning. Then, we adapt the meta-policy to multiple randomly sampled subsets of objectives. The adaptation step leverages a first-order approximation property of well-trained policy networks, which is empirically…
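To make the idea of grouping objectives by gradient similarity concrete, here is a minimal sketch. It is not the paper's PolicyGradEx procedure: it assumes each objective's gradient at the meta-policy is already available as a flat vector, uses plain cosine similarity as a stand-in affinity score, and partitions with a simple greedy rule. The names `cosine_affinity` and `greedy_group`, and the toy gradients, are hypothetical and chosen only for illustration.

```python
import numpy as np

def cosine_affinity(grads):
    """Pairwise cosine similarity between per-objective gradient vectors."""
    grads = np.asarray(grads, dtype=float)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    unit = grads / np.clip(norms, 1e-12, None)  # guard against zero gradients
    return unit @ unit.T

def greedy_group(affinity, k):
    """Split objectives into k groups (illustrative greedy rule, an assumption)."""
    n = affinity.shape[0]
    # Pick seeds: start from objective 0, then repeatedly add the objective
    # that is least similar to every seed chosen so far.
    seeds = [0]
    while len(seeds) < k:
        max_sim = affinity[:, seeds].max(axis=1)
        max_sim[seeds] = np.inf  # never re-pick an existing seed
        seeds.append(int(np.argmin(max_sim)))
    groups = [[s] for s in seeds]
    # Assign every remaining objective to the group with the highest
    # average affinity to its current members.
    for i in range(n):
        if i in seeds:
            continue
        scores = [affinity[i, g].mean() for g in groups]
        groups[int(np.argmax(scores))].append(i)
    return [sorted(g) for g in groups]

# Toy gradients for four objectives (made-up numbers, not from the paper).
toy_grads = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
groups = greedy_group(cosine_affinity(toy_grads), k=2)
print(groups)  # [[0, 1], [2, 3]]
```

In this toy case, objectives 0 and 1 have nearly parallel gradients, as do objectives 2 and 3, so they end up in the same groups. Any affinity matrix, for example one built from estimated fine-tuning performance on sampled subsets, could be passed to `greedy_group` in place of the cosine scores.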
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification
