Joint Optimization of Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm
Qinbo Bai, Mridul Agarwal, Vaneet Aggarwal

TL;DR
This paper introduces a policy-gradient based algorithm for multi-objective reinforcement learning that optimizes a non-linear concave function of multiple long-term objectives, with proven convergence guarantees.
Contribution
It formulates the problem of maximizing a non-linear concave function of multiple long-term objectives and provides a convergence analysis for the proposed model-free policy-gradient algorithm.
Findings
Achieves convergence to near-global optima after polynomially many trajectories.
Provides a biased gradient estimator suitable for the multi-objective setting.
Matches the dependence on accuracy of the policy gradient algorithm for standard reinforcement learning.
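Why a natural gradient estimator is biased in this setting can be seen from the chain rule. The block below is a sketch in assumed notation that does not come from this page: J_m(θ) is the m-th long-term objective under policy π_θ, M is the number of objectives, f is the concave function being maximized, and Ĵ is a sample-based estimate of J. It shows one plausible plug-in construction, not necessarily the exact estimator used in the paper.

```latex
% Chain rule for the scalarized objective (assumed notation).
\nabla_\theta f\big(J_1(\theta),\dots,J_M(\theta)\big)
  = \sum_{m=1}^{M} \frac{\partial f}{\partial J_m}\big(J(\theta)\big)\,
    \nabla_\theta J_m(\theta)

% Plug-in estimate: evaluate the partial derivatives of f at the sampled \hat{J}.
\hat{g}
  = \sum_{m=1}^{M} \frac{\partial f}{\partial J_m}\big(\hat{J}\big)\,
    \widehat{\nabla_\theta J_m}

% Even if \hat{J} is unbiased, a non-linear f gives, in general,
\mathbb{E}\!\left[\frac{\partial f}{\partial J_m}\big(\hat{J}\big)\right]
  \neq \frac{\partial f}{\partial J_m}\big(\mathbb{E}[\hat{J}]\big),
% so \hat{g} is a biased estimate of the true gradient.
```

When f is linear, its partial derivatives are constants and the bias vanishes, which recovers the standard single-objective policy gradient.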
Abstract
Many engineering problems have multiple objectives, and the overall aim is to optimize a non-linear function of these objectives. In this paper, we formulate the problem of maximizing a non-linear concave function of multiple long-term objectives. A policy-gradient based model-free algorithm is proposed for the problem. To compute an estimate of the gradient, a biased estimator is proposed. The proposed algorithm is shown to converge to within ε of the global optimum after sampling a number of trajectories that depends on ε, the discount factor γ, and the number of agents, thus achieving the same dependence on ε as the policy gradient algorithm for standard reinforcement learning.
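The idea in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the one-state, two-action toy problem, the reward table `REWARDS`, the concave objective f(J) = Σ log(J_m + DELTA), the hyperparameters, and the function names `sample_batch`, `plug_in_gradient`, and `train`. It combines REINFORCE-style gradient estimates for each objective with the derivative of f evaluated at the sampled returns, which makes the overall estimate biased.

```python
import numpy as np

# Hypothetical toy problem: one state, two actions, two objectives.
# Taking action a yields the reward vector REWARDS[a] at every step.
REWARDS = np.array([[1.0, 0.0],
                    [0.0, 1.0]])
GAMMA = 0.9     # discount factor
HORIZON = 30    # truncation length of each sampled trajectory
DELTA = 1e-3    # keeps log(J_m + DELTA) finite if a return estimate is zero


def softmax(theta):
    z = np.exp(theta - np.max(theta))
    return z / z.sum()


def sample_batch(theta, n_traj, rng):
    """Sample trajectories from the softmax policy.

    Returns per-trajectory discounted returns, shape (n_traj, 2), and
    per-trajectory summed score functions, shape (n_traj, 2).
    """
    probs = softmax(theta)
    actions = rng.choice(2, size=(n_traj, HORIZON), p=probs)
    discounts = GAMMA ** np.arange(HORIZON)
    # returns[i, m]: discounted return of objective m on trajectory i
    returns = np.einsum("t,itm->im", discounts, REWARDS[actions])
    # For a softmax policy, grad_theta log pi(a) = onehot(a) - probs
    scores = (np.eye(2)[actions] - probs).sum(axis=1)
    return returns, scores


def plug_in_gradient(returns, scores):
    """Biased plug-in estimate of grad_theta f(J(theta))."""
    n_traj = returns.shape[0]
    j_hat = returns.mean(axis=0)            # estimate of each objective J_m
    grad_j = returns.T @ scores / n_traj    # REINFORCE estimate of each grad J_m
    df = 1.0 / (j_hat + DELTA)              # partial derivatives of f at j_hat
    return df @ grad_j                      # chain rule: sum_m df_m * grad J_m


def train(seed=0, iters=300, n_traj=500, lr=0.05):
    """Stochastic gradient ascent on theta using the plug-in estimate."""
    rng = np.random.default_rng(seed)
    theta = np.array([2.0, 0.0])            # start heavily favoring action 0
    for _ in range(iters):
        returns, scores = sample_batch(theta, n_traj, rng)
        theta = theta + lr * plug_in_gradient(returns, scores)
    return theta
```

Calling `softmax(train())` gives the learned action probabilities. In this toy problem the sum-of-logs objective rewards balancing the two returns, so the policy should drift from favoring action 0 toward mixing both actions roughly evenly. A linear objective would instead be indifferent between the two actions.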
Taxonomy
Topics: Advanced Multi-Objective Optimization Algorithms · Reinforcement Learning in Robotics · Advanced Control Systems Optimization
