Traversing Pareto Optimal Policies: Provably Efficient Multi-Objective Reinforcement Learning
Shuang Qiu, Dake Zhang, Rui Yang, Boxiang Lyu, Tong Zhang

TL;DR
This paper systematically analyzes optimization targets in multi-objective reinforcement learning, identifies Tchebycheff scalarization as effective, and proposes efficient algorithms with theoretical guarantees for learning Pareto optimal policies.
Contribution
It introduces a reformulation of Tchebycheff scalarization, develops algorithms with provable sample complexity, and offers a preference-free exploration framework for MORL.
Findings
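Tchebycheff scalarization is a favorable scalarization method for MORL, judged by its ability to find all Pareto optimal policies and by how well preferences over objectives control the learned policy.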
Proposed algorithms achieve $\tilde{O}(1/\epsilon^2)$ sample complexity.
Preference-free exploration reduces environment interaction costs.
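To illustrate why a Tchebycheff-style objective can reach Pareto optimal points that a weighted sum cannot, here is a minimal sketch on a toy problem. It is not the paper's algorithm: the three value vectors, the helper names `tchebycheff` and `linear`, and the choice of ideal point (the per-objective maximum) are assumptions made for illustration, and the loss uses the standard textbook form, the weighted maximum gap to the ideal point.

```python
import numpy as np

def tchebycheff(values, weights, ideal):
    """Weighted max gap to the ideal point, per candidate (lower is better)."""
    return np.max(weights * (ideal - values), axis=-1)

def linear(values, weights):
    """Weighted sum of objective values, per candidate (higher is better)."""
    return values @ weights

# Hypothetical value vectors of three policies on two objectives.
# Policy 2 is Pareto optimal but lies in a non-convex part of the front.
values = np.array([
    [1.00, 0.00],  # policy 0: best on objective 1
    [0.00, 1.00],  # policy 1: best on objective 2
    [0.45, 0.45],  # policy 2: balanced trade-off
])
ideal = values.max(axis=0)  # assumed ideal point: best value per objective

# Sweep preference vectors (w, 1 - w) over the probability simplex.
weights_grid = [np.array([w, 1.0 - w]) for w in np.linspace(0.0, 1.0, 101)]

linear_picks = {int(np.argmax(linear(values, w))) for w in weights_grid}
tch_picks = {int(np.argmin(tchebycheff(values, w, ideal))) for w in weights_grid}

print("linear scalarization selects:", sorted(linear_picks))
print("Tchebycheff scalarization selects:", sorted(tch_picks))
```

In this toy setting, no weight vector makes the weighted sum prefer the balanced policy, while the Tchebycheff loss selects each of the three policies for some preference vector.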
Abstract
This paper investigates multi-objective reinforcement learning (MORL), which focuses on learning Pareto optimal policies in the presence of multiple reward functions. Despite MORL's significant empirical success, there is still a lack of satisfactory understanding of various MORL optimization targets and efficient learning algorithms. Our work offers a systematic analysis of several optimization targets to assess their abilities to find all Pareto optimal policies and controllability over learned policies by the preferences for different objectives. We then identify Tchebycheff scalarization as a favorable scalarization method for MORL. Considering the non-smoothness of Tchebycheff scalarization, we reformulate its minimization problem into a new min-max-max optimization problem. Then, for the stochastic policy class, we propose efficient algorithms using this reformulation to learn…
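The abstract is truncated here, so the exact reformulation is not shown. As a hedged sketch of how a min-max-max problem can arise from a Tchebycheff objective, the derivation below uses two standard facts: a maximum over indices equals a maximum of a linear function over the probability simplex, and each optimal value is itself a maximum over policies. The notation ($V_i^{\pi}$ for the value of policy $\pi$ under the $i$-th reward, $\lambda$ for the preference vector, $\Delta_m$ for the simplex, $\nu_i$ for auxiliary policies) is assumed for illustration, and the paper's formulation may include additional terms.

```latex
\begin{align*}
\min_{\pi} \max_{i \in [m]} \lambda_i \bigl( V_i^{*} - V_i^{\pi} \bigr)
  &= \min_{\pi} \max_{w \in \Delta_m} \sum_{i=1}^{m} w_i \lambda_i \bigl( V_i^{*} - V_i^{\pi} \bigr)
  && \text{(a linear function on } \Delta_m \text{ is maximized at a vertex)} \\
  &= \min_{\pi} \max_{w \in \Delta_m} \max_{\nu_1, \dots, \nu_m} \sum_{i=1}^{m} w_i \lambda_i \bigl( V_i^{\nu_i} - V_i^{\pi} \bigr)
  && \text{(since } V_i^{*} = \max_{\nu_i} V_i^{\nu_i} \text{ and } w_i \lambda_i \ge 0 \text{)}
\end{align*}
```

The inner objective is linear in $w$, which removes the non-smooth pointwise maximum over objectives at the cost of introducing extra maximization variables.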
