Traversing Pareto Optimal Policies: Provably Efficient Multi-Objective Reinforcement Learning

Shuang Qiu; Dake Zhang; Rui Yang; Boxiang Lyu; Tong Zhang

arXiv:2407.17466·cs.LG·July 25, 2024

TL;DR

This paper systematically analyzes optimization targets in multi-objective reinforcement learning, identifies Tchebycheff scalarization as effective, and proposes efficient algorithms with theoretical guarantees for learning Pareto optimal policies.

Contribution

The paper reformulates the non-smooth Tchebycheff scalarization minimization problem as a min-max-max optimization problem, develops algorithms for this reformulation with provable sample complexity, and offers a preference-free exploration framework for MORL.
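To make the shape of such a reformulation concrete, the derivation below is a sketch under assumed notation, not a quotation of the paper's statement: $m$ objectives, a preference vector $\lambda$ with nonnegative entries, a small constant $\iota \ge 0$, value functions $V_i^{\pi}$, and the probability simplex $\Delta_m$. It uses two standard facts: a maximum over indices equals a maximum over simplex weights, and the best achievable value $V_i^{*}$ is itself a maximum over policies $\nu_i$.

```latex
\begin{align*}
\mathrm{TCH}_{\lambda}(\pi)
  &= \max_{i \in [m]} \lambda_i \bigl( V_i^{*} + \iota - V_i^{\pi} \bigr),
  \qquad V_i^{*} = \max_{\nu_i} V_i^{\nu_i}, \\
\min_{\pi} \mathrm{TCH}_{\lambda}(\pi)
  &= \min_{\pi} \max_{w \in \Delta_m} \sum_{i=1}^{m} w_i \lambda_i
     \bigl( V_i^{*} + \iota - V_i^{\pi} \bigr) \\
  &= \min_{\pi} \max_{w \in \Delta_m} \max_{\nu_1, \dots, \nu_m}
     \sum_{i=1}^{m} w_i \lambda_i \bigl( V_i^{\nu_i} + \iota - V_i^{\pi} \bigr).
\end{align*}
```

The last step is valid because every coefficient $w_i \lambda_i$ is nonnegative, so each inner maximization can be pulled outside the sum. The resulting objective is linear in $w$, which removes the non-smooth outer $\max_i$.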

Findings

01

Tchebycheff scalarization is effective for MORL.

02

Proposed algorithms achieve $\tilde{O}(1/\epsilon^2)$ sample complexity.

03

Preference-free exploration reduces environment interaction costs.

Abstract

This paper investigates multi-objective reinforcement learning (MORL), which focuses on learning Pareto optimal policies in the presence of multiple reward functions. Despite MORL's significant empirical success, there is still a lack of satisfactory understanding of various MORL optimization targets and efficient learning algorithms. Our work offers a systematic analysis of several optimization targets to assess their abilities to find all Pareto optimal policies and controllability over learned policies by the preferences for different objectives. We then identify Tchebycheff scalarization as a favorable scalarization method for MORL. Considering the non-smoothness of Tchebycheff scalarization, we reformulate its minimization problem into a new min-max-max optimization problem. Then, for the stochastic policy class, we propose efficient algorithms using this reformulation to learn…
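A toy comparison can illustrate why Tchebycheff scalarization is attractive as an optimization target. The sketch below is not the paper's algorithm: it assumes a hypothetical finite set of three policies whose value vectors are already known, and the names `tchebycheff_loss`, `linear_score`, `V`, `ideal`, and `prefs` are illustrative. It assumes the weighted Tchebycheff loss has the form `max_i prefs[i] * (ideal[i] + iota - value[i])`.

```python
import numpy as np


def tchebycheff_loss(values, ideal, prefs, iota=0.0):
    """Weighted Tchebycheff loss per policy (lower is better).

    values: array of shape (n_policies, n_objectives)
    ideal:  best achievable value per objective, shape (n_objectives,)
    prefs:  nonnegative preference weights, shape (n_objectives,)
    """
    values = np.asarray(values, dtype=float)
    return np.max(prefs * (ideal + iota - values), axis=-1)


def linear_score(values, prefs):
    """Linear scalarization score per policy (higher is better)."""
    return np.asarray(values, dtype=float) @ prefs


# Hypothetical value vectors for three policies and two objectives.
# The third policy is Pareto optimal within this set but lies below
# the line segment joining the first two.
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.4, 0.4]])
ideal = V.max(axis=0)          # per-objective best value within the set
prefs = np.array([0.5, 0.5])   # equal preference for both objectives

tch = tchebycheff_loss(V, ideal, prefs)
lin = linear_score(V, prefs)

best_tch = int(np.argmin(tch))  # policy chosen by Tchebycheff scalarization
best_lin = int(np.argmax(lin))  # policy chosen by linear scalarization

print("Tchebycheff losses:", tch, "-> picks policy", best_tch)
print("Linear scores:     ", lin, "-> picks policy", best_lin)
```

In this toy set, no choice of nonnegative linear weights selects the balanced third policy, since one of the two extreme policies always scores at least 0.5, whereas the Tchebycheff loss with equal preferences selects it. The example only covers a fixed, finite set of value vectors; the paper's setting concerns learning over a policy class.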
