Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning

Jingfeng Wu; Vladimir Braverman; Lin F. Yang

arXiv:2011.13034·cs.LG·October 29, 2021·6 cites

TL;DR

This paper develops algorithms for multi-objective reinforcement learning under adversarially chosen preferences, with a nearly minimax optimal regret bound in the online setting and near-optimal trajectory complexity in the preference-free exploration setting.

Contribution

The paper introduces a model-based algorithm with a nearly minimax optimal regret bound for the online setting and a preference-free exploration method with near-optimal trajectory complexity for multi-objective RL.

Findings

01 Achieves a nearly minimax optimal regret bound in the online setting

02 Provides a preference-free exploration algorithm with near-optimal trajectory complexity

03 Partially resolves an open problem in multi-objective reinforcement learning

Abstract

In this paper we consider multi-objective reinforcement learning where the objectives are balanced using preferences. In practice, the preferences are often given in an adversarial manner, e.g., customers can be picky in many applications. We formalize this problem as an episodic learning problem on a Markov decision process, where transitions are unknown and a reward function is the inner product of a preference vector with pre-specified multi-objective reward functions. We consider two settings. In the online setting, the agent receives an (adversarial) preference every episode and proposes policies to interact with the environment. We provide a model-based algorithm that achieves a nearly minimax optimal regret bound $\widetilde{O}\bigl(\sqrt{\min\{d, S\} \cdot H^{2} S A K}\bigr)$, where $d$ is the number of objectives, $S$ is the number of states, $A$ is the number of actions,…
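The reward construction in the abstract (a scalar reward formed as the inner product of a preference vector with a vector-valued reward) can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes the transition model is already known and simply plans by finite-horizon value iteration for one fixed preference. The names `scalarize`, `plan`, and the toy two-state MDP are hypothetical and chosen only for illustration.

```python
import numpy as np

def scalarize(rewards, w):
    """Scalar reward <w, r(s, a)> from vector rewards of shape (S, A, d)."""
    return rewards @ w  # result has shape (S, A)

def plan(P, rewards, w, H):
    """Finite-horizon value iteration for preference w on a KNOWN model P.

    P has shape (S, A, S). Assumption: the model is given; the paper's
    setting has unknown transitions, which this sketch does not handle.
    """
    S, A, _ = rewards.shape
    r = scalarize(rewards, w)
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = r + P @ V              # (S, A): reward plus expected next value
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return V, policy

# Hypothetical toy MDP: 2 states, 2 actions, d = 2 objectives.
# Action 0 stays in place and pays objective 0; action 1 switches state
# and pays objective 1.
S, A, d, H = 2, 2, 2, 3
P = np.zeros((S, A, S))
rewards = np.zeros((S, A, d))
for s in range(S):
    P[s, 0, s] = 1.0
    P[s, 1, 1 - s] = 1.0
    rewards[s, 0] = [1.0, 0.0]
    rewards[s, 1] = [0.0, 1.0]

V_first, pi_first = plan(P, rewards, np.array([1.0, 0.0]), H)
V_second, pi_second = plan(P, rewards, np.array([0.0, 1.0]), H)
V_mixed, _ = plan(P, rewards, np.array([0.5, 0.5]), H)
```

In this toy instance the optimal policy flips entirely when the preference moves from the first objective to the second, which is why a preference chosen adversarially each episode can force the agent to re-plan every time.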

Code & Models

Repositories

uuujf/morl · Official

Videos

Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Optimization and Search Problems