Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret

Jiawei Huang; Li Zhao; Tao Qin; Wei Chen; Nan Jiang; Tie-Yan Liu

arXiv:2205.12418 · cs.LG · February 28, 2023

TL;DR

This paper introduces a tiered reinforcement learning framework that maintains separate policies for risk-tolerant and risk-averse users, and shows that a dedicated exploitation policy can achieve constant regret for the risk-averse users, which standard online RL cannot.

Contribution

The paper proposes a novel tiered RL approach with separate policies for different user risk profiles and shows that using Pessimistic Value Iteration yields constant regret for risk-averse users.

Findings

01. Separating the two policies does not improve minimax regret in the gap-independent setting.

02. In the gap-dependent setting, using Pessimistic Value Iteration as the exploitation algorithm achieves constant regret for risk-averse users.

03. The online policy's regret for risk-tolerant users remains near-optimal under the tiered approach.
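
For reference, the regret notions behind these findings can be written as below. The notation is an assumption based on standard episodic RL conventions, not copied from the paper: $K$ is the number of episodes, $s_1$ the initial state, $V_1^{*}$ the optimal value function, and $\pi_k^{O}$, $\pi_k^{E}$ the two policies deployed in episode $k$.

$$
\mathrm{Regret}_K(\pi^{O}) = \sum_{k=1}^{K} \left( V_1^{*}(s_1) - V_1^{\pi_k^{O}}(s_1) \right),
\qquad
\mathrm{Regret}_K(\pi^{E}) = \sum_{k=1}^{K} \left( V_1^{*}(s_1) - V_1^{\pi_k^{E}}(s_1) \right)
$$

Under this reading, "constant regret" means that $\mathrm{Regret}_K(\pi^{E})$ is bounded by a quantity that does not depend on $K$, whereas gap-dependent regret bounds for online algorithms typically grow with $\log K$.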

Abstract

We propose a new learning framework that captures the tiered structure of many real-world user-interaction applications, where the users can be divided into two groups based on their different tolerance on exploration risks and should be treated separately. In this setting, we simultaneously maintain two policies $\pi^{O}$ and $\pi^{E}$: $\pi^{O}$ ("O" for "online") interacts with more risk-tolerant users from the first tier and minimizes regret by balancing exploration and exploitation as usual, while $\pi^{E}$ ("E" for "exploit") exclusively focuses on exploitation for risk-averse users from the second tier utilizing the data collected so far. An important question is whether such a separation yields advantages over the standard online setting (i.e., $\pi^{E} = \pi^{O}$) for the risk-averse users. We individually consider the gap-independent…
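
The two-policy setup can be illustrated with a small simulation. The sketch below is not the paper's algorithm: it swaps the MDP setting and Pessimistic Value Iteration for a Bernoulli multi-armed bandit, with a UCB rule standing in for $\pi^{O}$ and a lower-confidence-bound (LCB) rule standing in for $\pi^{E}$. The function name `run_tiered_bandit`, the confidence width, and the choice to log only the data gathered by $\pi^{O}$ are all assumptions made for illustration.

```python
import math
import random


def run_tiered_bandit(means, num_rounds, seed=0):
    """Simulate a two-tier Bernoulli bandit (illustrative sketch only).

    pi_O picks arms optimistically (UCB) and is the only source of data.
    pi_E picks the arm with the highest lower confidence bound (LCB)
    computed from that same data, so it never explores on its own.
    Returns (regret of pi_O, regret of pi_E, pull counts of pi_O),
    where regret is cumulative pseudo-regret against the best arm.
    """
    rng = random.Random(seed)
    num_arms = len(means)
    best_mean = max(means)
    counts = [0] * num_arms          # pulls made by pi_O
    reward_sums = [0.0] * num_arms   # rewards observed by pi_O
    regret_o = 0.0
    regret_e = 0.0

    def index(arm, t, sign):
        # sign=+1 gives an optimistic (UCB) index, sign=-1 a pessimistic (LCB) one.
        if counts[arm] == 0:
            return sign * math.inf
        mean = reward_sums[arm] / counts[arm]
        width = math.sqrt(2.0 * math.log(t + 1) / counts[arm])
        return mean + sign * width

    for t in range(1, num_rounds + 1):
        # Tier 2 (risk-averse): exploit pessimistically using data collected so far.
        arm_e = max(range(num_arms), key=lambda a: index(a, t, -1))
        regret_e += best_mean - means[arm_e]

        # Tier 1 (risk-tolerant): explore/exploit as in standard online learning.
        arm_o = max(range(num_arms), key=lambda a: index(a, t, +1))
        regret_o += best_mean - means[arm_o]

        # Assumption: only pi_O's interaction is logged.
        reward = 1.0 if rng.random() < means[arm_o] else 0.0
        counts[arm_o] += 1
        reward_sums[arm_o] += reward

    return regret_o, regret_e, counts


if __name__ == "__main__":
    regret_o, regret_e, _ = run_tiered_bandit([0.2, 0.5, 0.9], num_rounds=5000)
    print(f"pi_O regret: {regret_o:.1f}, pi_E regret: {regret_e:.1f}")
```

Running the script with different values of `num_rounds` gives a rough feel for how the two regrets scale. The intuition is that pessimism penalizes rarely pulled arms, so $\pi^{E}$ tends to settle on the arm that $\pi^{O}$ has already sampled heavily instead of paying for exploration itself.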

Code & Models

Repositories

jiaweihhuang/tiered-rl-experiments (official)

Videos

Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)