Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret
Jiawei Huang, Li Zhao, Tao Qin, Wei Chen, Nan Jiang, Tie-Yan Liu

TL;DR
This paper introduces a tiered reinforcement learning framework that separates policies for risk-tolerant and risk-averse users, demonstrating that a specialized exploitation policy can achieve constant regret for risk-averse users, unlike standard online RL.
Contribution
The paper proposes a novel tiered RL approach with separate policies for different user risk profiles and shows that using Pessimistic Value Iteration yields constant regret for risk-averse users.
Findings
Separation of policies does not benefit minimax regret in the gap-independent setting.
In the gap-dependent setting, using Pessimistic Value Iteration as the exploitation algorithm achieves constant regret for risk-averse users.
Online regret remains optimal for risk-tolerant users despite the tiered approach.
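To make the idea of pessimism concrete, the following is a minimal sketch of tabular, finite-horizon value iteration with a count-based penalty. It is not the paper's exact algorithm: the function name, the data layout (`counts`, `reward_sums`), and the Hoeffding-style penalty `bonus_scale * H * sqrt(1 / n)` are illustrative assumptions, and rewards are assumed to lie in [0, 1].

```python
import math


def pessimistic_value_iteration(counts, reward_sums, horizon, bonus_scale=1.0):
    """Hypothetical sketch of pessimistic value iteration on logged data.

    counts[h][s][a][s2]  -- number of observed transitions (s, a) -> s2 at step h
    reward_sums[h][s][a] -- sum of observed rewards (each assumed in [0, 1])
    Returns (pi, V): a greedy policy pi[h][s] and pessimistic values V[h][s].
    """
    H = horizon
    S = len(counts[0])
    A = len(counts[0][0])
    V = [[0.0] * S for _ in range(H + 1)]  # V[H][s] = 0 is the terminal value
    pi = [[0] * S for _ in range(H)]

    for h in range(H - 1, -1, -1):  # backward induction over steps
        for s in range(S):
            best_q, best_a = -1.0, 0
            for a in range(A):
                n = sum(counts[h][s][a])
                if n == 0:
                    # No data at all: assign the most pessimistic value.
                    q = 0.0
                else:
                    r_hat = reward_sums[h][s][a] / n
                    next_v = sum(
                        c / n * V[h + 1][s2]
                        for s2, c in enumerate(counts[h][s][a])
                    )
                    # Assumed penalty shape: shrinks as the visit count grows.
                    penalty = bonus_scale * H * math.sqrt(1.0 / n)
                    # Subtract the penalty, then clip to the valid value range.
                    q = min(max(r_hat + next_v - penalty, 0.0), H - h)
                if q > best_q:
                    best_q, best_a = q, a
            pi[h][s] = best_a
            V[h][s] = best_q
    return pi, V


# One step, one state, two actions: action 0 was tried 100 times (mean 0.6),
# action 1 only once (reward 1.0).
counts = [[[[100], [1]]]]
reward_sums = [[[60.0, 1.0]]]
pi, V = pessimistic_value_iteration(counts, reward_sums, horizon=1)
```

In this toy example a purely greedy rule on empirical means would pick action 1 (mean 1.0 from a single sample), whereas the penalized estimate prefers the well-explored action 0. That is the behavior one would want from a policy serving risk-averse users.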
Abstract
We propose a new learning framework that captures the tiered structure of many real-world user-interaction applications, where the users can be divided into two groups based on their different tolerance on exploration risks and should be treated separately. In this setting, we simultaneously maintain two policies $\pi^{\text{O}}$ and $\pi^{\text{E}}$: $\pi^{\text{O}}$ ("O" for "online") interacts with more risk-tolerant users from the first tier and minimizes regret by balancing exploration and exploitation as usual, while $\pi^{\text{E}}$ ("E" for "exploit") exclusively focuses on exploitation for risk-averse users from the second tier utilizing the data collected so far. An important question is whether such a separation yields advantages over the standard online setting (i.e., $\pi^{\text{E}} = \pi^{\text{O}}$) for the risk-averse users. We individually consider the gap-independent…
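For reference, the regret of each tier can be written using the standard episodic definition. The notation below is an assumption for illustration and is not taken from the paper: $K$ episodes, initial state $s_1$, optimal value $V_1^{*}$, and $\pi^{\text{O}}_k$, $\pi^{\text{E}}_k$ for the online and exploitation policies deployed in episode $k$.

```latex
\mathrm{Regret}_K(\pi^{\mathrm{O}})
  = \sum_{k=1}^{K} \Big( V_1^{*}(s_1) - V_1^{\pi^{\mathrm{O}}_k}(s_1) \Big),
\qquad
\mathrm{Regret}_K(\pi^{\mathrm{E}})
  = \sum_{k=1}^{K} \Big( V_1^{*}(s_1) - V_1^{\pi^{\mathrm{E}}_k}(s_1) \Big).
```

Under this reading, "constant regret" for risk-averse users means that $\mathrm{Regret}_K(\pi^{\mathrm{E}})$ stays bounded by a quantity that does not grow with $K$.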
Taxonomy
Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
