Optimistic Reinforcement Learning with Quantile Objectives
Mohammad Alipour-Vaezi, Huaiyang Zhong, Kwok-Leung Tsui, Sajad Khodadadian

TL;DR
This paper introduces UCB-QRL, an optimistic reinforcement learning algorithm designed to optimize quantile-based objectives in finite-horizon MDPs, addressing risk sensitivity in RL applications like healthcare and finance.
Contribution
The paper develops a novel algorithm for risk-sensitive RL that optimizes quantile objectives and provides theoretical regret bounds in finite-horizon MDPs.
Findings
UCB-QRL achieves a high-probability regret bound of order $(2/\kappa)^{H+1} H \sqrt{SATH \log(2SATH/\delta)}$.
The algorithm effectively incorporates risk sensitivity through quantile optimization.
The theoretical analysis shows that the regret bound depends on the problem's quantile sensitivity constant $\kappa$.
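To make the quantile objective concrete, here is a minimal Python sketch that compares two hypothetical return distributions under the mean and under a lower quantile. The quantile level name `tau`, the helper `lower_quantile`, and both distributions are assumptions made for illustration; none of them come from the paper.

```python
import numpy as np

def lower_quantile(values, probs, tau):
    """Smallest return v with P(return <= v) >= tau, for a discrete distribution."""
    order = np.argsort(values)
    v = np.asarray(values, dtype=float)[order]
    p = np.asarray(probs, dtype=float)[order]
    cdf = np.cumsum(p)
    # Small tolerance guards against floating-point error in the cumulative sum.
    idx = int(np.searchsorted(cdf, tau - 1e-12, side="left"))
    return float(v[min(idx, len(v) - 1)])

# Hypothetical cumulative-reward distributions of two policies.
risky_values, risky_probs = [0.0, 10.0], [0.3, 0.7]   # higher mean, bad tail
safe_values, safe_probs = [4.0], [1.0]                # lower mean, no tail

tau = 0.25
risky_mean = float(np.dot(risky_values, risky_probs))
safe_mean = float(np.dot(safe_values, safe_probs))
risky_q = lower_quantile(risky_values, risky_probs, tau)
safe_q = lower_quantile(safe_values, safe_probs, tau)

print("mean prefers risky:", risky_mean > safe_mean)
print("quantile prefers safe:", safe_q > risky_q)
```

In this toy case the expected-return criterion picks the risky policy, while the 0.25-quantile criterion picks the safe one, which is the kind of risk sensitivity a quantile objective is meant to capture.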
Abstract
Reinforcement Learning (RL) has achieved tremendous success in recent years. However, the classical foundations of RL do not account for the risk sensitivity of the objective function, which is critical in various fields, including healthcare and finance. A popular approach to incorporate risk sensitivity is to optimize a specific quantile of the cumulative reward distribution. In this paper, we develop UCB-QRL, an optimistic learning algorithm for the $\tau$-quantile objective in finite-horizon Markov decision processes (MDPs). UCB-QRL is an iterative algorithm in which, at each iteration, we first estimate the underlying transition probability and then optimize the quantile value function over a confidence ball around this estimate. We show that UCB-QRL yields a high-probability regret bound in the episodic…
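The estimation half of the iteration described in the abstract can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the uniform fallback for unvisited state-action pairs, and the form of the L1 radius are hypothetical choices, not the paper's definitions, and the optimization of the quantile value function over the ball is not shown.

```python
import numpy as np

def estimate_transitions(transitions, n_states, n_actions):
    """Count-based estimate of P(s' | s, a) from (s, a, s') triples."""
    counts = np.zeros((n_states, n_actions, n_states))
    for s, a, s_next in transitions:
        counts[s, a, s_next] += 1
    visits = counts.sum(axis=2)
    # Assumption: fall back to a uniform row where (s, a) was never visited.
    p_hat = np.full_like(counts, 1.0 / n_states)
    visited = visits > 0
    p_hat[visited] = counts[visited] / visits[visited][:, None]
    return p_hat, visits

def l1_radius(visits, n_states, delta):
    """Illustrative radius shrinking like 1/sqrt(n); NOT the paper's radius."""
    n = np.maximum(visits, 1.0)
    return np.sqrt(2.0 * n_states * np.log(2.0 / delta) / n)

def in_confidence_ball(p, p_hat, radius):
    """True if every row of p is within the L1 radius of the matching row of p_hat."""
    return bool(np.all(np.abs(p - p_hat).sum(axis=2) <= radius))

# Hypothetical data: two states, one action, four observed transitions.
observed = [(0, 0, 1), (0, 0, 1), (0, 0, 0), (1, 0, 0)]
p_hat, visits = estimate_transitions(observed, n_states=2, n_actions=1)
radius = l1_radius(visits, n_states=2, delta=0.1)

print(p_hat[:, 0, :])
print(radius[:, 0])
print(in_confidence_ball(p_hat, p_hat, radius))
```

An optimistic algorithm would then search the set of transition models accepted by `in_confidence_ball` for the one that maximizes the objective. The radius shrinks as visit counts grow, so the set tightens around the estimate over episodes.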
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Risk and Portfolio Optimization
