Logarithmic Regret for Reinforcement Learning with Linear Function Approximation
Jiafan He, Dongruo Zhou, Quanquan Gu

TL;DR
This paper shows that logarithmic regret is achievable in reinforcement learning with linear function approximation when the optimal action-value function has a positive sub-optimality gap, improving on the previously known $\sqrt{T}$-type regret bounds.
Contribution
It provides the first logarithmic regret bounds for RL with linear function approximation, obtained through a gap-dependent analysis of the existing LSVI-UCB and UCRL-VTR algorithms under the linear MDP and linear mixture MDP assumptions.
Findings
LSVI-UCB achieves $\tilde{O}(d^{3}H^5/\text{gap}_{\text{min}} \times \log(T))$ regret.
UCRL-VTR achieves $\tilde{O}(d^{2}H^5/\text{gap}_{\text{min}} \times \log^3(T))$ regret.
Gap-dependent lower bounds are established for the linear MDP models.
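To give a rough feel for why a logarithmic dependence on $T$ matters, the sketch below compares how a $\sqrt{T}$-type rate and the two $\log$-type rates grow as $T$ increases. It is only an illustration under strong simplifications: the factors in $d$ and $H$, all constants, and the polylogarithmic terms hidden by $\tilde{O}$ are dropped, and the value of `gap_min` is a made-up number. The helper names `sqrt_rate` and `log_rate` are hypothetical and do not come from the paper.

```python
import math

def sqrt_rate(T: int) -> float:
    """Growth in T of a sqrt(T)-type bound (all other factors dropped)."""
    return math.sqrt(T)

def log_rate(T: int, gap_min: float, power: int = 1) -> float:
    """Growth in T of a log(T)^power / gap_min bound (all other factors dropped)."""
    return math.log(T) ** power / gap_min

gap_min = 0.1                    # hypothetical gap, chosen only for illustration
T_small, T_large = 10**4, 10**8  # two horizons to compare

# Factor by which each rate grows when T increases from 1e4 to 1e8.
# gap_min cancels in these ratios; it only rescales the log-type rates.
sqrt_growth = sqrt_rate(T_large) / sqrt_rate(T_small)
log_growth = log_rate(T_large, gap_min) / log_rate(T_small, gap_min)
log3_growth = (log_rate(T_large, gap_min, power=3)
               / log_rate(T_small, gap_min, power=3))

print(f"sqrt(T) rate grows by a factor of {sqrt_growth:.1f}")
print(f"log(T) rate grows by a factor of {log_growth:.1f}")
print(f"log^3(T) rate grows by a factor of {log3_growth:.1f}")
```

Under these simplifications, multiplying $T$ by $10^4$ multiplies the $\sqrt{T}$ rate by 100, but only doubles the $\log(T)$ rate and multiplies the $\log^3(T)$ rate by 8. The comparison says nothing about the absolute sizes of the bounds, which depend on the dropped factors.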
Abstract
Reinforcement learning (RL) with linear function approximation has received increasing attention recently. However, existing work has focused on obtaining $\sqrt{T}$-type regret bound, where $T$ is the number of interactions with the MDP. In this paper, we show that logarithmic regret is attainable under two recently proposed linear MDP assumptions provided that there exists a positive sub-optimality gap for the optimal action-value function. More specifically, under the linear MDP assumption (Jin et al. 2019), the LSVI-UCB algorithm can achieve $\tilde{O}(d^{3}H^5/\text{gap}_{\text{min}} \times \log(T))$ regret; and under the linear mixture MDP assumption (Ayoub et al. 2020), the UCRL-VTR algorithm can achieve $\tilde{O}(d^{2}H^5/\text{gap}_{\text{min}} \times \log^3(T))$ regret, where $d$ is the dimension of feature mapping, $H$ is the length of episode, $\text{gap}_{\text{min}}$ is the…
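The sub-optimality gap that the abstract relies on can be made concrete in a small tabular setting. The sketch below assumes the standard definition, $\text{gap}_h(s,a) = V^*_h(s) - Q^*_h(s,a)$ with $V^*_h(s) = \max_a Q^*_h(s,a)$, and takes $\text{gap}_{\text{min}}$ to be the smallest strictly positive gap. The nested-list layout of `q_star`, the function name `min_suboptimality_gap`, and the numbers are all hypothetical, chosen only for illustration; the paper itself works with linear function approximation rather than a table.

```python
from typing import List

def min_suboptimality_gap(q_star: List[List[List[float]]],
                          tol: float = 1e-12) -> float:
    """Smallest strictly positive gap V*_h(s) - Q*_h(s, a).

    q_star[h][s][a] is the optimal action value at stage h, state s,
    action a (a hypothetical tabular layout used only in this sketch).
    """
    positive_gaps = []
    for stage in q_star:                 # loop over stages h
        for action_values in stage:      # loop over states s
            v_star = max(action_values)  # V*_h(s) = max_a Q*_h(s, a)
            for q in action_values:      # loop over actions a
                gap = v_star - q
                if gap > tol:            # skip optimal actions (gap 0)
                    positive_gaps.append(gap)
    if not positive_gaps:
        raise ValueError("every action is optimal; no positive gap exists")
    return min(positive_gaps)

# Toy Q* with H=2 stages, S=2 states, A=3 actions (made-up numbers).
q_star = [
    [[1.0, 0.5, 0.75], [2.0, 2.0, 1.0]],
    [[0.5, 0.25, 0.0], [1.0, 0.0, 0.5]],
]
gap_min = min_suboptimality_gap(q_star)
print(gap_min)  # 0.25
```

Because the regret bounds scale with $1/\text{gap}_{\text{min}}$, a problem whose second-best actions are nearly optimal (a small gap) yields a weaker guarantee than one where they are clearly worse.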
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management
