Logarithmic Regret for Reinforcement Learning with Linear Function Approximation

Jiafan He; Dongruo Zhou; Quanquan Gu

arXiv:2011.11566·cs.LG·February 19, 2021·21 cites

TL;DR

This paper shows that logarithmic regret is achievable in reinforcement learning with linear function approximation when the optimal action-value function has a positive sub-optimality gap, improving on the previously known $\sqrt{T}$-type regret bounds.

Contribution

It provides the first logarithmic regret bounds for RL with linear function approximation, obtained through a gap-dependent analysis of the LSVI-UCB and UCRL-VTR algorithms under the linear MDP and linear mixture MDP assumptions, respectively.

Findings

01

LSVI-UCB achieves $\tilde{O}(d^{3}H^{5}/\text{gap}_{\text{min}} \cdot \log(T))$ regret.

02

UCRL-VTR achieves $\tilde{O}(d^{2}H^{5}/\text{gap}_{\text{min}} \cdot \log^{3}(T))$ regret.

03

Established gap-dependent lower bounds for linear MDP models.
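To make the difference in growth rates concrete, the sketch below evaluates only the leading terms of the two bounds listed above. Constants and the logarithmic factors hidden by $\tilde{O}$ are dropped, the function name `leading_terms` is hypothetical, and the values of $d$, $H$, and $\text{gap}_{\text{min}}$ are arbitrary illustrative choices rather than numbers from the paper.

```python
import math

def leading_terms(d, H, gap_min, T):
    """Leading terms of the two bounds; constants and hidden log factors are dropped."""
    if gap_min <= 0:
        raise ValueError("gap_min must be positive")
    lsvi_ucb = d**3 * H**5 / gap_min * math.log(T)
    ucrl_vtr = d**2 * H**5 / gap_min * math.log(T) ** 3
    return lsvi_ucb, ucrl_vtr

# Illustrative (assumed) problem sizes; only T changes between the two calls.
small = leading_terms(d=10, H=5, gap_min=0.1, T=10**4)
large = leading_terms(d=10, H=5, gap_min=0.1, T=10**8)

# Growth factor of each term when T increases from 1e4 to 1e8.
lsvi_growth = large[0] / small[0]
ucrl_growth = large[1] / small[1]
# A generic sqrt(T)-type term over the same range, for comparison.
sqrt_growth = math.sqrt(10**8) / math.sqrt(10**4)

print(lsvi_growth, ucrl_growth, sqrt_growth)
```

Both logarithmic terms grow by a small constant factor over a range of $T$ in which a $\sqrt{T}$-type term grows a hundredfold. The price is the $1/\text{gap}_{\text{min}}$ factor, which makes the bounds loose when the gap is very small.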

Abstract

Reinforcement learning (RL) with linear function approximation has received increasing attention recently. However, existing work has focused on obtaining $\sqrt{T}$-type regret bounds, where $T$ is the number of interactions with the MDP. In this paper, we show that logarithmic regret is attainable under two recently proposed linear MDP assumptions provided that there exists a positive sub-optimality gap for the optimal action-value function. More specifically, under the linear MDP assumption (Jin et al. 2019), the LSVI-UCB algorithm can achieve $\tilde{O}(d^{3} H^{5} / \text{gap}_{\text{min}} \cdot \log(T))$ regret; and under the linear mixture MDP assumption (Ayoub et al. 2020), the UCRL-VTR algorithm can achieve $\tilde{O}(d^{2} H^{5} / \text{gap}_{\text{min}} \cdot \log^{3}(T))$ regret, where $d$ is the dimension of feature mapping, $H$ is the length of episode, $\text{gap}_{\text{min}}$ is the…
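As an illustration of the sub-optimality gap that the abstract relies on, the sketch below computes a minimal positive gap for a small tabular example. It assumes the usual definition, in which the gap of action $a$ in state $s$ at stage $h$ is $V^*_h(s) - Q^*_h(s,a)$ with $V^*_h(s) = \max_a Q^*_h(s,a)$, and the minimum is taken over strictly positive gaps. The helper `min_positive_gap` and the toy values are hypothetical and are not taken from the paper.

```python
def min_positive_gap(q_star, tol=1e-12):
    """Smallest strictly positive V*_h(s) - Q*_h(s, a), for q_star[h][s][a]."""
    gaps = []
    for stage in q_star:
        for action_values in stage:
            best = max(action_values)  # V*_h(s) under the assumed definition
            for q in action_values:
                gap = best - q
                if gap > tol:  # skip optimal actions, whose gap is zero
                    gaps.append(gap)
    if not gaps:
        raise ValueError("every action is optimal; no positive gap exists")
    return min(gaps)

# Toy optimal action values: 2 stages, 2 states, 2 actions (made-up numbers).
toy_q_star = [
    [[1.0, 0.5], [2.0, 2.0]],   # stage 1
    [[0.75, 0.5], [1.0, 0.0]],  # stage 2
]
toy_gap_min = min_positive_gap(toy_q_star)
print(toy_gap_min)
```

States in which all actions are optimal, such as the second state of stage 1 here, contribute no gap under this definition, so ties do not force the minimal gap to zero.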

Videos

Logarithmic Regret for Reinforcement Learning with Linear Function Approximation · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management