Optimism in Reinforcement Learning with Generalized Linear Function Approximation

Yining Wang; Ruosong Wang; Simon S. Du; Akshay Krishnamurthy

arXiv:1912.04136·stat.ML·December 10, 2019·54 cites

TL;DR

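This paper introduces a provably efficient algorithm for episodic reinforcement learning with generalized linear function approximation. The analysis relies on "optimistic closure," an expressivity assumption strictly weaker than those used in prior analyses of the linear setting, and yields a regret bound that grows sublinearly in the number of episodes.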

Contribution

It presents the first statistically and computationally efficient RL algorithm for generalized linear functions under the optimistic closure assumption.

Findings

01

Achieves a regret bound of $\tilde{O}(\sqrt{d^{3} T})$, where $d$ is the feature dimension and $T$ is the number of episodes

02

Introduces the optimistic closure assumption

03

First efficient algorithm for this setting

Abstract

We design a new provably efficient algorithm for episodic reinforcement learning with generalized linear function approximation. We analyze the algorithm under a new expressivity assumption that we call "optimistic closure," which is strictly weaker than assumptions from prior analyses for the linear setting. With optimistic closure, we prove that our algorithm enjoys a regret bound of $\tilde{O}(\sqrt{d^{3} T})$, where $d$ is the dimensionality of the state-action features and $T$ is the number of episodes. This is the first statistically and computationally efficient algorithm for reinforcement learning with generalized linear functions.
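The abstract does not spell out how optimism is implemented, so the following is only a minimal sketch of one common construction, borrowed from upper-confidence methods for linear and generalized linear bandits: a link function applied to a linear score, plus an elliptical exploration bonus, clipped to [0, 1]. The sigmoid link, the bonus form, and the names `optimistic_value`, `theta`, `Lambda`, and `gamma` are all assumptions made for illustration, not the paper's stated procedure.

```python
import numpy as np


def sigmoid(z):
    """Example link function (an assumption; any monotone link could be swapped in)."""
    return 1.0 / (1.0 + np.exp(-z))


def optimistic_value(phi, theta, Lambda, gamma, link=sigmoid):
    """Hypothetical optimistic estimate for one state-action feature vector.

    phi    : feature vector of shape (d,)
    theta  : current parameter estimate of shape (d,)
    Lambda : regularized Gram matrix of past features, shape (d, d)
    gamma  : bonus scale (assumed tuning parameter)
    """
    # Elliptical bonus: large in directions that have rarely been observed.
    bonus = gamma * np.sqrt(phi @ np.linalg.solve(Lambda, phi))
    # GLM prediction plus bonus, clipped to the assumed value range [0, 1].
    return float(min(1.0, link(phi @ theta) + bonus))


d = 3
theta = np.zeros(d)
phi = np.array([1.0, 0.0, 0.0])

# Before any data: Gram matrix is just the identity regularizer.
Lambda = np.eye(d)
before = optimistic_value(phi, theta, Lambda, gamma=0.2)

# After observing the same feature direction 99 more times.
Lambda_after = Lambda + 99 * np.outer(phi, phi)
after = optimistic_value(phi, theta, Lambda_after, gamma=0.2)

print(before, after)
```

In this sketch the bonus shrinks as a feature direction accumulates observations, so the estimate stays optimistic only where data are scarce. That is the general mechanism behind optimism-driven exploration, shown here under the stated assumptions.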

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management