Optimism in Reinforcement Learning with Generalized Linear Function Approximation
Yining Wang, Ruosong Wang, Simon S. Du, Akshay Krishnamurthy

TL;DR
This paper introduces a new efficient reinforcement learning algorithm using generalized linear function approximation, leveraging a weaker assumption called 'optimistic closure' to achieve strong regret bounds.
Contribution
It presents the first statistically and computationally efficient RL algorithm for generalized linear functions under the optimistic closure assumption.
Findings
Achieves a regret bound of $\tilde{O}(\sqrt{d^3 T})$
Introduces the optimistic closure assumption
First efficient algorithm for this setting
Abstract
We design a new provably efficient algorithm for episodic reinforcement learning with generalized linear function approximation. We analyze the algorithm under a new expressivity assumption that we call "optimistic closure," which is strictly weaker than assumptions from prior analyses for the linear setting. With optimistic closure, we prove that our algorithm enjoys a regret bound of $\tilde{O}(\sqrt{d^3 T})$, where $d$ is the dimensionality of the state-action features and $T$ is the number of episodes. This is the first statistically and computationally efficient algorithm for reinforcement learning with generalized linear functions.
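To make the scaling of the stated bound concrete, the following is a minimal sketch that evaluates only the leading term $\sqrt{d^3 T}$. It assumes all constants and the logarithmic factors hidden by the tilde are dropped, so the numbers show growth rates rather than actual regret values. The function names are hypothetical and do not come from the paper.

```python
import math


def regret_bound_scale(d: int, T: int) -> float:
    """Leading term sqrt(d^3 * T) of the bound.

    Assumption: constants and log factors are ignored, so this is
    a scaling illustration, not a regret value.
    """
    return math.sqrt(d ** 3 * T)


def average_regret_scale(d: int, T: int) -> float:
    """Leading term divided by the number of episodes T."""
    return regret_bound_scale(d, T) / T


if __name__ == "__main__":
    d = 4  # hypothetical feature dimension
    for T in (100, 400, 1600):
        total = regret_bound_scale(d, T)
        average = average_regret_scale(d, T)
        print(f"T={T}: total ~ {total:.1f}, per-episode ~ {average:.3f}")
```

Under these assumptions, quadrupling the number of episodes doubles the leading term while halving its per-episode average, which is the sense in which a $\sqrt{T}$ regret bound is sublinear.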
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management
