Online Regret Bounds for Undiscounted Continuous Reinforcement Learning
Ronald Ortner, Daniil Ryabko

TL;DR
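This paper introduces an algorithm for undiscounted reinforcement learning in continuous state space that achieves sublinear regret by combining state aggregation with upper confidence bounds (optimism in the face of uncertainty), assuming only Hölder continuity of rewards and transition probabilities and the existence of an optimal policy satisfying the Poisson equation.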
Contribution
It presents the first regret bounds for undiscounted continuous state space reinforcement learning using a novel combination of state aggregation and confidence bounds.
Findings
Achieves sublinear regret bounds in continuous state spaces.
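Requires only Hölder continuity of rewards and transition probabilities, plus the existence of an optimal policy that satisfies the Poisson equation.
Shows that optimism in the face of uncertainty, implemented through upper confidence bounds on aggregated states, is sufficient to obtain these guarantees.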
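For reference, the Hölder continuity assumption mentioned above is commonly written as follows. The notation is an assumption made here for illustration and is not taken from this summary: states s, s' are taken to lie in [0, 1], a is an action, r is the mean reward, p is the transition distribution, and L > 0 and 0 < α ≤ 1 are the Hölder constants.

```latex
\[
  |r(s,a) - r(s',a)| \le L\,|s - s'|^{\alpha},
  \qquad
  \bigl\| p(\cdot \mid s,a) - p(\cdot \mid s',a) \bigr\|_{1} \le L\,|s - s'|^{\alpha}
  \quad \text{for all } s, s' \in [0,1] \text{ and all actions } a .
\]
```

Under a condition of this form, nearby states have similar rewards and transition behaviour, which is what makes it reasonable to treat all states in a small interval as one aggregated state.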
Abstract
We derive sublinear regret bounds for undiscounted reinforcement learning in continuous state space. The proposed algorithm combines state aggregation with the use of upper confidence bounds for implementing optimism in the face of uncertainty. Besides the existence of an optimal policy which satisfies the Poisson equation, the only assumptions made are Hölder continuity of rewards and transition probabilities.
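The combination of state aggregation and upper confidence bounds described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it covers only an optimistic reward estimate (no transition model or planning step), and the class name `AggregatedRewardEstimator`, the state space [0, 1], the rewards in [0, 1], the confidence-width formula, and the parameters `holder_l`, `holder_alpha`, and `delta` are all assumptions chosen for illustration.

```python
import math


def interval_index(state: float, n_intervals: int) -> int:
    """Map a state in [0, 1] to the index of its aggregation interval."""
    if not 0.0 <= state <= 1.0:
        raise ValueError("state must lie in [0, 1]")
    # The right endpoint 1.0 is assigned to the last interval.
    return min(int(state * n_intervals), n_intervals - 1)


class AggregatedRewardEstimator:
    """Hypothetical helper: optimistic reward estimates over aggregated states."""

    def __init__(self, n_intervals, n_actions,
                 holder_l=1.0, holder_alpha=1.0, delta=0.05):
        self.n_intervals = n_intervals
        self.holder_l = holder_l          # assumed Hoelder constant L
        self.holder_alpha = holder_alpha  # assumed Hoelder exponent alpha
        self.delta = delta                # assumed confidence parameter
        self.counts = [[0] * n_actions for _ in range(n_intervals)]
        self.sums = [[0.0] * n_actions for _ in range(n_intervals)]

    def update(self, state, action, reward):
        """Record one observed reward for the interval containing `state`."""
        i = interval_index(state, self.n_intervals)
        self.counts[i][action] += 1
        self.sums[i][action] += reward

    def optimistic_reward(self, state, action, t):
        """Upper confidence bound on the mean reward, capped at 1."""
        i = interval_index(state, self.n_intervals)
        n = max(1, self.counts[i][action])
        mean = self.sums[i][action] / n
        # Illustrative Hoeffding-style width, not the paper's constants.
        conf = math.sqrt(math.log(max(t, 2) / self.delta) / (2 * n))
        # Aggregation error: states in one interval differ by at most 1/n_intervals.
        bias = self.holder_l * self.n_intervals ** (-self.holder_alpha)
        return min(1.0, mean + conf + bias)
```

In this sketch an unvisited interval receives the maximal estimate, which drives exploration, and the estimate tightens as visits accumulate. Using more intervals shrinks the aggregation term but spreads the samples more thinly, which is the trade-off a regret analysis of this kind has to balance.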
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control
