Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

Ronald Ortner; Daniil Ryabko

arXiv:1302.2550·cs.LG·February 12, 2013·44 cites

Open Access

TL;DR

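This paper introduces an algorithm for undiscounted reinforcement learning in continuous state spaces that achieves sublinear regret by combining state aggregation with upper confidence bounds (optimism in the face of uncertainty). It assumes only Hölder continuity of rewards and transition probabilities and the existence of an optimal policy satisfying the Poisson equation.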

Contribution

It presents the first regret bounds for undiscounted continuous state space reinforcement learning using a novel combination of state aggregation and confidence bounds.

Findings

01

Achieves sublinear regret bounds in continuous state spaces.

02

Requires only Hölder continuity assumptions on rewards and transitions.

03

Shows that optimism in the face of uncertainty, implemented with upper confidence bounds over aggregated states, is enough to obtain these guarantees.
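
For readers unfamiliar with the regularity condition named in the findings, Hölder continuity is commonly written as below. This is a sketch of the standard form rather than a quotation from the paper: the symbols L (Hölder constant) and \alpha (Hölder exponent), the use of the same constants for both conditions, and the choice of the L1 norm for transition probabilities are assumptions made for illustration.

```latex
% Assumed notation: s, s' are states, a is an action,
% r is the mean reward function, p(. | s, a) the transition distribution.
% L > 0 and 0 < \alpha \le 1 are illustrative constants.
\[
  |r(s,a) - r(s',a)| \le L\,|s - s'|^{\alpha},
  \qquad
  \bigl\| p(\cdot \mid s,a) - p(\cdot \mid s',a) \bigr\|_{1} \le L\,|s - s'|^{\alpha}
\]
```

Intuitively, nearby states must have similar rewards and similar transition behavior, which is what makes grouping them together a reasonable approximation.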

Abstract

We derive sublinear regret bounds for undiscounted reinforcement learning in continuous state space. The proposed algorithm combines state aggregation with the use of upper confidence bounds for implementing optimism in the face of uncertainty. Besides the existence of an optimal policy which satisfies the Poisson equation, the only assumptions made are Hölder continuity of rewards and transition probabilities.
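
The two ingredients named in the abstract, state aggregation and upper confidence bounds, can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a one-dimensional state space [0, 1] split into equal intervals, rewards in [0, 1], and a generic Hoeffding-style confidence width. The names `aggregate`, `OptimisticRewardEstimator`, and the parameter `delta` are hypothetical, and the sketch omits transition estimates, planning, and any correction for discretization error.

```python
import math


def aggregate(state: float, n_intervals: int) -> int:
    """Map a state in [0, 1] to the index of the interval that contains it."""
    if not 0.0 <= state <= 1.0:
        raise ValueError("state must lie in [0, 1]")
    # The right endpoint 1.0 is assigned to the last interval.
    return min(int(state * n_intervals), n_intervals - 1)


class OptimisticRewardEstimator:
    """Per-(interval, action) reward estimates with an optimistic bonus.

    Illustrative only: the bonus is a generic Hoeffding-style width,
    not the confidence interval used in the paper.
    """

    def __init__(self, n_intervals: int, n_actions: int, delta: float = 0.05):
        self.n_intervals = n_intervals
        self.delta = delta
        self.counts = [[0] * n_actions for _ in range(n_intervals)]
        self.sums = [[0.0] * n_actions for _ in range(n_intervals)]

    def update(self, state: float, action: int, reward: float) -> None:
        """Record one observed reward for the interval containing `state`."""
        i = aggregate(state, self.n_intervals)
        self.counts[i][action] += 1
        self.sums[i][action] += reward

    def upper_bound(self, state: float, action: int, t: int) -> float:
        """Optimistic reward estimate at step t (t >= 1), clipped to 1."""
        i = aggregate(state, self.n_intervals)
        n = self.counts[i][action]
        if n == 0:
            return 1.0  # unvisited pairs get the maximal possible reward
        mean = self.sums[i][action] / n
        bonus = math.sqrt(math.log(2 * t / self.delta) / (2 * n))
        return min(1.0, mean + bonus)
```

An agent built on this sketch would prefer actions with a high `upper_bound`, so rarely visited intervals look attractive until enough samples shrink their bonus.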

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control