Adaptive Discretization for Episodic Reinforcement Learning in Metric Spaces

Sean R. Sinclair; Siddhartha Banerjee; Christina Lee Yu

arXiv:1910.08151·cs.LG·December 20, 2019

TL;DR

This paper introduces an adaptive discretization algorithm for model-free episodic reinforcement learning on large or continuous state-action spaces. It refines its partition of the space in regions that are frequently visited and have higher payoff estimates.

Contribution

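The paper proposes a novel $Q$-learning algorithm with data-driven adaptive discretization that adjusts automatically to the problem's structure. It recovers the regret guarantees of prior algorithms for continuous state-action spaces without requiring an optimal discretization as input or access to a simulation oracle.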

Findings

01. The algorithm outperforms uniform discretization in experiments.

02. Adaptive partitions take advantage of the shape of the optimal $Q$-function.

03. Regret guarantees match those of prior algorithms under less restrictive conditions.

Abstract

We present an efficient algorithm for model-free episodic reinforcement learning on large (potentially continuous) state-action spaces. Our algorithm is based on a novel $Q$-learning policy with adaptive data-driven discretization. The central idea is to maintain a finer partition of the state-action space in regions which are frequently visited in historical trajectories, and have higher payoff estimates. We demonstrate how our adaptive partitions take advantage of the shape of the optimal $Q$-function and the joint space, without sacrificing the worst-case performance. In particular, we recover the regret guarantees of prior algorithms for continuous state-action spaces, which additionally require either an optimal discretization as input, and/or access to a simulation oracle. Moreover, experiments demonstrate how our algorithm automatically adapts to the underlying structure of the…
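The central idea in the abstract, refining the partition where visits and payoff estimates concentrate, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Cell` and `AdaptivePartition`, the one-dimensional state and action on $[0, 1]$, the $1/n$ learning rate, the absence of an exploration bonus, and the split threshold of $(1/\text{width})^2$ visits are all simplifying assumptions made here for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(eq=False)  # identity comparison, so list.remove targets this exact cell
class Cell:
    """Hypothetical square cell over (state, action) in [0, 1]^2."""
    s_lo: float
    a_lo: float
    width: float
    q: float          # optimistic Q estimate shared by the whole cell
    visits: int = 0

    def contains_state(self, s: float) -> bool:
        return self.s_lo <= s <= self.s_lo + self.width

    def split(self) -> List["Cell"]:
        # Assumption: the four children inherit the parent's estimate and count.
        half = self.width / 2
        return [
            Cell(self.s_lo + i * half, self.a_lo + j * half, half, self.q, self.visits)
            for i in (0, 1)
            for j in (0, 1)
        ]


class AdaptivePartition:
    def __init__(self, horizon: int):
        # Start from one cell covering the space, initialised optimistically.
        self.cells = [Cell(0.0, 0.0, 1.0, q=float(horizon))]

    def select(self, s: float) -> Cell:
        # Among cells whose state range contains s, pick the highest estimate.
        candidates = [c for c in self.cells if c.contains_state(s)]
        return max(candidates, key=lambda c: c.q)

    def update(self, cell: Cell, target: float) -> None:
        cell.visits += 1
        lr = 1.0 / cell.visits  # simplified learning rate (assumption)
        cell.q = (1 - lr) * cell.q + lr * target
        # Assumed split rule: refine once visits reach (1 / width)^2.
        if cell.visits >= (1.0 / cell.width) ** 2:
            self.cells.remove(cell)
            self.cells.extend(cell.split())
```

Under these assumptions, repeatedly calling `select` and `update` for states near one point produces smaller cells there, while cells that are never selected keep their original width.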

Code & Models

Repositories

seanrsinclair/AdaptiveQLearning (official)
