TL;DR
This paper introduces an adaptive discretization algorithm for model-free episodic reinforcement learning in large or continuous state-action spaces, improving performance by focusing on frequently visited regions.
Contribution
The paper proposes a novel $Q$-learning algorithm with data-driven adaptive discretization that automatically adjusts to the problem's structure, achieving regret guarantees without prior discretization or oracles.
Findings
The algorithm outperforms uniform discretization in experiments.
Adaptive partitions leverage the shape of the optimal $Q$-function.
Regret guarantees match those of prior algorithms for continuous state-action spaces, without requiring an optimal discretization as input or access to a simulation oracle.
Abstract
We present an efficient algorithm for model-free episodic reinforcement learning on large (potentially continuous) state-action spaces. Our algorithm is based on a novel $Q$-learning policy with adaptive data-driven discretization. The central idea is to maintain a finer partition of the state-action space in regions which are frequently visited in historical trajectories, and have higher payoff estimates. We demonstrate how our adaptive partitions take advantage of the shape of the optimal $Q$-function and the joint space, without sacrificing the worst-case performance. In particular, we recover the regret guarantees of prior algorithms for continuous state-action spaces, which additionally require either an optimal discretization as input, and/or access to a simulation oracle. Moreover, experiments demonstrate how our algorithm automatically adapts to the underlying structure of the…
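To make the central idea concrete, here is a minimal Python sketch of a partition that refines itself where samples concentrate. It is an illustration under stated assumptions, not the paper's algorithm: the space is a toy one-dimensional interval $[0, 1]$, and the names `Cell` and `AdaptivePartition`, the optimistic initial value, the fixed learning rate, and the split rule (split a cell once its visit count reaches $(1/\text{width})^2$) are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One interval [lo, hi) of a toy 1-D state-action space (hypothetical)."""
    lo: float
    hi: float
    q: float = 1.0    # optimistic initial Q estimate (assumed value)
    visits: int = 0

    @property
    def width(self) -> float:
        return self.hi - self.lo


class AdaptivePartition:
    """Partition of [0, 1] that splits cells which are visited often."""

    def __init__(self) -> None:
        self.cells = [Cell(0.0, 1.0)]

    def find(self, x: float) -> int:
        """Return the index of the cell containing x."""
        for i, c in enumerate(self.cells):
            if c.lo <= x < c.hi or (x == 1.0 and c.hi == 1.0):
                return i
        raise ValueError(f"{x} lies outside [0, 1]")

    def update(self, x: float, target: float, lr: float = 0.5) -> None:
        """Move the cell's Q estimate toward target, then maybe split it."""
        i = self.find(x)
        c = self.cells[i]
        c.visits += 1
        c.q += lr * (target - c.q)
        # Assumed split rule: narrower cells need more visits before splitting.
        if c.visits >= (1.0 / c.width) ** 2:
            mid = (c.lo + c.hi) / 2.0
            # Children inherit the parent's estimate and start with zero visits.
            self.cells[i:i + 1] = [Cell(c.lo, mid, c.q), Cell(mid, c.hi, c.q)]
```

Calling `update` repeatedly near one point leaves narrow cells around that point and wide cells elsewhere. In a full $Q$-learning loop, actions would be chosen greedily from the cell estimates, so the frequently visited regions would also tend to be the ones with higher payoff estimates, which is the coupling the abstract describes.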
