Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs
Davide Maran, Alberto Maria Metelli, Matteo Papini, Marcello Restelli

TL;DR
This paper introduces a novel projection technique using harmonic analysis for reinforcement learning in continuous-space MDPs, achieving rate-optimal sample complexity that bridges discretization and low-rank approaches.
Contribution
It presents a simple, perturbed version of least-squares value iteration that uses orthogonal trigonometric polynomials as features together with a new projection technique based on harmonic analysis, achieving rate-optimal sample complexity for continuous-space MDPs with smooth Bellman operators.
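The regression step at the core of least-squares value iteration with trigonometric features can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it works in one dimension (a state-action pair rescaled to a scalar `z` in [0, 1)), uses the features 1, cos(2πkz), sin(2πkz), and omits both the perturbation and the projection step. The names `trig_features` and `least_squares_step` are hypothetical.

```python
import numpy as np


def trig_features(z, degree):
    """Features [1, cos(2*pi*k*z) for k=1..degree, sin(2*pi*k*z) for k=1..degree].

    Assumption: z is a 1-D array of state-action points rescaled to [0, 1).
    """
    z = np.asarray(z, dtype=float).ravel()
    k = np.arange(1, degree + 1)
    angles = 2.0 * np.pi * np.outer(z, k)  # shape (n_samples, degree)
    return np.hstack([np.ones((z.size, 1)), np.cos(angles), np.sin(angles)])


def least_squares_step(z, targets, degree):
    """One regression step: fit Bellman targets (e.g. r + max_a' Q(s', a')) by least squares.

    The perturbation used in the paper is deliberately omitted in this sketch.
    """
    phi = trig_features(z, degree)
    weights, *_ = np.linalg.lstsq(phi, targets, rcond=None)
    return weights


# Noise-free targets that are themselves a trigonometric polynomial of degree 2.
rng = np.random.default_rng(0)
z = rng.uniform(0.0, 1.0, size=200)
targets = 0.5 + np.cos(2 * np.pi * z) - 0.3 * np.sin(4 * np.pi * z)
w = least_squares_step(z, targets, degree=3)
q_at_quarter = trig_features(np.array([0.25]), 3) @ w  # fitted value at z = 0.25
print(np.round(w, 6))
```

The weight vector is ordered as [constant, cos terms for k = 1..3, sin terms for k = 1..3], so with noise-free targets the fit recovers the generating coefficients.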
Findings
Achieves rate-optimal sample complexity for continuous-space MDPs.
Recovers and generalizes existing rates for Lipschitz and low-rank MDPs.
Bridges the gap between discretization and low-rank approaches.
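The paper reports a sample complexity of order $\widetilde{O}(\varepsilon^{-2-d/(\nu+1)})$, where $d$ is the state-action dimension and $\nu$ the order of smoothness. The small sketch below only does the arithmetic on the exponent to show how it interpolates between the two regimes: $\nu = 0$ (Lipschitz) gives $\varepsilon^{-(2+d)}$, and $\nu \to \infty$ gives $\varepsilon^{-2}$. The helper names are hypothetical, and the constants and logarithmic factors hidden by $\widetilde{O}$ are ignored, so the printed values are orders of magnitude, not sample counts from the paper.

```python
import math


def rate_exponent(d, nu):
    """Exponent a such that the sample complexity scales as eps**(-a), up to log factors.

    a = 2 + d / (nu + 1); passing nu = math.inf yields 2.0 because d / inf == 0.0.
    """
    return 2.0 + d / (nu + 1.0)


def samples_order(eps, d, nu):
    """Order of magnitude eps**(-a); constants and log factors are ignored."""
    return eps ** (-rate_exponent(d, nu))


# Compare smoothness levels for a 4-dimensional state-action space at eps = 0.1.
for nu in (0, 1, 3, math.inf):
    print(f"d=4, nu={nu}: exponent {rate_exponent(4, nu):.2f}, "
          f"~{samples_order(0.1, 4, nu):.0e} samples at eps=0.1")
```

Each extra order of smoothness shrinks the dimension-dependent part $d/(\nu+1)$, which is why smooth problems approach the dimension-free $\varepsilon^{-2}$ rate.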
Abstract
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators. Given access to a generative model, we achieve rate-optimal sample complexity by performing a simple, perturbed version of least-squares value iteration with orthogonal trigonometric polynomials as features. Key to our solution is a novel projection technique based on ideas from harmonic analysis. Our $\widetilde{O}(\varepsilon^{-2-d/(\nu+1)})$ sample complexity, where $d$ is the dimension of the state-action space and $\nu$ the order of smoothness, recovers the state-of-the-art result of discretization approaches for the special case of Lipschitz MDPs ($\nu = 0$). At the same time, for $\nu \to \infty$, it recovers and greatly generalizes the $O(\varepsilon^{-2})$ rate of low-rank MDPs, which are…
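Why project by convolution rather than by plain least squares? One plausible motivation is range control: truncating a Fourier series (the L2 projection onto trigonometric polynomials) can overshoot the original values near discontinuities (the Gibbs phenomenon), whereas convolving with a nonnegative kernel of unit mass returns a weighted average, so every value stays inside [min f, max f]. The sketch below illustrates this on a 1-D periodic grid using the classical Fejér kernel as a stand-in; the kernel and construction used in the paper may differ, and all function names are hypothetical.

```python
import numpy as np


def _frequency_filter(samples, degree, weight_fn):
    """Multiply the DFT of `samples` by weight_fn(|k|, degree) for |k| <= degree, else 0.

    Multiplying DFT coefficients is equivalent to circular convolution with the
    kernel whose Fourier weights are given by weight_fn.
    """
    m = samples.size
    if 2 * degree >= m:
        raise ValueError("degree must satisfy 2 * degree < number of grid points")
    k = np.abs(np.fft.fftfreq(m, d=1.0 / m))  # integer frequencies |k| in FFT order
    weights = np.where(k <= degree, weight_fn(k, degree), 0.0)
    return np.real(np.fft.ifft(np.fft.fft(samples) * weights))


def truncated_projection(samples, degree):
    """L2 projection onto trig polynomials of degree <= `degree` (Dirichlet kernel)."""
    return _frequency_filter(samples, degree, lambda k, n: np.ones_like(k))


def fejer_projection(samples, degree):
    """Convolution with the Fejér kernel (nonnegative, unit mass): weights 1 - |k|/(degree+1)."""
    return _frequency_filter(samples, degree, lambda k, n: 1.0 - k / (n + 1.0))


# Square wave on a uniform periodic grid: values in [-1, 1] with two jumps.
x = 2.0 * np.pi * np.arange(1024) / 1024
square = np.where(x < np.pi, 1.0, -1.0)
dirichlet = truncated_projection(square, degree=20)
fejer = fejer_projection(square, degree=20)
print(f"truncated max: {dirichlet.max():.3f}, Fejér max: {fejer.max():.3f}")
```

Both outputs are trigonometric polynomials of degree at most 20, but only the Fejér version is guaranteed to stay within the range of the input, which is the kind of sup-norm control a value-iteration scheme needs to keep its iterates bounded.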
