A Measure-Theoretic Finite-Sample Theory for Adaptive-Data Fitted Q-Iteration
Manuel Haussmann, Mustafa Mert Çelikok, Melih Kandemir

TL;DR
This paper develops a measure-theoretic finite-sample framework for fitted Q-iteration (FQI) in reinforcement learning. It bridges measure-theoretic MDP foundations, error-propagation analysis, and PAC generalization bounds, and it provides performance bounds in continuous spaces.
Contribution
The paper introduces a unified measure-theoretic framework for analyzing FQI, with finite-sample guarantees and online regret bounds on general measurable spaces.
Findings
Finite-sample performance bounds for FQI on general spaces.
Sequential Rademacher complexity controls Bellman-regression generalization.
First cumulative online regret guarantee for FQI in continuous spaces.
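For readers unfamiliar with the algorithm under analysis, the following is a minimal sketch of generic batch FQI, not the paper's adaptive-data setting or its function class. Everything in it is an assumption made for illustration: the toy one-dimensional MDP in `make_dataset`, the per-action quadratic feature map `features`, the ridge-regularized least-squares regressor, and the discount factor. Each iteration regresses the Bellman targets r + γ max_a' Q_k(s', a') onto the features, which is the Bellman-regression step whose generalization error the findings above refer to.

```python
import numpy as np


def features(s, a, n_actions=2):
    """Hypothetical feature map: [1, s, s^2] placed in the block of action a."""
    s = np.asarray(s, dtype=float)
    a = np.asarray(a, dtype=int)
    base = np.stack([np.ones_like(s), s, s**2], axis=1)  # shape (n, 3)
    phi = np.zeros((s.shape[0], 3 * n_actions))
    for k in range(n_actions):
        mask = a == k
        phi[mask, 3 * k:3 * (k + 1)] = base[mask]
    return phi


def q_values(w, s, n_actions=2):
    """Evaluate the linear Q-function for every action; returns shape (n, n_actions)."""
    s = np.asarray(s, dtype=float)
    cols = [features(s, np.full(len(s), k), n_actions) @ w for k in range(n_actions)]
    return np.stack(cols, axis=1)


def fitted_q_iteration(s, a, r, s_next, gamma=0.5, n_iters=50, n_actions=2, ridge=1e-6):
    """Generic batch FQI with ridge-regularized least squares as the regressor."""
    phi = features(s, a, n_actions)
    gram = phi.T @ phi + ridge * np.eye(phi.shape[1])
    w = np.zeros(phi.shape[1])
    for _ in range(n_iters):
        # Bellman targets built from the previous iterate Q_k.
        targets = r + gamma * q_values(w, s_next, n_actions).max(axis=1)
        # Regression step: Q_{k+1} is the least-squares fit to the targets.
        w = np.linalg.solve(gram, phi.T @ targets)
    return w


def make_dataset(n=500, seed=0):
    """Toy MDP (an assumption): state in [-1, 1], action 0 moves left, action 1 moves right."""
    rng = np.random.default_rng(seed)
    s = rng.uniform(-1.0, 1.0, n)
    a = rng.integers(0, 2, n)
    step = np.where(a == 1, 0.1, -0.1)
    s_next = np.clip(s + step + 0.01 * rng.standard_normal(n), -1.0, 1.0)
    r = -s_next**2  # reward is highest at the origin
    return s, a, r, s_next


s, a, r, s_next = make_dataset()
w = fitted_q_iteration(s, a, r, s_next)
greedy = q_values(w, np.array([-0.8, 0.8])).argmax(axis=1)
```

In this toy problem the reward peaks at the origin, so one would expect the greedy policy to move right from s = -0.8 and left from s = 0.8. The small discount factor is a deliberate choice for the sketch: FQI with linear features is not guaranteed to converge in general, and the concentrability-type conditions mentioned in the abstract are what control this kind of error propagation.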
Abstract
While reinforcement learning (RL) promises to revolutionize the control of complex nonlinear robotic systems, a profound gap persists between the heuristic success of model-free off-policy deep RL and the underlying theory, which remains largely confined to tabular or linearizable settings. We identify the cause of this gap as an emergent isolation of three traditions: (i) measure-theoretic MDP foundations on general spaces limit their analysis to exact dynamic programming and ignore all error sources of a learning process; (ii) deterministic error propagation analysis addresses the approximation error via concentrability coefficients without a finite-sample analysis of the estimation error; and (iii) PAC generalization bounds characterize the estimation errors of simplified topologies. We bridge these traditions with a unified theoretical framework for fitted Q-iteration (FQI) on…
