On the Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization

Mudit Gaur; Vaneet Aggarwal; Mridul Agarwal

arXiv:2211.07675·cs.LG·February 1, 2023

TL;DR

This paper provides theoretical guarantees for a Fitted Q-Iteration algorithm with a two-layer ReLU neural network parameterization, showing that it achieves order-optimal sample complexity without assuming a linear or low-rank structure on the MDP.

Contribution

It establishes the first sample complexity bounds for neural network-based Fitted Q-Iteration in general MDPs without linearity assumptions.

Findings

01

Achieves $\tilde{O}(1/\epsilon^2)$ sample complexity

02

Works for countable state spaces without structural assumptions

03

Provides convergence guarantees for neural network parametrization

Abstract

Deep Q-learning based algorithms have been applied successfully to many decision-making problems, but their theoretical foundations are not as well understood. In this paper, we study Fitted Q-Iteration with a two-layer ReLU neural network parameterization and establish sample complexity guarantees for the algorithm. Our approach estimates the Q-function in each iteration by solving a convex optimization problem. We show that this approach achieves a sample complexity of $\tilde{O}(1/\epsilon^{2})$, which is order-optimal. This result holds for countable state spaces and does not require assumptions such as a linear or low-rank structure on the MDP.
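The loop the abstract describes (regress a Q-function onto Bellman targets, then repeat) can be illustrated with a minimal sketch. This is not the paper's algorithm: instead of the paper's convex reformulation of two-layer ReLU training, the sketch freezes randomly initialised hidden weights and fits only the output layer by ridge least squares, which is a simpler stand-in that still makes each iteration a convex problem. The four-state chain MDP, the one-hot encoding, and every name and constant below (`step`, `fit_output_layer`, `HIDDEN`, `RIDGE`, and so on) are assumptions made for illustration.

```python
import random

GAMMA = 0.9                 # discount factor (assumed)
N_STATES, N_ACTIONS = 4, 2  # toy chain MDP (assumed, not from the paper)
HIDDEN = 32                 # number of hidden ReLU units (assumed)
RIDGE = 1e-8                # small regulariser so the linear system is well posed


def step(s, a):
    """Deterministic chain: action 1 moves right, action 0 moves left."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return reward, s_next


def one_hot(s, a):
    x = [0.0] * (N_STATES * N_ACTIONS)
    x[s * N_ACTIONS + a] = 1.0
    return x


def hidden_features(W, x):
    """First layer: ReLU(W x), with W frozen after random initialisation."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]


def q_value(v, h):
    """Second layer: linear read-out of the hidden features."""
    return sum(vj * hj for vj, hj in zip(v, h))


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def fit_output_layer(Phi, y):
    """Ridge least squares in dual form: v = Phi^T (Phi Phi^T + RIDGE * I)^{-1} y."""
    n = len(Phi)
    G = [[sum(p * q for p, q in zip(Phi[i], Phi[j])) + (RIDGE if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(G, y)
    return [sum(alpha[i] * Phi[i][j] for i in range(n)) for j in range(len(Phi[0]))]


def fitted_q_iteration(n_iters=200, seed=0):
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(N_STATES * N_ACTIONS)]
         for _ in range(HIDDEN)]
    pairs = [(s, a) for s in range(N_STATES) for a in range(N_ACTIONS)]
    feats = {sa: hidden_features(W, one_hot(*sa)) for sa in pairs}
    # One sampled transition per state-action pair: (s, a, reward, next state).
    data = [(s, a, *step(s, a)) for (s, a) in pairs]
    Phi = [feats[(s, a)] for (s, a, _r, _s2) in data]
    v = [0.0] * HIDDEN
    for _ in range(n_iters):
        # Bellman targets computed from the current Q estimate.
        targets = [r + GAMMA * max(q_value(v, feats[(s2, b)]) for b in range(N_ACTIONS))
                   for (_s, _a, r, s2) in data]
        # Regression step: a convex least-squares fit of the output layer.
        v = fit_output_layer(Phi, targets)
    return {sa: q_value(v, feats[sa]) for sa in pairs}


if __name__ == "__main__":
    for (s, a), value in sorted(fitted_q_iteration().items()):
        print(f"Q({s}, {a}) = {value:.3f}")
```

On this toy chain the optimal policy always moves right, so the fitted values should approach the optimal Q-function (for example, a value near 10 for moving right in the last state). The sketch uses every transition exactly once per iteration; it says nothing about the sampling scheme or the error analysis behind the paper's $\tilde{O}(1/\epsilon^{2})$ bound.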

Videos

On the Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Model Reduction and Neural Networks

Methods: Q-Learning