Sample-Optimal Parametric Q-Learning Using Linearly Additive Features

Lin F. Yang; Mengdi Wang

arXiv:1902.04779 · cs.LG · June 7, 2019 · 32 cites

TL;DR

This paper introduces a sample-efficient parametric Q-learning algorithm for large-scale MDPs with linear features, achieving near-optimal sample complexity by leveraging monotonicity and noise structure.

Contribution

It proposes a parametric Q-learning method whose sample complexity is provably optimal: it scales with the feature dimension $K$, is independent of the size of the state space, and is obtained with the help of variance reduction techniques.

Findings

01

Achieves $\tilde{O}\left(K / (\epsilon^{2} (1-\gamma)^{3})\right)$ sample complexity

02

Proves a matching information-theoretical lower bound

03

Demonstrates effectiveness in large-scale MDPs with linear features
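To get a feel for the bound in the first finding, the short sketch below evaluates $K / (\epsilon^{2} (1-\gamma)^{3})$ for a few settings. It is only an illustration: the function name `sample_bound` is hypothetical, and the constant and polylogarithmic factors hidden by the $\tilde{O}$ notation are ignored, so the numbers show scaling rather than actual sample counts.

```python
def sample_bound(K, eps, gamma):
    """Leading term K / (eps^2 (1 - gamma)^3) of the stated bound.

    Assumption: constants and polylog factors hidden by the
    tilde-O notation are dropped, so only the scaling is meaningful.
    """
    return K / (eps ** 2 * (1.0 - gamma) ** 3)


# K = 10 features, eps = 0.1, gamma = 0.9: on the order of 10^6 transitions.
base = sample_bound(K=10, eps=0.1, gamma=0.9)

# Halving eps multiplies the bound by about 4 (quadratic dependence).
finer = sample_bound(K=10, eps=0.05, gamma=0.9)

# Doubling the feature dimension K doubles the bound (linear dependence).
wider = sample_bound(K=20, eps=0.1, gamma=0.9)

print(base, finer / base, wider / base)
```

Note that the size of the state space does not appear among the arguments, which is the point of the result.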

Abstract

Consider a Markov decision process (MDP) that admits a set of state-action features, which can linearly express the process's probabilistic transition model. We propose a parametric Q-learning algorithm that finds an approximately optimal policy using a sample size proportional to the feature dimension $K$ and invariant with respect to the size of the state space. To further improve its sample efficiency, we exploit the monotonicity property and intrinsic noise structure of the Bellman operator, assuming the existence of anchor state-actions that imply implicit non-negativity in the feature space. We augment the algorithm using techniques of variance reduction, monotonicity preservation, and confidence bounds. It is proved to find a policy which is $\epsilon$-optimal from any initial state with high probability using $\tilde{O}\left(K / (\epsilon^{2} (1-\gamma)^{3})\right)$ sample transitions for…
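The basic idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the paper's algorithm: it assumes each feature vector $\phi(s,a)$ is a vector of convex-combination weights over $K$ anchor state-actions, so that $P(s' \mid s,a) = \sum_k \phi_k(s,a)\, P(s' \mid \text{anchor}_k)$, and it omits the variance reduction, monotonicity preservation, and confidence bounds mentioned above. The names `parametric_q_learning`, `anchor_next`, and the toy problem sizes are hypothetical.

```python
import numpy as np


def parametric_q_learning(phi, r, anchor_next, gamma, n_iters, n_samples, rng):
    """Plain parametric Q-learning sketch (no variance reduction).

    phi:         (S, A, K) features; assumed convex weights over K anchors.
    r:           (S, A) rewards.
    anchor_next: (K, S) next-state distribution of each anchor; used ONLY
                 as a sampler, standing in for a generative model.
    Returns the K-dimensional parameter w, with Q_w = r + gamma * phi @ w.
    """
    S, A, K = phi.shape
    w = np.zeros(K)
    for _ in range(n_iters):
        V = (r + gamma * phi @ w).max(axis=1)  # greedy value of each state
        new_w = np.empty(K)
        for k in range(K):
            # Fresh transitions from anchor k; w_k estimates E[V(s') | anchor k].
            nxt = rng.choice(S, size=n_samples, p=anchor_next[k])
            new_w[k] = V[nxt].mean()
        w = new_w
    return w


# Toy instance (all sizes are arbitrary assumptions for illustration).
rng = np.random.default_rng(0)
S, A, K, gamma = 6, 2, 3, 0.8
anchor_next = rng.dirichlet(np.ones(S), size=K)   # (K, S)
phi = rng.dirichlet(np.ones(K), size=(S, A))      # (S, A, K)
r = rng.random((S, A))

w = parametric_q_learning(phi, r, anchor_next, gamma,
                          n_iters=100, n_samples=5000, rng=rng)
q_est = r + gamma * phi @ w

# Reference solution by exact value iteration on the full transition model.
P = phi @ anchor_next                              # (S, A, S)
q_star = np.zeros((S, A))
for _ in range(500):
    q_star = r + gamma * P @ q_star.max(axis=1)

print("max |Q_w - Q*| =", np.abs(q_est - q_star).max())
```

The learner only ever samples from the $K$ anchors, so the number of transitions it draws is `n_iters * K * n_samples` regardless of how many states there are; the full matrix `P` is built here solely to check the result.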

Taxonomy

Topics: Reinforcement Learning in Robotics · Fault Detection and Control Systems · Control Systems and Identification

Methods: Q-Learning