The Efficacy of Pessimism in Asynchronous Q-Learning

Yuling Yan; Gen Li; Yuxin Chen; Jianqing Fan

arXiv:2203.07368 · cs.LG · March 15, 2022

TL;DR

This paper introduces a pessimism-based framework for asynchronous Q-learning that improves sample efficiency and adapts to near-expert data, even when the data cover only part of the state-action space. It also provides the first theoretical validation of pessimism for Markovian, non-i.i.d. data.

Contribution

The paper develops an algorithmic framework that incorporates the principle of pessimism into asynchronous Q-learning, with theoretical guarantees under partial data coverage.

Findings

01. Achieves near-optimal sample complexity when coupled with variance reduction.

02. Permits data that cover only part of the state-action space, unlike prior theory that requires uniform coverage of all state-action pairs.

03. Provides the first theoretical validation of pessimism for Markovian, non-i.i.d. data.

Abstract

This paper is concerned with the asynchronous form of Q-learning, which applies a stochastic approximation scheme to Markovian data samples. Motivated by the recent advances in offline reinforcement learning, we develop an algorithmic framework that incorporates the principle of pessimism into asynchronous Q-learning, which penalizes infrequently-visited state-action pairs based on suitable lower confidence bounds (LCBs). This framework leads to, among other things, improved sample efficiency and enhanced adaptivity in the presence of near-expert data. Our approach permits the observed data in some important scenarios to cover only partial state-action space, which is in stark contrast to prior theory that requires uniform coverage of all state-action pairs. When coupled with the idea of variance reduction, asynchronous Q-learning with LCB penalization achieves near-optimal sample…
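The idea described in the abstract, asynchronous Q-learning on a single Markovian trajectory with a lower-confidence-bound penalty on rarely visited state-action pairs, can be illustrated with a minimal tabular sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `lcb_async_q_learning`, the step-size schedule, the penalty form `c_b / ((1 - gamma) * sqrt(n))`, and the constant `c_b`. The sketch also omits variance reduction.

```python
import math


def lcb_async_q_learning(trajectory, n_states, n_actions, gamma=0.9, c_b=0.1):
    """Tabular asynchronous Q-learning with an LCB-style penalty (illustrative).

    trajectory: list of (s, a, r, s_next) tuples from one Markovian sample path,
    with rewards assumed to lie in [0, 1].
    """
    q = [[0.0] * n_actions for _ in range(n_states)]
    v = [0.0] * n_states                      # monotonically non-decreasing value estimate
    visits = [[0] * n_actions for _ in range(n_states)]
    policy = [0] * n_states                   # arbitrary initial policy
    horizon = 1.0 / (1.0 - gamma)             # effective horizon

    for s, a, r, s_next in trajectory:
        # Asynchronous update: only the visited (s, a) entry changes.
        visits[s][a] += 1
        n = visits[s][a]
        eta = (horizon + 1.0) / (horizon + n)     # assumed step-size schedule
        penalty = c_b * horizon / math.sqrt(n)    # assumed penalty, shrinks with visits
        target = r + gamma * v[s_next] - penalty
        q[s][a] = (1.0 - eta) * q[s][a] + eta * target

        # Update the value and greedy action only when the estimate improves,
        # so a heavily penalized (possibly negative) Q-value never drives the policy.
        best = max(q[s])
        if best > v[s]:
            v[s] = best
            policy[s] = q[s].index(best)
    return q, v, policy


# Toy example (hypothetical data): one state, two actions.
# Action 0 pays 1.0 but is seen once; action 1 pays 0.8 and is seen 400 times.
traj = [(0, 0, 1.0, 0)] + [(0, 1, 0.8, 0)] * 400
q, v, policy = lcb_async_q_learning(traj, n_states=1, n_actions=2, gamma=0.5, c_b=1.0)
```

In this toy run the once-visited action keeps a heavily penalized estimate, so the learned policy favors the well-covered action even though its per-step reward is lower. That is the qualitative behavior pessimism is meant to produce under partial coverage.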

Taxonomy

Methods: Q-Learning