The Efficacy of Pessimism in Asynchronous Q-Learning
Yuling Yan, Gen Li, Yuxin Chen, Jianqing Fan

TL;DR
This paper introduces a pessimism-based framework for asynchronous Q-learning that improves sample efficiency and adaptivity, especially with partial data coverage, and provides the first theoretical validation for pessimism in Markovian non-i.i.d. data.
Contribution
It develops a novel pessimism-incorporating algorithmic framework for asynchronous Q-learning with theoretical guarantees under partial data coverage.
Findings
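When combined with variance reduction, asynchronous Q-learning with LCB penalization achieves near-optimal sample complexity.
In some important scenarios, permits data that cover only part of the state-action space, whereas prior theory requires uniform coverage of all state-action pairs.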
Provides the first theoretical validation of pessimism in Markovian non-i.i.d. data.
Abstract
This paper is concerned with the asynchronous form of Q-learning, which applies a stochastic approximation scheme to Markovian data samples. Motivated by the recent advances in offline reinforcement learning, we develop an algorithmic framework that incorporates the principle of pessimism into asynchronous Q-learning, which penalizes infrequently-visited state-action pairs based on suitable lower confidence bounds (LCBs). This framework leads to, among other things, improved sample efficiency and enhanced adaptivity in the presence of near-expert data. Our approach permits the observed data in some important scenarios to cover only partial state-action space, which is in stark contrast to prior theory that requires uniform coverage of all state-action pairs. When coupled with the idea of variance reduction, asynchronous Q-learning with LCB penalization achieves near-optimal sample…
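The penalization idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name `lcb_async_q_learning`, the step-size schedule, the form and constants of the penalty, and the clipping at zero are all assumptions chosen for illustration, with rewards assumed to lie in [0, 1]. The sketch runs Q-learning along a single Markovian trajectory, updates only the visited state-action pair at each step (the asynchronous part), and subtracts a bonus that shrinks as the visit count grows (the pessimism part).

```python
import numpy as np

def lcb_async_q_learning(trajectory, n_states, n_actions,
                         gamma=0.9, c_b=1.0, delta=0.01):
    """Illustrative asynchronous Q-learning with an LCB-style penalty.

    trajectory: ordered list of (s, a, r, s_next) tuples from one sample
    path, with rewards assumed to lie in [0, 1]. All constants below are
    hypothetical choices, not the ones analyzed in the paper.
    """
    Q = np.zeros((n_states, n_actions))
    counts = np.zeros((n_states, n_actions), dtype=int)
    T = max(len(trajectory), 1)
    log_term = np.log(n_states * n_actions * T / delta)

    for s, a, r, s_next in trajectory:
        # Asynchronous update: only the visited pair (s, a) changes.
        counts[s, a] += 1
        n = counts[s, a]
        eta = 1.0 / (1.0 + (1.0 - gamma) * n)  # assumed step-size schedule
        # Penalty is large for rarely visited pairs and decays like 1/sqrt(n).
        penalty = c_b * np.sqrt(log_term / n) / (1.0 - gamma)
        target = r + gamma * Q[s_next].max() - penalty
        # Clip at zero so the pessimistic estimate stays a valid lower bound
        # for nonnegative rewards (an assumption of this sketch).
        Q[s, a] = max(0.0, (1.0 - eta) * Q[s, a] + eta * target)

    return Q, counts

# Usage: a toy path that only ever visits (state 0, action 0).
toy_path = [(0, 0, 1.0, 0)] * 200
Q_pess, visits = lcb_async_q_learning(toy_path, n_states=2, n_actions=2)
greedy_policy = Q_pess.argmax(axis=1)
```

State-action pairs that never appear in the trajectory keep their initial value of zero, so a greedy policy built from the penalized estimates does not rely on optimistic guesses about unobserved pairs. Setting `c_b=0` recovers plain asynchronous Q-learning in this sketch.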
Taxonomy
Methods: Q-Learning
