Momentum-based Accelerated Q-learning

Bowen Weng; Lin Zhao; Huaqing Xiong; Wei Zhang

arXiv:1910.11673 · eess.SY · October 28, 2019 · 1 citation

TL;DR

This paper introduces a momentum-inspired acceleration scheme for Q-learning that reuses historical iterates of the Q-function. The scheme has a proven convergence rate in finite state-action spaces and is validated experimentally in both finite and continuous state-action settings.

Contribution

The paper proposes an acceleration method for Q-learning, inspired by momentum techniques from optimization, that applies to both finite and continuous state-action spaces.

Findings

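01. In finite state-action spaces, accelerated Q-learning provably converges to the global optimum at a rate of O(1/√T).

02. In numerical experiments on the FrozenLake grid world game, the proposed method outperforms SpeedyQ, despite sharing a comparable theoretical convergence rate.

03. With function approximation, the accelerated algorithms improve convergence on LQR and Atari 2600 tasks.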

Abstract

This paper studies accelerated algorithms for Q-learning. We propose an acceleration scheme by incorporating the historical iterates of the Q-function. The idea is conceptually inspired by the momentum-based acceleration methods in optimization theory. Under finite state-action space settings, the proposed accelerated Q-learning algorithm provably converges to the global optimum with a rate of $O(1/\sqrt{T})$. While sharing a comparable theoretical convergence rate with the existing Speedy Q-learning (SpeedyQ) algorithm, we numerically show that the proposed algorithm outperforms SpeedyQ via playing the FrozenLake grid world game. Furthermore, we generalize the acceleration scheme to the continuous state-action space case where function approximation of the Q-function is necessary. In this case, the algorithms are validated using commonly adopted testing problems in…
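To make the idea of "incorporating the historical iterates of the Q-function" concrete, here is a minimal sketch of a generic heavy-ball style update on a tabular Q-function. It is not the paper's algorithm: the update rule, the step sizes `alpha` and `beta`, the four-state chain environment, and all function names are assumptions chosen for illustration. The update is also written in a synchronous, model-based form (every state-action pair is updated at once from known transitions) so that the result is deterministic.

```python
import numpy as np

# Hypothetical 4-state chain: action 0 moves left, action 1 moves right.
# Reward 1.0 whenever the transition lands in the rightmost state.
n_states = 4
states = np.arange(n_states)
next_state = np.stack(
    [np.maximum(states - 1, 0), np.minimum(states + 1, n_states - 1)], axis=1
)
reward = (next_state == n_states - 1).astype(float)


def bellman_optimality(Q, next_state, reward, gamma):
    """Apply the Bellman optimality operator to a tabular Q-function."""
    V = Q.max(axis=1)                    # greedy state values
    return reward + gamma * V[next_state]


def momentum_q_iteration(next_state, reward, gamma=0.5,
                         alpha=1.0, beta=0.2, n_iters=500):
    """Synchronous Q-iteration with a heavy-ball momentum term.

    Assumed illustrative update (not taken from the paper):
        Q_{k+1} = Q_k + alpha * (T Q_k - Q_k) + beta * (Q_k - Q_{k-1})
    Setting beta = 0 recovers plain (relaxed) Q-value iteration.
    """
    Q_prev = np.zeros_like(reward)
    Q = np.zeros_like(reward)
    for _ in range(n_iters):
        target = bellman_optimality(Q, next_state, reward, gamma)
        Q_next = Q + alpha * (target - Q) + beta * (Q - Q_prev)
        Q_prev, Q = Q, Q_next            # shift the iterate history
    return Q


Q_momentum = momentum_q_iteration(next_state, reward)
Q_plain = momentum_q_iteration(next_state, reward, beta=0.0)
print(np.round(Q_momentum, 4))
```

The values `gamma=0.5` and `beta=0.2` are chosen so that `2 * beta < alpha * (1 - gamma)`, a simple sufficient condition under which this particular update still contracts toward the fixed point in the sup norm. The sketch only shows how a previous iterate can enter the update; it makes no claim about speed relative to the plain iteration.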

Code & Models

Repositories

fapont/hackaton-hiparis-2021

Taxonomy

Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research