Final Iteration Convergence Bound of Q-Learning: Switching System Approach
Donghwan Lee

TL;DR
This paper establishes a finite-time convergence bound for the final iterate of Q-learning using a switching system approach, addressing limitations of prior averaged-iterate bounds and offering new insights into RL algorithm analysis.
Contribution
It introduces a finite-time error bound for the final iterate of Q-learning based on a switching system framework, expanding analysis beyond averaged iterates.
Findings
Finite-time error bound for Q-learning's final iterate.
Analysis covers different scenarios compared to previous work.
Provides insights connecting Q-learning with discrete-time switching systems.
Abstract
Q-learning is known as one of the fundamental reinforcement learning (RL) algorithms. Its convergence has been the focus of extensive research over the past several decades. Recently, a new finite-time error bound and analysis for Q-learning was introduced using a switching system framework. This approach views the dynamics of Q-learning as a discrete-time stochastic switching system. The prior study established a finite-time error bound on the averaged iterates using Lyapunov functions, offering further insights into Q-learning. While valuable, that analysis focuses on error bounds of the averaged iterate, which has inherent disadvantages: it requires extra averaging steps, which can slow the convergence rate. Moreover, the final iterate, being the original format of Q-learning, is more commonly used and is often regarded as a more intuitive and natural form in the…
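To make the distinction between the final iterate and the averaged iterate concrete, the following is a minimal sketch of tabular Q-learning that tracks both quantities. It is an illustration, not the paper's analysis or experimental setup: the function names `q_learning` and `optimal_q`, the i.i.d. uniform sampling of state-action pairs, the constant step size `alpha`, and all default parameter values are assumptions made here for the example.

```python
import numpy as np

def q_learning(P, R, gamma=0.9, alpha=0.1, steps=5000, seed=0):
    """Tabular Q-learning (illustrative sketch, assumed setup).

    P: transition probabilities, shape (S, A, S).
    R: expected rewards, shape (S, A).
    Assumes i.i.d. uniform sampling of (s, a) and a constant step size.
    Returns (final iterate, running average of all iterates).
    """
    rng = np.random.default_rng(seed)
    S, A = R.shape
    Q = np.zeros((S, A))
    Q_avg = np.zeros((S, A))
    for k in range(1, steps + 1):
        s = rng.integers(S)
        a = rng.integers(A)
        s_next = rng.choice(S, p=P[s, a])
        # Standard Q-learning temporal-difference update.
        td = R[s, a] + gamma * Q[s_next].max() - Q[s, a]
        Q[s, a] += alpha * td
        # The extra averaging step that a final-iterate bound avoids.
        Q_avg += (Q - Q_avg) / k
    return Q, Q_avg

def optimal_q(P, R, gamma=0.9, iters=2000):
    """Reference Q* via value iteration, used only to measure error."""
    Q = np.zeros_like(R, dtype=float)
    for _ in range(iters):
        Q = R + gamma * P @ Q.max(axis=1)
    return Q
```

On a small randomly generated MDP, one could compare `np.abs(Q_final - Q_star).max()` with `np.abs(Q_avg - Q_star).max()` to see how the two iterates differ in practice. The averaged iterate carries the early transient in its running mean, which is one informal way to see why averaging can slow convergence.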
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and ELM · Fault Detection and Control Systems
Methods: Q-Learning
