Lyapunov-Certified Direct Switching Theory for Q-Learning
Donghwan Lee

TL;DR
This paper analyzes the convergence of Q-learning by representing the Q-learning error as a stochastic switching system, and derives Lyapunov-certified exponential rate bounds expressed through the joint spectral radius (JSR) of the switching family.
Contribution
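To the best of the author's knowledge, it develops the first convergence-rate analysis of standard Q-learning whose leading exponential rate is expressed through the joint spectral radius (JSR) of a direct switching family, together with finite-time bounds and Lyapunov certificates.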
Findings
The JSR of the direct switching family determines the leading exponential convergence rate of Q-learning.
A Lyapunov function based on the JSR provides finite-time bounds.
Quadratic Lyapunov certificates offer a simpler verification method when feasible.
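To make the role of the JSR concrete, the following minimal sketch brackets the JSR of a small finite matrix family by enumerating all products of a fixed length k. It relies on the standard bounds max ρ(P)^(1/k) ≤ JSR ≤ max ‖P‖^(1/k), taken over all length-k products P. The function name `jsr_bounds`, the example matrices, and the chosen depth are hypothetical illustrations; they are not taken from the paper, whose switching family is not reproduced here.

```python
import itertools
from functools import reduce

import numpy as np


def jsr_bounds(matrices, depth):
    """Return (lower, upper) bounds on the JSR from all products of length `depth`."""
    lower, upper = 0.0, 0.0
    for word in itertools.product(matrices, repeat=depth):
        prod = reduce(np.matmul, word)  # product of the matrices in this word
        rho = np.max(np.abs(np.linalg.eigvals(prod)))  # spectral radius
        norm = np.linalg.norm(prod, 2)  # induced 2-norm
        lower = max(lower, rho ** (1.0 / depth))
        upper = max(upper, norm ** (1.0 / depth))
    return lower, upper


# Hypothetical two-mode family (illustrative only, not from the paper).
A1 = np.array([[0.6, 0.3], [0.0, 0.5]])
A2 = np.array([[0.5, 0.0], [0.3, 0.6]])
lo, hi = jsr_bounds([A1, A2], depth=6)
print(f"{lo:.4f} <= JSR <= {hi:.4f}")
```

The enumeration visits m^k products for m matrices, so this brute-force check is only practical for small families and short product lengths. When a common quadratic Lyapunov function exists, it can certify a rate without this enumeration.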
Abstract
Q-learning is a fundamental algorithmic primitive in reinforcement learning. This paper develops a new framework for analyzing Q-learning from a switching-system viewpoint. In particular, we derive a direct stochastic switching-system representation of the Q-learning error. The key observation is that the Bellman maximization error can be expressed exactly as an average of action-wise Q-errors under a suitable stochastic policy. The resulting recursion has a switched linear conditional-mean drift and martingale-difference noise. To the best of our knowledge, this is the first convergence-rate analysis of standard Q-learning whose leading exponential rate is expressed through the joint spectral radius (JSR) of a direct switching family. Since the JSR is the exact worst-case exponential rate of the associated switched linear drift, the resulting rate is among the tightest drift-based…
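The key observation in the abstract can be checked numerically at a single state. The difference max_a Q(s,a) − max_a Q*(s,a) always lies between the Q-error at the greedy action of Q* and the Q-error at the greedy action of Q, so it can be written as a convex combination of those two action-wise errors. The sketch below builds one such stochastic policy. The function name `mixing_policy` and the two-action construction are assumptions made for illustration and may differ from the construction used in the paper.

```python
import numpy as np


def mixing_policy(q_row, q_star_row):
    """Return pi with sum_a pi[a] * (Q[a] - Q*[a]) == max(Q) - max(Q*)."""
    q_row = np.asarray(q_row, dtype=float)
    q_star_row = np.asarray(q_star_row, dtype=float)
    err = q_row - q_star_row
    a_greedy = int(np.argmax(q_row))      # greedy action under Q
    a_star = int(np.argmax(q_star_row))   # greedy action under Q*
    gap = q_row[a_greedy] - q_star_row[a_star]  # Bellman maximization error
    lo, hi = err[a_star], err[a_greedy]   # gap always lies in [lo, hi]
    pi = np.zeros_like(err)
    if hi - lo <= 1e-12:
        pi[a_greedy] = 1.0                # both errors coincide with the gap
    else:
        lam = min(max((gap - lo) / (hi - lo), 0.0), 1.0)
        pi[a_greedy] = lam
        pi[a_star] += 1.0 - lam
    return pi


# Hypothetical values at one state with three actions (illustrative only).
q = np.array([1.0, 2.5, 0.3])
q_star = np.array([2.0, 1.5, 0.0])
pi = mixing_policy(q, q_star)
print(pi, pi @ (q - q_star), q.max() - q_star.max())
```

Because the weights depend on the current Q, the induced policy changes along the iteration, which is consistent with the abstract's description of a switched linear conditional-mean drift.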
