Lyapunov-Certified Direct Switching Theory for Q-Learning

Donghwan Lee

arXiv:2604.19569·cs.LG·May 6, 2026

TL;DR

This paper introduces a Lyapunov-based framework for analyzing the convergence of Q-learning. It represents the Q-learning error as a stochastic switching system and expresses the leading exponential convergence rate through the joint spectral radius (JSR) of the associated switching family.

Contribution

To the authors' knowledge, it gives the first convergence-rate analysis of standard Q-learning whose leading exponential rate is expressed through the joint spectral radius of a direct switching family, together with finite-time bounds and Lyapunov certificates.
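For readers unfamiliar with the joint spectral radius, the following minimal sketch brackets it for a small matrix family. It relies on the standard fact that, for any product P of k matrices from the family, rho(P)^(1/k) is a lower bound on the JSR and the maximum of ||P||^(1/k) over all such products is an upper bound (for any submultiplicative norm). The two 2x2 matrices in `FAMILY` and all function names are illustrative assumptions; they are not the switching family from the paper.

```python
import cmath
from itertools import product

# Illustrative family (an assumption, not taken from the paper).
FAMILY = [
    [[0.5, 0.4], [0.0, 0.5]],
    [[0.5, 0.0], [0.4, 0.5]],
]

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def spectral_radius(a):
    """Largest eigenvalue modulus of a 2x2 matrix (closed form)."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = cmath.sqrt(tr * tr / 4.0 - det)
    return max(abs(tr / 2.0 + disc), abs(tr / 2.0 - disc))

def inf_norm(a):
    """Induced infinity norm (maximum absolute row sum)."""
    return max(abs(row[0]) + abs(row[1]) for row in a)

def jsr_bounds(family, k):
    """Return (lower, upper) bounds on the JSR from all products of length k."""
    lower, upper = 0.0, 0.0
    for word in product(family, repeat=k):
        p = [[1.0, 0.0], [0.0, 1.0]]
        for m in word:
            p = matmul(m, p)
        lower = max(lower, spectral_radius(p) ** (1.0 / k))
        upper = max(upper, inf_norm(p) ** (1.0 / k))
    return lower, upper

if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        print(k, jsr_bounds(FAMILY, k))
```

Each matrix in this toy family has spectral radius 0.5, yet the lower bound already exceeds 0.7 at k = 2. This is why a switched system calls for the joint spectral radius rather than the spectral radii of the individual matrices. The brute-force enumeration grows exponentially in k and is only meant for tiny examples.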

Findings

01. The JSR of the switching family determines the leading exponential convergence rate of Q-learning.

02. A Lyapunov function based on the JSR provides finite-time bounds.

03. Quadratic Lyapunov certificates offer a simpler verification method when feasible.
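As a rough illustration of what a quadratic certificate looks like, the sketch below checks the standard sufficient condition for a switched linear system: if a symmetric positive definite matrix P satisfies gamma^2 P - A^T P A >= 0 (positive semidefinite) for every matrix A in the family, then V(x) = x^T P x contracts by at least gamma^2 per step under any switching, so the JSR is at most gamma. The family, the candidate P, and the helper names are assumptions for illustration. The code is restricted to 2x2 matrices and only verifies a given P; finding one generally requires a semidefinite programming solver.

```python
# Illustrative family (an assumption, not taken from the paper).
FAMILY = [
    [[0.5, 0.4], [0.0, 0.5]],
    [[0.5, 0.0], [0.4, 0.5]],
]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def is_psd_2x2(m, tol=1e-12):
    """A symmetric 2x2 matrix is PSD iff its diagonal and determinant are >= 0."""
    return m[0][0] >= -tol and m[1][1] >= -tol and det2(m) >= -tol

def certifies_rate(family, p, gamma):
    """True if V(x) = x^T P x certifies the rate gamma for every matrix in family."""
    # P itself must be symmetric positive definite.
    if abs(p[0][1] - p[1][0]) > 1e-12 or p[0][0] <= 0.0 or det2(p) <= 0.0:
        return False
    for a in family:
        apa = matmul(transpose(a), matmul(p, a))  # A^T P A
        gap = [[gamma ** 2 * p[i][j] - apa[i][j] for j in range(2)]
               for i in range(2)]
        if not is_psd_2x2(gap):
            return False
    return True

if __name__ == "__main__":
    print(certifies_rate(FAMILY, IDENTITY, 0.75))
    print(certifies_rate(FAMILY, IDENTITY, 0.70))
```

For this toy family the identity matrix certifies gamma = 0.75 but not gamma = 0.70. A failed check only rules out that particular P and gamma; it does not by itself say anything about the true rate.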

Abstract

Q-learning is a fundamental algorithmic primitive in reinforcement learning. This paper develops a new framework for analyzing Q-learning from a switching-system viewpoint. In particular, we derive a direct stochastic switching-system representation of the Q-learning error. The key observation is that the Bellman maximization error can be expressed exactly as an average of action-wise Q-errors under a suitable stochastic policy. The resulting recursion has a switched linear conditional-mean drift and martingale-difference noise. To the best of our knowledge, this is the first convergence-rate analysis of standard Q-learning whose leading exponential rate is expressed through the joint spectral radius (JSR) of a direct switching family. Since the JSR is the exact worst-case exponential rate of the associated switched linear drift, the resulting rate is among the tightest drift-based…
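The key observation about the Bellman maximization error can be checked numerically for a single state. The sketch below is an illustrative construction and not necessarily the one used in the paper. Given action values `q` and `q_star` (hypothetical names) at one state, let a1 be greedy for `q` and a2 be greedy for `q_star`, and let e = q - q_star. Then e[a2] <= max(q) - max(q_star) <= e[a1], so the gap between the two maxima is a convex combination of e[a1] and e[a2], that is, an average of action-wise errors under a stochastic policy supported on those two actions.

```python
def averaging_policy(q, q_star):
    """Return a probability vector pi over actions such that
    max(q) - max(q_star) == sum(pi[a] * (q[a] - q_star[a])).

    Illustrative construction (an assumption, not the paper's definition).
    """
    n = len(q)
    a1 = max(range(n), key=lambda a: q[a])       # greedy action for q
    a2 = max(range(n), key=lambda a: q_star[a])  # greedy action for q_star
    e = [q[a] - q_star[a] for a in range(n)]     # action-wise errors
    gap = max(q) - max(q_star)                   # Bellman maximization error
    pi = [0.0] * n
    if a1 == a2 or e[a1] == e[a2]:
        # The gap equals e[a1], so a deterministic policy suffices.
        pi[a1] = 1.0
    else:
        # Solve lam * e[a1] + (1 - lam) * e[a2] = gap; clamp against round-off.
        lam = (gap - e[a2]) / (e[a1] - e[a2])
        lam = min(1.0, max(0.0, lam))
        pi[a1] = lam
        pi[a2] = 1.0 - lam
    return pi

if __name__ == "__main__":
    q = [1.0, 3.0, 2.0]
    q_star = [2.5, 2.0, 0.0]
    pi = averaging_policy(q, q_star)
    lhs = max(q) - max(q_star)
    rhs = sum(p * (x - y) for p, x, y in zip(pi, q, q_star))
    print(pi, lhs, rhs)
```

Because the policy depends on the current iterate `q`, the error recursion is linear only conditionally on which policy is active. This is what gives the recursion its switched linear structure.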
