A Switching System Theory of Q-Learning with Linear Function Approximation

Donghwan Lee; Han-Dong Lim

arXiv:2605.11021·cs.LG·May 20, 2026

TL;DR

This paper introduces a switching-system framework for analyzing Q-learning with linear function approximation (LFA), linking convergence to the stability of a switched linear system as measured by its joint spectral radius (JSR).

Contribution

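The paper derives an exact linear switched model for the mean dynamics of Q-learning with LFA and relates convergence to the stability of the corresponding switched system. The same construction is applied to stochastic linear Q-learning with i.i.d. and with Markovian observations.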

Findings

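01. Exact linear switched model for the mean dynamics of Q-learning with LFA

02. JSR-based certificates capture products of switching modes and can be less conservative than one-step norm bounds

03. Framework connects projected Bellman equations, finite-difference stochastic-policy switching, and switched-system stability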

Abstract

This paper develops a switching-system interpretation of Q-learning with linear function approximation (LFA) based on the joint spectral radius (JSR). We derive an exact linear switched model for the mean dynamics and relate convergence to stability of the corresponding switched system. The same construction is then used for stochastic linear Q-learning with independent and identically distributed (i.i.d.) observations and with Markovian observations. Although exact JSR computation is difficult in general, the certificate captures products of switching modes and can be less conservative than one-step norm bounds. The framework also yields a JSR-based view of regularized Q-learning with LFA. The resulting analysis connects projected Bellman equations, finite-difference stochastic-policy switching, and switched-system stability in a single parameter-space formulation.
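As a rough illustration of why a certificate built from products of switching modes can be less conservative than a one-step norm bound, the sketch below brackets the JSR of two hypothetical 2x2 matrices. The matrices `A1` and `A2`, the helper `jsr_bounds`, and the product length `k = 8` are assumptions chosen for illustration; they are not the switching modes constructed in the paper. The sketch relies only on the standard bounds: for any product P of length k, ρ(P)^(1/k) is a lower bound on the JSR, and the maximum of ||P||^(1/k) over all such products is an upper bound.

```python
import itertools
import numpy as np


def jsr_bounds(modes, k):
    """Bracket the joint spectral radius using all products of length k.

    Returns (lower, upper), where
      lower = max over products P of rho(P) ** (1/k)
      upper = max over products P of ||P||_2 ** (1/k)
    Brute force: enumerates len(modes) ** k products, so keep k small.
    """
    n = modes[0].shape[0]
    lower, upper = 0.0, 0.0
    for word in itertools.product(modes, repeat=k):
        prod = np.eye(n)
        for mat in word:
            prod = prod @ mat
        spectral_radius = np.max(np.abs(np.linalg.eigvals(prod)))
        spectral_norm = np.linalg.norm(prod, 2)
        lower = max(lower, spectral_radius ** (1.0 / k))
        upper = max(upper, spectral_norm ** (1.0 / k))
    return lower, upper


# Hypothetical switching modes (illustration only). Both are upper
# triangular, so their JSR equals the largest diagonal magnitude, 0.6.
A1 = np.array([[0.5, 2.0],
               [0.0, 0.5]])
A2 = np.array([[0.3, -1.0],
               [0.0, 0.6]])
modes = [A1, A2]

# k = 1 is the one-step norm bound: max_i ||A_i||_2.
_, one_step = jsr_bounds(modes, 1)

# k = 8 uses all 2**8 = 256 products of length 8.
lower8, upper8 = jsr_bounds(modes, 8)

print(f"one-step norm bound: {one_step:.3f}")
print(f"length-8 bracket:    [{lower8:.3f}, {upper8:.3f}]")
```

In this toy case the one-step bound exceeds 1 and therefore certifies nothing, while the length-8 upper bound falls below 1 and certifies stability of the switched system under arbitrary switching. The enumeration cost grows exponentially in `k`, which reflects the abstract's remark that exact JSR computation is difficult in general.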
