Toward a Unified Lyapunov-Certified ODE Convergence Analysis of Smooth Q-Learning with p-Norms
Donghwan Lee, Hyunjun Na

TL;DR
This paper introduces a unified ODE-based convergence analysis for standard and smooth Q-learning algorithms. It uses smooth p-norm Lyapunov functions, which avoids the non-smoothness of ∞-norm arguments and covers cases where the Bellman operator is not a contraction.
Contribution
It develops a broad, unified ODE-based stability framework applicable to various smooth Q-learning variants, including non-contractive cases.
Findings
The framework applies to standard and smoothed Q-learning algorithms.
It uses a smooth p-norm Lyapunov function for concise stability proofs.
The analysis covers cases where the Bellman operator is not a contraction.
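To illustrate why a p-norm can stand in for the ∞-norm, the following minimal Python sketch compares the two and evaluates the gradient of a candidate function V(x) = 0.5 * ||x||_p^2. The specific form of V and the helper names (`p_norm`, `inf_norm`, `lyapunov`, `lyapunov_grad`) are assumptions made for illustration only and may differ from the Lyapunov function used in the paper.

```python
import math


def p_norm(x, p):
    """Return (sum_i |x_i|^p)^(1/p) for a finite p >= 1."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def inf_norm(x):
    """Return max_i |x_i|, which is not differentiable where the max is tied."""
    return max(abs(v) for v in x)


def lyapunov(x, p):
    """Illustrative candidate V(x) = 0.5 * ||x||_p^2 (assumed form)."""
    return 0.5 * p_norm(x, p) ** 2


def lyapunov_grad(x, p):
    """Gradient of V for p >= 2: ||x||_p^(2-p) * |x_i|^(p-1) * sign(x_i)."""
    n = p_norm(x, p)
    if n == 0.0:
        return [0.0 for _ in x]
    return [
        n ** (2 - p) * abs(v) ** (p - 1) * math.copysign(1.0, v)
        for v in x
    ]


if __name__ == "__main__":
    x = [1.0, -2.0, 2.0]  # the max is tied, so the inf-norm has a kink here
    for p in (2, 4, 16, 64):
        print(p, p_norm(x, p), inf_norm(x), lyapunov_grad(x, p))
```

For any finite p, the sandwich ||x||_∞ <= ||x||_p <= n^(1/p) * ||x||_∞ holds, so a large p tracks the ∞-norm closely while the gradient above stays well defined for p >= 2.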
Abstract
Convergence of Q-learning has been the subject of extensive study for decades. Among the available techniques, the ordinary differential equation (ODE) method is particularly appealing as a general-purpose, off-the-shelf tool for sanity-checking the convergence of a wide range of reinforcement learning algorithms. In this paper, we develop a unified ODE-based convergence framework that applies to standard Q-learning and several soft/smoothed variants, including those built on the log-sum-exponential softmax, Boltzmann softmax, and mellowmax operators. Our analysis uses a smooth p-norm Lyapunov function, leading to concise yet rigorous stability arguments and circumventing the non-smoothness issues inherent to classical ∞-norm-based approaches. To the best of our knowledge, the proposed framework is among the first to provide a unified ODE-based treatment that is broadly applicable…
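For readers unfamiliar with the three smoothed operators named in the abstract, here is a minimal Python sketch using their commonly cited definitions. The function names and the temperature parameters `beta` and `omega` are assumptions for illustration, and the paper's exact parameterization may differ.

```python
import math


def lse_max(q, beta):
    """Log-sum-exp softmax: (1/beta) * log(sum_a exp(beta * q_a))."""
    m = max(q)  # subtract the max before exponentiating for stability
    return m + math.log(sum(math.exp(beta * (v - m)) for v in q)) / beta


def boltzmann_max(q, beta):
    """Boltzmann softmax: average of q_a weighted by exp(beta * q_a)."""
    m = max(q)
    w = [math.exp(beta * (v - m)) for v in q]
    return sum(wi * v for wi, v in zip(w, q)) / sum(w)


def mellowmax(q, omega):
    """Mellowmax: (1/omega) * log((1/n) * sum_a exp(omega * q_a))."""
    return lse_max(q, omega) - math.log(len(q)) / omega


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]  # hypothetical action values for one state
    for t in (1.0, 5.0, 50.0):
        print(t, lse_max(q, t), boltzmann_max(q, t), mellowmax(q, t))
```

With these definitions, mellowmax(q) <= max(q) <= lse_max(q) and boltzmann_max(q) <= max(q), and all three approach max(q) as the temperature parameter grows. Replacing the hard max in the Q-learning target with one of these operators gives the corresponding smoothed variant.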
