Optimistic Training and Convergence of Q-Learning -- Extended Version

Prashant Mehta; Sean Meyn

arXiv:2602.06146·cs.LG·February 9, 2026

Open Access

TL;DR

This paper investigates the stability and convergence of various Q-learning algorithms with function approximation, highlighting the necessity of additional structure and conditions for guaranteeing unique solutions and convergence.

Contribution

It extends previous stability results to other Q-learning variants and demonstrates the need for more structure to ensure convergence and solution uniqueness.

Findings

1. Under an oblivious training policy, the projected Bellman equation may have no solution or multiple solutions.

2. Stability of Q-learning depends on the training policy and on the structure of the function approximation.

3. Additional conditions are necessary for convergence beyond the standard tabular or linear MDP settings.

Abstract

In recent work it is shown that Q-learning with linear function approximation is stable, in the sense of bounded parameter estimates, under the $(ε, κ)$-tamed Gibbs policy; $κ$ is the inverse temperature, and $ε > 0$ is introduced for additional exploration. Under these assumptions it also follows that there is a solution to the projected Bellman equation (PBE). Left open are uniqueness of the solution and criteria for convergence outside of the standard tabular or linear MDP settings. The present work extends these results to other variants of Q-learning, and clarifies prior work: a one-dimensional example shows that under an oblivious policy for training there may be no solution to the PBE, or multiple solutions, and in each case the algorithm is not stable under oblivious training. The main contribution is that far more structure is required for…
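
For readers unfamiliar with the setting, the following is a minimal sketch of Q-learning with linear function approximation, trained under a policy that mixes uniform exploration (weight ε) with a Gibbs policy at inverse temperature κ. It is an illustration under stated assumptions, not the authors' algorithm: the function names, the cost-minimization convention, the 1/n step size, and the toy MDP inputs are all hypothetical, and the "taming" of the Gibbs policy referred to in the abstract is not reproduced here because its definition does not appear on this page. The sketch makes no claim of stability or convergence.

```python
import numpy as np

def gibbs_policy(q_values, eps, kappa):
    """Action probabilities: uniform exploration with weight eps, mixed with a
    Gibbs (softmax) policy over costs at inverse temperature kappa.
    Assumption: lower Q-values are preferred (cost minimization)."""
    z = -kappa * (q_values - q_values.min())  # shift so the largest exponent is 0
    gibbs = np.exp(z) / np.exp(z).sum()
    return eps / len(q_values) + (1.0 - eps) * gibbs

def q_learning_linear(P, cost, features, gamma=0.9, eps=0.2, kappa=2.0,
                      n_steps=5000, seed=0):
    """Q-learning with Q^theta(x, u) = features[x, u] @ theta.

    P        : transition probabilities, shape (n_states, n_actions, n_states)
    cost     : one-step costs, shape (n_states, n_actions)
    features : basis vectors, shape (n_states, n_actions, d)
    All arguments are hypothetical inputs chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions, d = features.shape
    theta = np.zeros(d)
    x = 0
    for n in range(1, n_steps + 1):
        q_x = features[x] @ theta                      # Q^theta(x, .) for all actions
        u = rng.choice(n_actions, p=gibbs_policy(q_x, eps, kappa))
        x_next = rng.choice(n_states, p=P[x, u])
        # Temporal-difference term for the cost-minimizing Bellman operator
        td = cost[x, u] + gamma * (features[x_next] @ theta).min() - q_x[u]
        theta = theta + (1.0 / n) * td * features[x, u]  # step size 1/n (assumed)
        x = x_next
    return theta
```

Because the training policy here depends on the current parameter estimate, it is not oblivious; replacing `gibbs_policy` with a fixed distribution over actions would give an oblivious variant of the same sketch.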

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Machine Learning and Algorithms