Optimistic Training and Convergence of Q-Learning -- Extended Version
Prashant Mehta, Sean Meyn

TL;DR
This paper investigates the stability and convergence of various Q-learning algorithms with function approximation, highlighting the necessity of additional structure and conditions for guaranteeing unique solutions and convergence.
Contribution
It extends previous stability results to other Q-learning variants and demonstrates the need for more structure to ensure convergence and solution uniqueness.
Findings
Multiple solutions to the projected Bellman equation can exist under certain policies.
Stability of Q-learning depends on the policy and the structure of the function approximation.
Additional conditions are necessary for convergence beyond standard tabular or linear MDP settings.
Abstract
In recent work it is shown that Q-learning with linear function approximation is stable, in the sense of bounded parameter estimates, under the tamed Gibbs policy, which is parameterized by an inverse temperature and by a second parameter introduced for additional exploration. Under these assumptions it also follows that there is a solution to the projected Bellman equation (PBE). Left open is uniqueness of the solution, and criteria for convergence outside of the standard tabular or linear MDP settings. The present work extends these results to other variants of Q-learning, and clarifies prior work: a one-dimensional example shows that under an oblivious policy for training there may be no solution to the PBE, or multiple solutions, and in each case the algorithm is not stable under oblivious training. The main contribution is that far more structure is required for…
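To make the setting concrete, the following is a minimal sketch of Q-learning with linear function approximation, trained under a Gibbs (softmax) policy mixed with uniform exploration. It is an illustration under stated assumptions, not the paper's algorithm: the policy is a plain Gibbs policy rather than the tamed variant, rewards are maximized, and the random MDP, the features `psi`, the step size, and the names `kappa` (inverse temperature) and `eps` (exploration weight) are all hypothetical choices.

```python
import numpy as np


def gibbs_policy(q_values, kappa, eps):
    """Gibbs (softmax) distribution mixed with uniform exploration.

    Assumption: rewards are maximized, so larger Q-values get more mass.
    This is a plain Gibbs policy, not the tamed variant from the paper.
    """
    logits = kappa * (q_values - q_values.max())  # shift for numerical stability
    gibbs = np.exp(logits)
    gibbs /= gibbs.sum()
    uniform = np.full(len(q_values), 1.0 / len(q_values))
    return (1.0 - eps) * gibbs + eps * uniform


def q_learning_linear(n_states=5, n_actions=2, dim=3, gamma=0.9,
                      kappa=2.0, eps=0.1, n_steps=5000, seed=0):
    """Q-learning with Q(x, u) approximated by psi[x, u] @ theta.

    The MDP (P, R) and the features psi are random and purely illustrative.
    """
    rng = np.random.default_rng(seed)
    # P[x, u] is a probability vector over next states.
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
    psi = rng.normal(size=(n_states, n_actions, dim))

    theta = np.zeros(dim)
    x = 0
    for n in range(n_steps):
        q_x = psi[x] @ theta  # Q-values for every action at state x
        u = rng.choice(n_actions, p=gibbs_policy(q_x, kappa, eps))
        x_next = rng.choice(n_states, p=P[x, u])
        # Temporal-difference error with a max over next actions.
        td = R[x, u] + gamma * np.max(psi[x_next] @ theta) - q_x[u]
        alpha = 1.0 / (n + 10)  # vanishing step size (an assumption)
        theta = theta + alpha * td * psi[x, u]
        x = x_next
    return theta
```

Because the policy depends on the current `theta`, the training data shifts as the estimates change. An oblivious policy, by contrast, would ignore `theta` entirely. The sketch makes no claim about convergence or uniqueness of the limit, which is exactly the question the abstract raises.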
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Machine Learning and Algorithms
