Multi-Bellman operator for convergence of $Q$-learning with linear function approximation
Diogo S. Carvalho, Pedro A. Santos, Francisco S. Melo

TL;DR
This paper introduces a multi-Bellman operator for $Q$-learning with linear function approximation, providing new convergence guarantees and an algorithm that converges to a fixed point with improved accuracy.
Contribution
It proposes a novel multi-Bellman operator and a corresponding $Q$-learning algorithm with proven convergence properties under certain conditions.
Findings
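The multi-Bellman operator extends the traditional Bellman operator.
Under conditions identified in the paper, the projected multi-Bellman operator is contractive.
The proposed algorithm converges to the fixed point of the projected multi-Bellman operator, yielding solutions of arbitrary accuracy.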
Abstract
We study the convergence of $Q$-learning with linear function approximation. Our key contribution is the introduction of a novel multi-Bellman operator that extends the traditional Bellman operator. By exploring the properties of this operator, we identify conditions under which the projected multi-Bellman operator becomes contractive, providing improved fixed-point guarantees compared to the Bellman operator. To leverage these insights, we propose the multi $Q$-learning algorithm with linear function approximation. We demonstrate that this algorithm converges to the fixed point of the projected multi-Bellman operator, yielding solutions of arbitrary accuracy. Finally, we validate our approach by applying it to well-known environments, showcasing the effectiveness and applicability of our findings.
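The abstract does not spell out the operator, so the following is only a minimal sketch under a stated assumption: that the multi-Bellman operator is the $m$-fold composition of the Bellman optimality operator. The tabular MDP (`P`, `r`), the feature matrix `Phi`, the weighting `mu`, and all function names are hypothetical and chosen for illustration; they are not taken from the paper. The sketch checks only a standard fact, namely that composing a $\gamma$-contraction $m$ times gives a $\gamma^m$-contraction in the sup norm, and shows what a weighted projection onto linear features could look like.

```python
import numpy as np

def bellman(q, P, r, gamma):
    """One application of the Bellman optimality operator.
    q: (S, A) action values, P: (S, A, S) transitions, r: (S, A) rewards."""
    v = q.max(axis=1)             # greedy value of each next state, shape (S,)
    return r + gamma * (P @ v)    # expected reward plus discounted next value

def multi_bellman(q, P, r, gamma, m):
    """ASSUMPTION: multi-Bellman operator = Bellman operator applied m times."""
    for _ in range(m):
        q = bellman(q, P, r, gamma)
    return q

def project(q, Phi, mu):
    """mu-weighted least-squares projection of q onto the span of Phi.
    Phi: (S*A, d) features, mu: (S*A,) positive weights (both hypothetical)."""
    D = np.diag(mu)
    w = np.linalg.solve(Phi.T @ D @ Phi, Phi.T @ D @ q.ravel())
    return (Phi @ w).reshape(q.shape)

# Small random MDP, purely illustrative.
rng = np.random.default_rng(0)
S, A, d, gamma, m = 5, 2, 3, 0.9, 4
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)   # normalize rows into probabilities
r = rng.random((S, A))
Phi = rng.normal(size=(S * A, d))
mu = np.full(S * A, 1.0 / (S * A))  # uniform weighting

q1 = rng.normal(size=(S, A))
q2 = rng.normal(size=(S, A))
gap_before = np.abs(q1 - q2).max()
gap_after = np.abs(
    multi_bellman(q1, P, r, gamma, m) - multi_bellman(q2, P, r, gamma, m)
).max()

# One step of the projected operator applied to a projected estimate.
q_proj = project(multi_bellman(project(q1, Phi, mu), P, r, gamma, m), Phi, mu)
```

Here `gap_after` is bounded by `gamma**m * gap_before`, so a larger `m` shrinks the modulus of the unprojected operator. Whether the projected operator is also contractive depends on the conditions the paper identifies, which this sketch does not verify.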
Taxonomy
Topics: Distributed Sensor Networks and Detection Algorithms · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms
