Multi-Bellman operator for convergence of $Q$-learning with linear function approximation

Diogo S. Carvalho; Pedro A. Santos; Francisco S. Melo

arXiv:2309.16819·cs.LG·October 2, 2023

TL;DR

This paper introduces a multi-Bellman operator for $Q$-learning with linear function approximation, providing new convergence guarantees and an algorithm that converges to a fixed point with improved accuracy.

Contribution

It proposes a novel multi-Bellman operator and a corresponding $Q$-learning algorithm with proven convergence properties under certain conditions.

Findings

01

The multi-Bellman operator extends the traditional Bellman operator.

02

The projected multi-Bellman operator can be contractive.

03

The proposed multi $Q$-learning algorithm converges to the fixed point of the projected multi-Bellman operator, yielding solutions of arbitrary accuracy.

Abstract

We study the convergence of $Q$-learning with linear function approximation. Our key contribution is the introduction of a novel multi-Bellman operator that extends the traditional Bellman operator. By exploring the properties of this operator, we identify conditions under which the projected multi-Bellman operator becomes contractive, providing improved fixed-point guarantees compared to the Bellman operator. To leverage these insights, we propose the multi $Q$-learning algorithm with linear function approximation. We demonstrate that this algorithm converges to the fixed point of the projected multi-Bellman operator, yielding solutions of arbitrary accuracy. Finally, we validate our approach by applying it to well-known environments, showcasing the effectiveness and applicability of our findings.
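As a rough illustration of the operators named in the abstract, the NumPy sketch below builds a projected multi-step Bellman operator on a tiny random MDP. It rests on assumptions the abstract does not spell out: that the multi-Bellman operator is the m-fold composition of the Bellman optimality operator, and that the projection is a weighted least-squares projection onto the span of the features. The MDP, the features `phi`, the weights `d`, and all function names are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np

def bellman(q, P, r, gamma):
    """One application of the Bellman optimality operator.
    q, r: (S, A) arrays; P: (S, A, S) transition probabilities."""
    return r + gamma * P @ q.max(axis=1)

def multi_bellman(q, P, r, gamma, m):
    """ASSUMPTION: the multi-Bellman operator is m-fold composition."""
    for _ in range(m):
        q = bellman(q, P, r, gamma)
    return q

def projection_matrix(phi, d):
    """Weighted least-squares projection onto span(phi).
    phi: (S*A, k) features; d: (S*A,) positive weights."""
    D = np.diag(d)
    return phi @ np.linalg.solve(phi.T @ D @ phi, phi.T @ D)

# Hypothetical toy problem: random MDP and random linear features.
rng = np.random.default_rng(0)
S, A, k, gamma, m = 5, 2, 3, 0.9, 4
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)      # normalise rows into distributions
r = rng.random((S, A))
phi = rng.standard_normal((S * A, k))
d = np.full(S * A, 1.0 / (S * A))      # uniform weighting (an assumption)
Pi = projection_matrix(phi, d)

def projected_multi_bellman(w):
    """Map weights w to the projection of the m-step backup of phi @ w."""
    q = (phi @ w).reshape(S, A)
    target = multi_bellman(q, P, r, gamma, m).reshape(-1)
    return Pi @ target

# Sup-norm gap between two arbitrary Q-functions before and after m backups.
q1 = rng.standard_normal((S, A))
q2 = rng.standard_normal((S, A))
gap_in = np.abs(q1 - q2).max()
gap_out = np.abs(multi_bellman(q1, P, r, gamma, m)
                 - multi_bellman(q2, P, r, gamma, m)).max()
q_proj = projected_multi_bellman(rng.standard_normal(k))
```

Because a single Bellman backup is a `gamma`-contraction in the sup norm, composing it `m` times shrinks `gap_in` by at least a factor of `gamma**m`. Under the assumptions above, this is one intuition for why a larger `m` might help the projected operator become contractive; the precise conditions are those identified in the paper, not in this sketch.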

Taxonomy

Topics: Distributed Sensor Networks and Detection Algorithms · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms