Double Successive Over-Relaxation Q-Learning with an Extension to Deep Reinforcement Learning

Shreyas S R

arXiv:2409.06356 · cs.LG · July 1, 2025

TL;DR

This paper introduces a model-free double SOR Q-learning algorithm that is less biased than SOR Q-learning, extends it to deep reinforcement learning, and reports improved convergence and performance in both tabular and deep settings.

Contribution

The paper proposes a sample-based, model-free double SOR Q-learning algorithm that reduces the overestimation bias of SOR Q-learning and extends to large-scale problems through deep RL.

Findings

01 Less biased than traditional SOR Q-learning

02 Effective in large-scale deep RL environments

03 Improved convergence in tabular and deep settings

Abstract

Q-learning is a widely used algorithm in reinforcement learning (RL), but its convergence can be slow, especially when the discount factor is close to one. Successive Over-Relaxation (SOR) Q-learning, which introduces a relaxation factor to speed up convergence, addresses this issue but has two major limitations: In the tabular setting, the relaxation parameter depends on transition probability, making it not entirely model-free, and it suffers from overestimation bias. To overcome these limitations, we propose a sample-based, model-free double SOR Q-learning algorithm. Theoretically and empirically, this algorithm is shown to be less biased than SOR Q-learning. Further, in the tabular setting, the convergence analysis under boundedness assumptions on iterates is discussed. The proposed algorithm is extended to large-scale problems using deep RL. Finally, the tabular version of the…
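The abstract does not spell out the update rule, so the following is only a minimal tabular sketch of how a double SOR Q-learning step might look. It assumes the usual SOR Q-learning target, w(r + γ max_b Q(s', b)) + (1 - w) max_c Q(s, c), combined with double Q-learning's split between the table that selects the greedy action and the table that evaluates it. The function name, the argument names (`alpha` for the step size, `gamma` for the discount factor, `w` for the relaxation factor), and the terminal-state handling are illustrative assumptions, not taken from the paper or its repository. How the paper picks `w` in a sample-based way is not covered in this excerpt, so `w` is a plain input here.

```python
import random


def argmax(values):
    """Return the index of the largest entry (first index on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def double_sor_q_update(q_a, q_b, s, a, r, s_next, alpha, gamma, w, done=False):
    """Apply one assumed double SOR Q-learning update in place.

    q_a and q_b are lists of lists indexed as [state][action].
    Returns the target value used for the update.
    """
    # As in double Q-learning, pick at random which table is updated;
    # the other table is used only to evaluate the selected actions.
    if random.random() < 0.5:
        upd, ev = q_a, q_b
    else:
        upd, ev = q_b, q_a

    # Greedy actions are selected with the table being updated ...
    b_star = argmax(upd[s_next])
    c_star = argmax(upd[s])

    # ... and evaluated with the other table (assumed form of the target).
    bootstrap = 0.0 if done else gamma * ev[s_next][b_star]
    target = w * (r + bootstrap) + (1.0 - w) * ev[s][c_star]

    # Standard stochastic-approximation step toward the target.
    upd[s][a] += alpha * (target - upd[s][a])
    return target
```

With `w = 1` the second term vanishes and the step reduces to an ordinary double Q-learning update, which is a quick sanity check on the assumed form.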

Code & Models

Repositories

shreyassr123/double-sor-q-learning
PyTorch · Official

Taxonomy

Topics: Machine Learning and ELM

Methods: Q-Learning · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings