Double Successive Over-Relaxation Q-Learning with an Extension to Deep Reinforcement Learning
Shreyas S R

TL;DR
This paper introduces a double SOR Q-learning algorithm that is model-free and less biased, extending it to deep reinforcement learning and demonstrating improved convergence and performance in various environments.
Contribution
It proposes a novel sample-based, model-free double SOR Q-learning algorithm that reduces bias and extends to large-scale deep RL applications.
Findings
Less biased than traditional SOR Q-learning
Effective in large-scale deep RL environments
Improved convergence in tabular and deep settings
Abstract
Q-learning is a widely used algorithm in reinforcement learning (RL), but its convergence can be slow, especially when the discount factor is close to one. Successive Over-Relaxation (SOR) Q-learning, which introduces a relaxation factor to speed up convergence, addresses this issue but has two major limitations: In the tabular setting, the relaxation parameter depends on transition probability, making it not entirely model-free, and it suffers from overestimation bias. To overcome these limitations, we propose a sample-based, model-free double SOR Q-learning algorithm. Theoretically and empirically, this algorithm is shown to be less biased than SOR Q-learning. Further, in the tabular setting, the convergence analysis under boundedness assumptions on iterates is discussed. The proposed algorithm is extended to large-scale problems using deep RL. Finally, the tabular version of the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMachine Learning and ELM
MethodsQ-Learning · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
