# Successive Over Relaxation Q-Learning

**Authors:** Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Shalabh Bhatnagar

arXiv: 1903.03812 · 2019-06-17

## TL;DR

This paper introduces SOR Q-learning, a model-free reinforcement learning algorithm that applies successive over-relaxation (SOR) to the Q-learning update. The authors prove almost sure convergence and report faster learning than standard Q-learning in numerical experiments.

## Contribution

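The paper derives a modified fixed point iteration for SOR Q-values, turns it into a learning algorithm via stochastic approximation, proves almost sure convergence of that algorithm to the SOR Q-values, and shows empirically that it learns faster than standard Q-learning.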

## Key findings

- SOR Q-learning converges almost surely to the SOR Q-values, which yield the optimal value function and an optimal policy.
- In numerical experiments, it converges faster than standard Q-learning.
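
Why over-relaxation can speed up convergence is easiest to see in the model-based setting. The worked derivation below is a sketch under assumptions: the notation ($S$, $r$, $p$, $\alpha$, $w$, $T_w$) is chosen here for illustration, and the operator is the usual SOR form for value iteration, which this summary does not spell out.

```latex
% Assumed notation: finite state space S, reward r(i,a), transition
% probabilities p(j|i,a), discount factor \alpha in (0,1), relaxation w > 0.
% T is the standard Bellman operator, T_w its over-relaxed counterpart.
\begin{align*}
(T_w V)(i) &= \max_a \Big[ w \Big( r(i,a) + \alpha \sum_{j \in S} p(j \mid i,a)\, V(j) \Big) + (1-w)\, V(i) \Big] \\
           &= w\,(T V)(i) + (1-w)\, V(i), \\
T V^* = V^* \;&\Longrightarrow\; T_w V^* = w V^* + (1-w) V^* = V^*, \\
\| T_w U - T_w V \|_\infty &\le (1 - w + w\alpha)\, \| U - V \|_\infty
  \quad \text{for } 0 < w \le w^* := \min_{i,a} \frac{1}{1 - \alpha\, p(i \mid i,a)} .
\end{align*}
```

For $w = 1$ the modulus equals $\alpha$, the rate of plain value iteration. For $1 < w \le w^*$ it is strictly smaller than $\alpha$, which is only possible when $p(i \mid i,a) > 0$ for every state-action pair, so that $w^* > 1$.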

## Abstract

In a discounted reward Markov Decision Process (MDP), the objective is to find the optimal value function, i.e., the value function corresponding to an optimal policy. This problem reduces to solving a functional equation known as the Bellman equation, and a fixed point iteration scheme known as value iteration is used to obtain the solution. In the literature, a successive over-relaxation based value iteration scheme has been proposed to speed up the computation of the optimal value function. The speed-up is achieved by constructing a modified Bellman equation that ensures faster convergence to the optimal value function. However, in many practical applications, the model information is not known and we resort to Reinforcement Learning (RL) algorithms to obtain an optimal policy and value function. One such popular algorithm is Q-learning. In this paper, we propose Successive Over-Relaxation (SOR) Q-learning. We first derive a modified fixed point iteration for SOR Q-values and utilize stochastic approximation to derive a learning algorithm to compute the optimal value function and an optimal policy. We then prove the almost sure convergence of SOR Q-learning to the SOR Q-values. Finally, through numerical experiments, we show that SOR Q-learning is faster than the standard Q-learning algorithm.
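
The abstract describes the algorithm only in words. As a rough illustration, a single tabular update might look like the sketch below. It assumes the SOR target is a `w`-weighted combination of the usual Q-learning target and the greedy value at the current state. This form, the function names `sor_q_update` and `max_relaxation`, and the array layout `P[s, a, s_next]` are assumptions made for illustration and are not taken from this summary.

```python
import numpy as np


def sor_q_update(Q, s, a, r, s_next, w, gamma, step):
    """One tabular SOR Q-learning step (assumed form, see the text above).

    Q      : array of shape (n_states, n_actions)
    w      : relaxation parameter; w = 1 gives the standard Q-learning target
    gamma  : discount factor in (0, 1)
    step   : learning rate in (0, 1]
    Returns a new array; the input Q is left unchanged.
    """
    # Standard Q-learning target built from the sampled transition.
    q_target = r + gamma * np.max(Q[s_next])
    # Over-relaxed target: mix in the greedy value at the *current* state.
    sor_target = w * q_target + (1.0 - w) * np.max(Q[s])
    Q_new = Q.copy()
    Q_new[s, a] = (1.0 - step) * Q[s, a] + step * sor_target
    return Q_new


def max_relaxation(P, gamma):
    """Assumed upper bound w* = min over (s, a) of 1 / (1 - gamma * P[s, a, s]).

    P has shape (n_states, n_actions, n_states); computing w* needs the model.
    """
    n_states = P.shape[0]
    # Self-transition probabilities P[s, a, s], collected into shape (S, A).
    self_loops = np.array([P[s, :, s] for s in range(n_states)])
    return float(np.min(1.0 / (1.0 - gamma * self_loops)))
```

Setting `w = 1.0` recovers the standard Q-learning update. `max_relaxation` needs the transition model, so in a model-free setting it only indicates how large `w` could be chosen if a lower bound on the self-transition probabilities were known.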

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.03812/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1903.03812/full.md

## References

15 references — full list in the complete paper: https://tomesphere.com/paper/1903.03812/full.md

---
Source: https://tomesphere.com/paper/1903.03812