# Convergence of regularized agent-state-based Q-learning in POMDPs

**Authors:** Amit Sinha, Matthieu Geist, Aditya Mahajan

arXiv: 2508.21314 · 2025-09-04

## TL;DR

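This paper shows that regularized agent-state-based Q-learning (RASQL) in POMDPs converges, under mild technical conditions, to the fixed point of a regularized MDP that depends on the stationary distribution induced by the behavioral policy. Numerical examples match the predicted limit.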

## Contribution

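It presents a framework for understanding the convergence of regularized agent-state-based Q-learning (RASQL), the simplest form of Q-learning that combines an agent state with policy regularization, and extends the analysis to a variant of RASQL that learns periodic policies.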

## Key findings

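- RASQL converges, under mild technical conditions, to the fixed point of an appropriately defined regularized MDP.
- That regularized MDP depends on the stationary distribution induced by the behavioral policy.
- A similar analysis works for a variant of RASQL that learns periodic policies.
- In numerical examples, the empirical convergence behavior matches the theoretical limit.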
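
As a rough illustration of what "fixed point of a regularized MDP" could look like, the block below writes a regularized Bellman equation over agent states in the standard regularized-MDP style. All symbols are assumptions introduced here, not the paper's notation: `z` is the agent state, `\mu` the behavioral policy, `r_\mu` and `P_\mu` the reward and transition model averaged under the stationary distribution induced by `\mu`, and `\Omega^{\star}` the convex conjugate of the regularizer. The paper's exact definitions may differ.

```latex
% Hedged sketch of a regularized fixed-point equation over agent states.
% Notation is assumed for illustration, not taken from the paper.
Q(z, a) = r_\mu(z, a)
        + \gamma \sum_{z'} P_\mu(z' \mid z, a)\, \Omega^{\star}\bigl(Q(z', \cdot)\bigr)

% Example: for an entropy regularizer with temperature \tau,
% the conjugate is the scaled log-sum-exp.
\Omega^{\star}(q) = \tau \log \sum_{a} \exp\bigl(q(a) / \tau\bigr)
```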

## Abstract

In this paper, we present a framework to understand the convergence of commonly used Q-learning reinforcement learning algorithms in practice. Two salient features of such algorithms are: (i) the Q-table is recursively updated using an agent state (such as the state of a recurrent neural network) which is not a belief state or an information state and (ii) policy regularization is often used to encourage exploration and stabilize the learning algorithm. We investigate the simplest form of such Q-learning algorithms, which we call regularized agent-state-based Q-learning (RASQL), and show that it converges under mild technical conditions to the fixed point of an appropriately defined regularized MDP, which depends on the stationary distribution induced by the behavioral policy. We also show that a similar analysis continues to work for a variant of RASQL that learns periodic policies. We present numerical examples to illustrate that the empirical convergence behavior matches the proposed theoretical limit.
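
To make the two features in the abstract concrete, here is a minimal Python sketch of a tabular Q-learning loop that updates on an agent state and uses a regularized bootstrap target. It is not the authors' implementation. The toy POMDP, the choice of agent state (the latest observation), the entropy regularizer with temperature `tau`, the step-size schedule, and the names `soft_value` and `run_rasql_sketch` are all assumptions made for illustration.

```python
import math
import random


def soft_value(q_row, tau):
    """Entropy-regularized value tau * log(sum_a exp(q_a / tau)), computed stably."""
    m = max(q_row)
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_row))


def run_rasql_sketch(steps=20000, gamma=0.9, tau=0.5, seed=0):
    """Tabular soft Q-learning on an agent state in a hypothetical toy POMDP."""
    rng = random.Random(seed)
    n_hidden, n_agent_states, n_actions = 4, 2, 2

    q = [[0.0] * n_actions for _ in range(n_agent_states)]
    visits = [[0] * n_actions for _ in range(n_agent_states)]

    s = 0      # hidden environment state, never seen by the learner
    z = s % 2  # agent state: here just the latest observation (s mod 2)

    for _ in range(steps):
        a = rng.randrange(n_actions)  # fixed uniform behavioral policy
        r = 1.0 if (s == 3 and a == 0) else 0.0

        # Hidden dynamics: mostly deterministic, with 10% uniform noise.
        if rng.random() < 0.9:
            s_next = (s + 1 + a) % n_hidden
        else:
            s_next = rng.randrange(n_hidden)
        z_next = s_next % 2

        # Per-pair decaying step size (assumed schedule, exponent in (0.5, 1]).
        visits[z][a] += 1
        alpha = visits[z][a] ** -0.6

        # Regularized target: bootstrap with the soft value, not a hard max.
        target = r + gamma * soft_value(q[z_next], tau)
        q[z][a] += alpha * (target - q[z][a])

        s, z = s_next, z_next

    return q


if __name__ == "__main__":
    for z, row in enumerate(run_rasql_sketch()):
        print(f"agent state {z}: {[round(v, 3) for v in row]}")
```

Because two hidden states share each observation, the learner's Q-table averages over them with weights set by how often the behavioral policy visits each one. This is the informal reason the limit depends on the stationary distribution induced by the behavioral policy.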

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21314/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21314/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/2508.21314/full.md

---
Source: https://tomesphere.com/paper/2508.21314