Periodic Regularized Q-Learning

Hyukjun Yang; Han-Dong Lim; Donghwan Lee

arXiv:2602.03301·cs.LG·February 4, 2026

Open Access

TL;DR

This paper introduces Periodic Regularized Q-Learning (PRQ), a new RL algorithm with finite-time convergence guarantees under linear function approximation, achieved through a novel regularization of the projection operator.

Contribution

The paper proposes a regularized projected value iteration (RP-VI) method and extends it to a sample-based, stochastic setting, giving stable convergence for Q-learning with linear function approximation.

Findings

01. PRQ comes with a finite-time convergence guarantee under linear function approximation.

02. Appropriate regularization of the projection operator makes the projected value iteration a contraction.

03. Theoretical analysis establishes the stability and convergence of PRQ.
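
To make the contraction finding concrete, here is a short worked bound. It assumes, purely for illustration, that the regularization is a ridge term added inside a weighted least-squares projection; the listing does not state the paper's exact operator, so the form of \Pi_\eta, the weight matrix D, and the threshold on \eta below are assumptions rather than the authors' construction. Here \Phi is the feature matrix with rows \phi_i^\top, D is a diagonal matrix of state-action weights d_j that sum to one, T is the Bellman optimality operator, and \gamma is the discount factor.

```latex
\begin{align*}
\Pi_\eta &= \Phi\,(\Phi^\top D \Phi + \eta I)^{-1}\,\Phi^\top D, \qquad \eta > 0, \\[4pt]
\|\Pi_\eta T Q_1 - \Pi_\eta T Q_2\|_\infty
  &\le \|\Pi_\eta\|_\infty \,\|T Q_1 - T Q_2\|_\infty
   \le \gamma\,\|\Pi_\eta\|_\infty\,\|Q_1 - Q_2\|_\infty, \\[4pt]
\|\Pi_\eta\|_\infty
  &= \max_i \sum_j \bigl|\phi_i^\top (\Phi^\top D \Phi + \eta I)^{-1} \phi_j\bigr|\, d_j
   \le \frac{\max_i \|\phi_i\|_2^2}{\eta}.
\end{align*}
```

The last inequality uses \Phi^\top D \Phi + \eta I \succeq \eta I and \sum_j d_j = 1. Under this assumed form, any \eta > \gamma \max_i \|\phi_i\|_2^2 is sufficient (not necessary) for \Pi_\eta T to be a contraction in the sup norm, whereas the unregularized projection (\eta = 0) carries no such bound in general.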

Abstract

In reinforcement learning (RL), Q-learning is a fundamental algorithm whose convergence is guaranteed in the tabular setting. However, this convergence guarantee does not hold under linear function approximation. To overcome this limitation, a significant line of research has introduced regularization techniques to ensure stable convergence under function approximation. In this work, we propose a new algorithm, periodic regularized Q-learning (PRQ). We first introduce regularization at the level of the projection operator and explicitly construct a regularized projected value iteration (RP-VI), subsequently extending it to a sample-based RL algorithm. By appropriately regularizing the projection operator, the resulting projected value iteration becomes a contraction. By extending this regularized projection into the stochastic setting, we establish the PRQ algorithm and provide a…
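
The deterministic step described in the abstract, a projected value iteration with a regularized projection, can be sketched numerically. The sketch below is a minimal illustration, not the authors' implementation: it assumes the regularization is a ridge term `eta * I` inside a weighted least-squares projection, and it uses a small random MDP. The function names (`make_random_mdp`, `regularized_projection`, `rp_vi`, `sup_norm`), the uniform weighting `d`, and the values `gamma = 0.9`, `eta = 4.0` are all hypothetical choices made for the example.

```python
import numpy as np


def make_random_mdp(n_states=4, n_actions=3, n_features=3, seed=0):
    """Build a small random MDP with linear features (illustrative only)."""
    rng = np.random.default_rng(seed)
    n = n_states * n_actions                      # number of (state, action) pairs
    P = rng.random((n, n_states))
    P /= P.sum(axis=1, keepdims=True)             # each row is a next-state distribution
    R = rng.uniform(-1.0, 1.0, size=n)            # reward for each (state, action) pair
    Phi = rng.uniform(-1.0, 1.0, size=(n, n_features))  # one feature row per pair
    d = np.full(n, 1.0 / n)                       # assumed uniform weighting
    return P, R, Phi, d


def regularized_projection(Phi, d, eta):
    """Assumed ridge form: Phi (Phi^T D Phi + eta I)^{-1} Phi^T D."""
    D = np.diag(d)
    M = Phi.T @ D @ Phi + eta * np.eye(Phi.shape[1])
    return Phi @ np.linalg.solve(M, Phi.T @ D)


def sup_norm(A):
    """Induced infinity norm of a matrix: the largest absolute row sum."""
    return np.abs(A).sum(axis=1).max()


def bellman_optimality(Q, R, P, gamma):
    """Apply (TQ)(s, a) = R(s, a) + gamma * E[max_a' Q(s', a')]."""
    n_states = P.shape[1]
    V = Q.reshape(n_states, -1).max(axis=1)       # greedy value of each state
    return R + gamma * (P @ V)


def rp_vi(P, R, Phi, d, gamma, eta, n_iters=200):
    """Iterate Q <- Pi_eta T Q and report the final fixed-point residual."""
    Pi = regularized_projection(Phi, d, eta)
    Q = np.zeros(Phi.shape[0])
    for _ in range(n_iters):
        Q = Pi @ bellman_optimality(Q, R, P, gamma)
    residual = np.max(np.abs(Q - Pi @ bellman_optimality(Q, R, P, gamma)))
    return Q, residual


if __name__ == "__main__":
    gamma, eta = 0.9, 4.0
    P, R, Phi, d = make_random_mdp()
    Pi = regularized_projection(Phi, d, eta)
    # With 3 features in [-1, 1], every row of Phi has squared norm at most 3,
    # so the sup norm of Pi is at most 3 / eta and gamma * ||Pi|| is below 1.
    print("gamma * ||Pi_eta||_inf =", gamma * sup_norm(Pi))
    Q, residual = rp_vi(P, R, Phi, d, gamma, eta)
    print("fixed-point residual   =", residual)
```

In this sketch a larger `eta` shrinks the projection and so forces the contraction, at the price of biasing the fixed point toward zero. The sample-based PRQ algorithm and its periodic structure are not reproduced here, since the listing does not describe them in enough detail.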

Taxonomy

Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Adaptive Dynamic Programming Control