Implicit Q-Learning and SARSA: Liberating Policy Control from Step-Size Calibration

Hwanwoo Kim; Eric Laber

arXiv:2601.18907 · stat.ML · January 28, 2026

Open Access

TL;DR

This paper introduces implicit variants of Q-learning and SARSA that adaptively adjust step-sizes, reducing the need for manual tuning and enhancing stability across diverse reinforcement learning environments.

Contribution

The paper presents implicit formulations of Q-learning and SARSA whose updates regulate the step-size automatically, widening the range of step-sizes over which the algorithms remain stable and reducing the need for manual calibration.

Findings

01. Implicit methods remain stable over significantly broader step-size ranges.

02. Under favorable conditions, they permit arbitrarily large step-sizes while achieving comparable convergence rates.

03. Empirical results on benchmark environments show reduced sensitivity to step-size tuning.

Abstract

Q-learning and SARSA are foundational reinforcement learning algorithms whose practical success depends critically on step-size calibration. Step-sizes that are too large can cause numerical instability, while step-sizes that are too small can lead to slow progress. We propose implicit variants of Q-learning and SARSA that reformulate their iterative updates as fixed-point equations. This yields an adaptive step-size adjustment that scales inversely with feature norms, providing automatic regularization without manual tuning. Our non-asymptotic analyses demonstrate that implicit methods maintain stability over significantly broader step-size ranges. Under favorable conditions, they permit arbitrarily large step-sizes while achieving comparable convergence rates. Empirical validation across benchmark environments spanning discrete and continuous state spaces shows that implicit Q-learning…
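
The abstract does not state the update formula, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes linear function approximation and assumes that the implicit step evaluates just the current state-action value at the new iterate, i.e. theta_new = theta + alpha * (target - phi . theta_new) * phi. Under that assumption the fixed-point equation has a closed-form solution: the ordinary update with alpha replaced by alpha / (1 + alpha * ||phi||^2), which shrinks as the feature norm grows. The function names, the toy transition, and the helper `prediction_error_after` are hypothetical and chosen for illustration.

```python
import numpy as np


def explicit_q_step(theta, phi_sa, reward, phi_next_all, alpha, gamma=0.9):
    """Standard (explicit) Q-learning step with linear features."""
    target = reward + gamma * np.max(phi_next_all @ theta)
    td_error = target - phi_sa @ theta
    return theta + alpha * td_error * phi_sa


def implicit_q_step(theta, phi_sa, reward, phi_next_all, alpha, gamma=0.9):
    """Implicit step (ASSUMED form): solves
        theta_new = theta + alpha * (target - phi_sa @ theta_new) * phi_sa
    in closed form. The bootstrapped target still uses the old theta.
    """
    target = reward + gamma * np.max(phi_next_all @ theta)
    td_error = target - phi_sa @ theta
    # Effective step-size scales inversely with the squared feature norm
    # and is always below 1 / ||phi_sa||^2, however large alpha is.
    effective_alpha = alpha / (1.0 + alpha * (phi_sa @ phi_sa))
    return theta + effective_alpha * td_error * phi_sa


def prediction_error_after(step_fn, alpha, n_steps=20):
    """Replay one toy transition n_steps times; return |reward - Q(s, a)|.

    The next-state features are all zero (a terminal-like state), so the
    target is just the reward and the exact solution has Q(s, a) = reward.
    """
    theta = np.zeros(3)
    phi_sa = np.array([1.0, 2.0, 2.0])      # squared norm = 9
    phi_next_all = np.zeros((2, 3))          # two actions, zero features
    reward = 1.0
    for _ in range(n_steps):
        theta = step_fn(theta, phi_sa, reward, phi_next_all, alpha)
    return abs(reward - phi_sa @ theta)


if __name__ == "__main__":
    print("explicit, alpha=1:   ", prediction_error_after(explicit_q_step, 1.0))
    print("implicit, alpha=1:   ", prediction_error_after(implicit_q_step, 1.0))
    print("implicit, alpha=1000:", prediction_error_after(implicit_q_step, 1000.0))
```

On this toy transition the explicit update multiplies the prediction error by (1 - 9 * alpha) each step, so it diverges once alpha exceeds 2/9, whereas the implicit update multiplies it by 1 / (1 + 9 * alpha), which stays below one for any positive alpha. This illustrates the kind of step-size robustness the abstract describes, but it says nothing about the paper's actual conditions or rates.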

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Model Reduction and Neural Networks