Implicit Q-Learning and SARSA: Liberating Policy Control from Step-Size Calibration
Hwanwoo Kim, Eric Laber

TL;DR
This paper introduces implicit variants of Q-learning and SARSA that adaptively adjust step-sizes, reducing the need for manual tuning and enhancing stability across diverse reinforcement learning environments.
Contribution
The paper presents implicit formulations of Q-learning and SARSA that enable automatic step-size regulation, broadening stability and performance without manual parameter calibration.
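To make "implicit formulation" concrete, here is a hedged sketch of how such an update can be derived under linear function approximation. The notation (\theta_t, \phi_t, \alpha_t, y_t) is assumed for illustration and is not taken from the paper, whose exact formulation may differ.

```latex
% Assumed notation: Q_\theta(s,a) = \phi(s,a)^\top \theta, \quad \phi_t = \phi(s_t, a_t),
% y_t = r_t + \gamma \max_{a'} \phi(s_{t+1}, a')^\top \theta_t  (Q-learning),
% or the same expression with a' = a_{t+1}                      (SARSA).
\begin{align*}
\text{standard:}\quad \theta_{t+1} &= \theta_t + \alpha_t \bigl(y_t - \phi_t^\top \theta_t\bigr)\,\phi_t, \\
\text{implicit:}\quad \theta_{t+1} &= \theta_t + \alpha_t \bigl(y_t - \phi_t^\top \theta_{t+1}\bigr)\,\phi_t .
\end{align*}
% Left-multiplying the implicit (fixed-point) equation by \phi_t^\top and solving:
\begin{align*}
\phi_t^\top \theta_{t+1}
  &= \frac{\phi_t^\top \theta_t + \alpha_t \|\phi_t\|^2\, y_t}{1 + \alpha_t \|\phi_t\|^2}
  \;\Longrightarrow\;
  y_t - \phi_t^\top \theta_{t+1} = \frac{y_t - \phi_t^\top \theta_t}{1 + \alpha_t \|\phi_t\|^2}, \\
\theta_{t+1}
  &= \theta_t + \frac{\alpha_t}{1 + \alpha_t \|\phi_t\|^2}\,\bigl(y_t - \phi_t^\top \theta_t\bigr)\,\phi_t .
\end{align*}
```

In this sketch the effective step-size \alpha_t / (1 + \alpha_t \|\phi_t\|^2) never exceeds 1 / \|\phi_t\|^2, which is one way to read the claim that the adjustment scales inversely with feature norms.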
Findings
Implicit methods maintain stability over wider step-size ranges.
Under favorable conditions, they tolerate arbitrarily large step-sizes while achieving comparable convergence rates.
Empirical results show reduced sensitivity to step-size tuning.
Abstract
Q-learning and SARSA are foundational reinforcement learning algorithms whose practical success depends critically on step-size calibration. Step-sizes that are too large can cause numerical instability, while step-sizes that are too small can lead to slow progress. We propose implicit variants of Q-learning and SARSA that reformulate their iterative updates as fixed-point equations. This yields an adaptive step-size adjustment that scales inversely with feature norms, providing automatic regularization without manual tuning. Our non-asymptotic analyses demonstrate that implicit methods maintain stability over significantly broader step-size ranges. Under favorable conditions, they permit arbitrarily large step-sizes while achieving comparable convergence rates. Empirical validation across benchmark environments spanning discrete and continuous state spaces shows that implicit Q-learning…
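The fixed-point idea in the abstract can be illustrated with a minimal sketch, assuming linear function approximation Q(s, a) = phi(s, a) . theta. Under that assumption, evaluating the current-state value at the new parameters and solving for them gives the usual update with the step-size alpha replaced by alpha / (1 + alpha * ||phi||^2). All function names and the one-feature toy problem below are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
from typing import Callable, List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def td_error(theta, phi_sa, reward, next_phis, gamma):
    """TD error r + gamma * max_a' Q(s', a') - Q(s, a) for linear Q.

    next_phis holds the feature vectors of the candidate next actions;
    an empty list means the next state is terminal.
    """
    next_q = max((dot(theta, p) for p in next_phis), default=0.0)
    return reward + gamma * next_q - dot(theta, phi_sa)


def explicit_step(theta, phi_sa, delta, alpha) -> List[float]:
    """Standard update: theta + alpha * delta * phi."""
    return [t + alpha * delta * p for t, p in zip(theta, phi_sa)]


def implicit_step(theta, phi_sa, delta, alpha) -> List[float]:
    """Implicit update (assumed closed form): the step-size is shrunk
    by 1 + alpha * ||phi||^2, so it can never exceed 1 / ||phi||^2."""
    effective_alpha = alpha / (1.0 + alpha * dot(phi_sa, phi_sa))
    return [t + effective_alpha * delta * p for t, p in zip(theta, phi_sa)]


def run(step: Callable, alpha: float = 1.0, n_steps: int = 50) -> List[float]:
    """Toy problem: one state-action pair with feature [2.0], reward 1,
    terminal next state. The fixed point of the update is theta = 0.5."""
    theta, phi = [0.0], [2.0]
    for _ in range(n_steps):
        delta = td_error(theta, phi, reward=1.0, next_phis=[], gamma=0.9)
        theta = step(theta, phi, delta, alpha)
    return theta


if __name__ == "__main__":
    print("explicit:", run(explicit_step))  # oscillates with growing magnitude
    print("implicit:", run(implicit_step))  # settles near 0.5
```

With alpha = 1.0 the explicit recursion on this toy problem is theta <- 2 - 3 * theta, which diverges, while the implicit one is theta <- 0.4 + 0.2 * theta, which contracts to 0.5. Passing a single feature vector in next_phis (the action actually taken next) turns the same target into a SARSA-style one.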
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Model Reduction and Neural Networks
