Bridging the Gap between Newton-Raphson Method and Regularized Policy Iteration

Zeyang Li; Chuxiong Hu; Yunan Wang; Guojian Zhan; Jie Li; Shengbo Eben Li

arXiv:2310.07211 · cs.LG · October 12, 2023

TL;DR

This paper establishes a theoretical link between regularized policy iteration in reinforcement learning and the Newton-Raphson method, uses it to analyze the algorithm's convergence, and introduces a modified version with finite-step policy evaluation.

Contribution

The paper proves that regularized policy iteration is equivalent to the Newton-Raphson method and uses this equivalence to analyze its convergence, establishing global linear and local quadratic convergence.
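To make the equivalence concrete, here is a minimal sketch that assumes the Shannon-entropy regularizer (the soft actor-critic case named in the abstract), a hypothetical temperature `TAU`, and a hypothetical 2-state, 2-action MDP. With this regularizer the smoothed Bellman operator is (T V)(s) = τ log Σ_a exp(Q_V(s, a)/τ). Because the gradient of log-sum-exp is the softmax, the Jacobian of T at V is γP_π, where π is the softmax (regularized greedy) policy of V. A Newton-Raphson step on F(V) = V − T(V) therefore solves a linear system in I − γP_π, which is the same system as regularized policy evaluation of π. All function names here are illustrative, not taken from the paper.

```python
import math

# Hypothetical toy MDP (2 states, 2 actions); all numbers are illustrative.
GAMMA = 0.9  # discount factor
TAU = 0.5    # assumed weight of the Shannon-entropy regularizer
P = [[[0.8, 0.2], [0.1, 0.9]],  # P[s][a][s']: transition probabilities
     [[0.5, 0.5], [0.0, 1.0]]]
R = [[1.0, 0.0], [0.0, 2.0]]    # R[s][a]: rewards
S = A = range(2)


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def q_values(V):
    """Q_V(s, a) = R(s, a) + GAMMA * sum_s' P(s'|s, a) V(s')."""
    return [[R[s][a] + GAMMA * sum(P[s][a][t] * V[t] for t in S) for a in A] for s in S]


def softmax_policy(V):
    """Regularized greedy policy: pi(a|s) proportional to exp(Q_V(s, a) / TAU)."""
    pi = []
    for q in q_values(V):
        m = max(q)
        w = [math.exp((x - m) / TAU) for x in q]
        z = sum(w)
        pi.append([x / z for x in w])
    return pi


def soft_bellman(V):
    """Smoothed Bellman operator: (T V)(s) = TAU * log sum_a exp(Q_V(s, a) / TAU)."""
    out = []
    for q in q_values(V):
        m = max(q)
        out.append(m + TAU * math.log(sum(math.exp((x - m) / TAU) for x in q)))
    return out


def one_minus_gamma_p(pi):
    """The matrix I - GAMMA * P_pi, where P_pi(s, s') = sum_a pi(a|s) P(s'|s, a)."""
    return [[float(s == t) - GAMMA * sum(pi[s][a] * P[s][a][t] for a in A) for t in S] for s in S]


def policy_iteration_step(V):
    """Regularized PI: soft-greedy improvement, then exact evaluation of that policy."""
    pi = softmax_policy(V)
    b = [sum(pi[s][a] * (R[s][a] - TAU * math.log(pi[s][a])) for a in A) for s in S]
    return solve(one_minus_gamma_p(pi), b)  # V_pi = (I - GAMMA P_pi)^{-1} (r_pi + TAU H(pi))


def newton_step(V):
    """Newton-Raphson on F(V) = V - T(V); the Jacobian of T at V is GAMMA * P_pi."""
    pi = softmax_policy(V)
    F = [v - tv for v, tv in zip(V, soft_bellman(V))]
    delta = solve(one_minus_gamma_p(pi), F)
    return [v - d for v, d in zip(V, delta)]


V = [0.0, 0.0]
for k in range(4):
    V_pi, V_newton = policy_iteration_step(V), newton_step(V)
    print(k, V_pi, V_newton)  # the two updates should agree up to round-off
    V = V_pi
```

The match is exact rather than approximate: when π is the softmax policy of V, T(V) = r_π + τH(π) + γP_πV, so the Newton update V − (I − γP_π)^{-1}(V − T(V)) simplifies to (I − γP_π)^{-1}(r_π + τH(π)), the regularized value of π.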

Findings

01

Regularized policy iteration is strictly equivalent to the Newton-Raphson method when the Bellman equation is smoothed with strongly convex functions.

02

The algorithm exhibits global linear convergence with rate γ (the discount factor).

03

A modified version with finite-step policy evaluation converges linearly at rate γ^M, where M is the number of policy-evaluation steps.
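The finite-step variant can be sketched as follows, under the same kind of assumptions: a Shannon-entropy regularizer, a hypothetical temperature `TAU`, a hypothetical 2-state MDP, and illustrative function names. Instead of solving for V_π exactly, each iteration applies the policy's regularized Bellman operator only M times, warm-started from the current estimate. The script then records per-iteration error ratios once the iterates are close to the fixed point, where they should not exceed roughly γ^M.

```python
import math

GAMMA, TAU = 0.9, 0.5  # discount factor and assumed entropy weight (hypothetical)
P = [[[0.8, 0.2], [0.1, 0.9]],  # P[s][a][s'] for a hypothetical 2-state MDP
     [[0.5, 0.5], [0.0, 1.0]]]
R = [[1.0, 0.0], [0.0, 2.0]]    # R[s][a]
S = A = range(2)


def q_values(V):
    """Q_V(s, a) = R(s, a) + GAMMA * sum_s' P(s'|s, a) V(s')."""
    return [[R[s][a] + GAMMA * sum(P[s][a][t] * V[t] for t in S) for a in A] for s in S]


def softmax_policy(V):
    """Regularized greedy policy: pi(a|s) proportional to exp(Q_V(s, a) / TAU)."""
    pi = []
    for q in q_values(V):
        m = max(q)
        w = [math.exp((x - m) / TAU) for x in q]
        z = sum(w)
        pi.append([x / z for x in w])
    return pi


def soft_bellman(V):
    """(T V)(s) = TAU * log sum_a exp(Q_V(s, a) / TAU)."""
    out = []
    for q in q_values(V):
        m = max(q)
        out.append(m + TAU * math.log(sum(math.exp((x - m) / TAU) for x in q)))
    return out


def apply_t_pi(pi, V):
    """One sweep of the regularized Bellman operator for a fixed policy pi."""
    Q = q_values(V)
    return [sum(pi[s][a] * (Q[s][a] - TAU * math.log(pi[s][a])) for a in A) for s in S]


def modified_pi_step(V, M):
    """Soft-greedy improvement followed by only M evaluation sweeps, warm-started at V."""
    pi = softmax_policy(V)
    for _ in range(M):
        V = apply_t_pi(pi, V)
    return V


# Reference fixed point V* from long-run soft value iteration (a GAMMA-contraction).
V_star = [0.0, 0.0]
for _ in range(2000):
    V_star = soft_bellman(V_star)

M = 3
V, errors = [0.0, 0.0], []
for _ in range(300):
    V = modified_pi_step(V, M)
    errors.append(max(abs(x - y) for x, y in zip(V, V_star)))

# Per-iteration error ratios near V*, ignoring errors at round-off level.
ratios = [errors[k + 1] / errors[k] for k in range(len(errors) - 1) if 1e-10 < errors[k] < 1e-5]
print(f"largest observed ratio {max(ratios):.4f} vs GAMMA**M = {GAMMA ** M:.4f}")
```

M = 3 is an arbitrary choice. Setting M = 1 turns the loop into soft value iteration, whose rate is γ, while larger M moves each update toward exact regularized policy iteration.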

Abstract

Regularization is one of the most important techniques in reinforcement learning algorithms. The well-known soft actor-critic algorithm is a special case of regularized policy iteration in which the regularizer is Shannon entropy. Despite some empirical success of regularized policy iteration, its theoretical underpinnings remain unclear. This paper proves that regularized policy iteration is strictly equivalent to the standard Newton-Raphson method when the Bellman equation is smoothed with strongly convex functions. This equivalence lays the foundation for a unified analysis of both the global and the local convergence behavior of regularized policy iteration. We prove that regularized policy iteration converges globally and linearly with rate γ (the discount factor). Furthermore, this algorithm converges quadratically once it enters a local region around the…
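Both convergence regimes described in the abstract can be observed numerically with a sketch like the one below. It again assumes a Shannon-entropy regularizer, a hypothetical temperature `TAU`, and a hypothetical 2-state MDP, and uses Cramer's rule only because the toy linear system is 2×2. Starting from the value of the uniform policy, each regularized policy iteration should shrink the sup-norm error by at least a factor of γ. Near the solution, the error should collapse within a few iterations. Soft value iteration run for the same number of sweeps is printed for contrast.

```python
import math

GAMMA, TAU = 0.9, 0.5  # discount factor and assumed entropy weight (hypothetical)
P = [[[0.8, 0.2], [0.1, 0.9]],  # P[s][a][s'] for a hypothetical 2-state MDP
     [[0.5, 0.5], [0.0, 1.0]]]
R = [[1.0, 0.0], [0.0, 2.0]]    # R[s][a]
S = A = range(2)
STEPS = 20


def solve2(M, b):
    """Solve the 2x2 system M x = b with Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(b[0] * M[1][1] - M[0][1] * b[1]) / det,
            (M[0][0] * b[1] - b[0] * M[1][0]) / det]


def q_values(V):
    return [[R[s][a] + GAMMA * sum(P[s][a][t] * V[t] for t in S) for a in A] for s in S]


def softmax_policy(V):
    """pi(a|s) proportional to exp(Q_V(s, a) / TAU)."""
    pi = []
    for q in q_values(V):
        m = max(q)
        w = [math.exp((x - m) / TAU) for x in q]
        z = sum(w)
        pi.append([x / z for x in w])
    return pi


def evaluate(pi):
    """Exact regularized evaluation: solve (I - GAMMA P_pi) V = r_pi + TAU * H(pi)."""
    M = [[float(s == t) - GAMMA * sum(pi[s][a] * P[s][a][t] for a in A) for t in S] for s in S]
    b = [sum(pi[s][a] * (R[s][a] - TAU * math.log(pi[s][a])) for a in A) for s in S]
    return solve2(M, b)


def soft_bellman(V):
    """(T V)(s) = TAU * log sum_a exp(Q_V(s, a) / TAU)."""
    out = []
    for q in q_values(V):
        m = max(q)
        out.append(m + TAU * math.log(sum(math.exp((x - m) / TAU) for x in q)))
    return out


def sup_err(V, W):
    return max(abs(x - y) for x, y in zip(V, W))


V_star = [0.0, 0.0]
for _ in range(2000):  # long-run soft value iteration as the reference solution
    V_star = soft_bellman(V_star)

# Regularized policy iteration, started from the value of the uniform policy.
V = evaluate([[0.5, 0.5] for _ in S])
pi_errors = [sup_err(V, V_star)]
for _ in range(STEPS):
    V = evaluate(softmax_policy(V))
    pi_errors.append(sup_err(V, V_star))

# Soft value iteration with the same number of sweeps, for contrast.
W = [0.0, 0.0]
for _ in range(STEPS):
    W = soft_bellman(W)
vi_error = sup_err(W, V_star)

for k, e in enumerate(pi_errors):
    print(f"PI iteration {k:2d}: sup error {e:.3e}")
print(f"soft value iteration after {STEPS} sweeps: sup error {vi_error:.3e}")
```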

Taxonomy

Topics: Advanced Optimization Algorithms Research · Stochastic Gradient Optimization Techniques