Dynamic Programming Through the Lens of Semismooth Newton-Type Methods (Extended Version)
Matilde Gargiani, Andrea Zanelli, Dominic Liao-McPherson, Tyler Summers, John Lygeros

TL;DR
This paper reveals that policy and value iteration methods in dynamic programming are special cases of semismooth Newton-type methods, providing new insights into their convergence properties and introducing an accelerated value iteration algorithm.
Contribution
The paper establishes that policy iteration is equivalent to the exact semismooth Newton method applied to the Bellman equation, and develops a novel locally accelerated value iteration with global convergence guarantees.
Findings
For Markov Decision Processes with finite state and action spaces, policy iteration has a local quadratic convergence rate.
Value iteration is an instance of the fixed-point iteration method.
The accelerated value iteration improves convergence with minimal extra cost.
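A short derivation may help make the first two findings concrete. The notation here is an assumption for illustration and is not taken from the summary: a cost-minimizing discounted MDP with stage costs $g$, transition probabilities $P$, discount factor $\gamma \in (0,1)$, and Bellman operator $T$. The Bellman equation $V = TV$ can be written as a root-finding problem for the residual

\[
r(V) = V - TV, \qquad (TV)(s) = \min_{a} \Big\{ g(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V(s') \Big\}.
\]

Let $\pi_k$ be a greedy policy with respect to $V_k$, with cost vector $g^{\pi_k}$ and transition matrix $P^{\pi_k}$. Since $r$ is piecewise affine, $J_k = I - \gamma P^{\pi_k}$ is an element of its generalized Jacobian at $V_k$, and $r(V_k) = (I - \gamma P^{\pi_k}) V_k - g^{\pi_k}$. The Newton-type step then reads

\[
V_{k+1} = V_k - J_k^{-1} r(V_k)
        = V_k - (I - \gamma P^{\pi_k})^{-1} \big( (I - \gamma P^{\pi_k}) V_k - g^{\pi_k} \big)
        = (I - \gamma P^{\pi_k})^{-1} g^{\pi_k},
\]

which is exactly the policy evaluation of $\pi_k$, that is, one policy iteration step. Replacing $J_k$ by the identity gives

\[
V_{k+1} = V_k - r(V_k) = T V_k,
\]

which is value iteration.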
Abstract
Policy iteration and value iteration are at the core of many (approximate) dynamic programming methods. For Markov Decision Processes with finite state and action spaces, we show that they are instances of semismooth Newton-type methods to solve the Bellman equation. In particular, we prove that policy iteration is equivalent to the exact semismooth Newton method and enjoys local quadratic convergence rate. This finding is corroborated by extensive numerical evidence in the fields of control and operations research, which confirms that policy iteration generally requires few iterations to achieve convergence even when the number of policies is vast. We then show that value iteration is an instance of the fixed-point iteration method. In this spirit, we develop a novel locally accelerated version of value iteration with global convergence guarantees and negligible extra computational…
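The equivalence described in the abstract can be checked numerically on a toy problem. The following is a minimal sketch, not the authors' implementation: the two-state, two-action MDP, the cost-minimization convention, and all function names (`solve`, `greedy`, `bellman`, `newton_step`, `policy_iteration_step`) are hypothetical and chosen only for illustration. It uses pure Python so that it runs without third-party packages.

```python
# Toy discounted MDP (hypothetical data): P[s][a][t] is the probability of
# moving from state s to state t under action a, g[s][a] is the stage cost.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
g = [[1.0, 2.0],
     [0.5, 0.0]]
gamma = 0.9

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def q_value(V, s, a):
    """Cost of taking action a in state s and then paying V afterwards."""
    return g[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(len(V)))

def greedy(V):
    """Greedy (cost-minimizing) policy with respect to V."""
    return [min(range(len(g[s])), key=lambda a: q_value(V, s, a))
            for s in range(len(V))]

def bellman(V):
    """Bellman operator T; one application is one value iteration step."""
    return [min(q_value(V, s, a) for a in range(len(g[s])))
            for s in range(len(V))]

def jacobian(pi):
    """I - gamma * P^pi, an element of the generalized Jacobian of V - TV."""
    n = len(pi)
    return [[(1.0 if s == t else 0.0) - gamma * P[s][pi[s]][t]
             for t in range(n)] for s in range(n)]

def newton_step(V):
    """Semismooth Newton step on the Bellman residual r(V) = V - TV."""
    TV = bellman(V)
    r = [V[s] - TV[s] for s in range(len(V))]
    d = solve(jacobian(greedy(V)), r)
    return [V[s] - d[s] for s in range(len(V))]

def policy_iteration_step(V):
    """Policy improvement at V followed by exact policy evaluation."""
    pi = greedy(V)
    return solve(jacobian(pi), [g[s][pi[s]] for s in range(len(V))])

V_newton = [0.0, 0.0]
for _ in range(5):
    V_newton = newton_step(V_newton)

V_vi = [0.0, 0.0]
for _ in range(500):
    V_vi = bellman(V_vi)

print("Newton / policy iteration:", V_newton)
print("Value iteration:          ", V_vi)
```

On this toy instance the Newton step and the policy iteration step produce the same iterate from any starting point, and a handful of Newton steps reach the fixed point that value iteration approaches only geometrically at rate `gamma`.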
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control
