A First-Order Approach To Accelerated Value Iteration
Vineet Goyal, Julien Grand-Clement

TL;DR
This paper introduces accelerated value iteration algorithms for Markov decision processes (MDPs) that borrow momentum and acceleration from convex optimization. They converge faster than standard value iteration, especially as the discount factor approaches 1, with both theoretical guarantees and empirical speedups.
Contribution
It establishes a connection between value iteration and gradient descent, and uses it to apply acceleration methods from convex optimization, improving convergence rates for MDPs whose discount factor is close to 1.
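To make the connection concrete, here is a minimal sketch (not the authors' code) for evaluating a fixed policy whose transition matrix P is assumed symmetric, the simplest kind of reversible chain. The Bellman operator is T(v) = r + λPv, so v - T(v) = (I - λP)v - r is the gradient of f(v) = ½ vᵀ(I - λP)v - rᵀv, and one VI step is one gradient step with step size 1. The Hessian I - λP has eigenvalues in [1 - λ, 1 + λ], so its condition number can be as large as (1 + λ)/(1 - λ). The helpers `lazy_cycle`, `bellman`, and `grad_f` and the lazy random walk on a cycle are hypothetical, illustrative choices.

```python
import numpy as np


def lazy_cycle(n):
    """Lazy random walk on an n-cycle: symmetric and doubly stochastic (illustrative choice)."""
    P = 0.5 * np.eye(n)
    for i in range(n):
        P[i, (i + 1) % n] += 0.25
        P[i, (i - 1) % n] += 0.25
    return P


def bellman(v, r, P, lam):
    """Policy-evaluation Bellman operator T(v) = r + lam * P v."""
    return r + lam * P @ v


def grad_f(v, r, P, lam):
    """Gradient of f(v) = 0.5 * v^T (I - lam P) v - r^T v (valid because P is symmetric)."""
    return (np.eye(len(v)) - lam * P) @ v - r


n, lam = 20, 0.99
rng = np.random.default_rng(0)
P, r, v = lazy_cycle(n), rng.random(n), rng.random(n)

# One VI step coincides with one gradient step of size 1 on f.
vi_step = bellman(v, r, P, lam)
gd_step = v - 1.0 * grad_f(v, r, P, lam)
print(np.allclose(vi_step, gd_step))  # True

# The Hessian I - lam P has spectrum inside [1 - lam, 1 + lam], so its condition
# number is at most (1 + lam) / (1 - lam), which blows up as lam -> 1.
eigs = np.linalg.eigvalsh(np.eye(n) - lam * P)
print(eigs.min(), eigs.max(), (1 + lam) / (1 - lam))
```

For a reversible but non-symmetric chain, the same identity holds in the inner product weighted by the stationary distribution; the sketch sticks to the symmetric case for simplicity.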
Findings
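Running time improved from O(1/(1-λ)) to O(1/√(1-λ)) for reversible MDPs, where λ is the discount factor.
The proposed safe accelerated value iteration (S-AVI) algorithm is worst-case optimal and empirically faster than classical approaches.
Speedups of up to tenfold over classical approaches on a large test bed of MDP instances.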
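One way to see the gap between plain VI, whose iteration count grows like 1/(1-λ), and an accelerated scheme is to run both on a fixed policy. The sketch below assumes a symmetric (hence reversible) transition matrix and uses the textbook constant-momentum Nesterov parameters for a quadratic whose Hessian spectrum lies in [1-λ, 1+λ]: step size α = 1/(1+λ) and momentum γ = (1-√(1-λ²))/λ. It is an illustration under these assumptions, not the paper's exact algorithm; the functions `vi`, `avi`, and `lazy_cycle` and the test chain are hypothetical.

```python
import numpy as np


def lazy_cycle(n):
    """Lazy random walk on an n-cycle: symmetric and doubly stochastic (illustrative)."""
    P = 0.5 * np.eye(n)
    for i in range(n):
        P[i, (i + 1) % n] += 0.25
        P[i, (i - 1) % n] += 0.25
    return P


# Both loops stop on the distance to the exact solution v_star purely for
# illustration; a practical loop would monitor the Bellman residual instead.
def vi(r, P, lam, v_star, tol=1e-6, max_iter=100_000):
    """Plain value iteration for a fixed policy; returns (v, iterations used)."""
    v = np.zeros_like(r)
    for k in range(max_iter):
        if np.max(np.abs(v - v_star)) < tol:
            return v, k
        v = r + lam * P @ v
    return v, max_iter


def avi(r, P, lam, v_star, tol=1e-6, max_iter=100_000):
    """Nesterov-style accelerated VI for a fixed policy (hypothetical sketch)."""
    alpha = 1.0 / (1.0 + lam)                     # step size 1/L with L = 1 + lam
    gamma = (1.0 - np.sqrt(1.0 - lam**2)) / lam   # momentum for spectrum [1 - lam, 1 + lam]
    v_prev = v = np.zeros_like(r)
    for k in range(max_iter):
        if np.max(np.abs(v - v_star)) < tol:
            return v, k
        h = v + gamma * (v - v_prev)              # extrapolate along the last step
        t_h = r + lam * P @ h                     # Bellman operator at h
        v_prev, v = v, h - alpha * (h - t_h)      # gradient step of size alpha at h
    return v, max_iter


n, lam = 50, 0.99
rng = np.random.default_rng(0)
P, r = lazy_cycle(n), rng.random(n)
v_star = np.linalg.solve(np.eye(n) - lam * P, r)  # exact value function, used to measure error
v_vi, k_vi = vi(r, P, lam, v_star)
v_avi, k_avi = avi(r, P, lam, v_star)
print(f"VI: {k_vi} iterations, accelerated: {k_avi} iterations")
```

With λ = 0.99 the accelerated loop should need noticeably fewer iterations than VI; the exact counts depend on the chain, the rewards, and the tolerance.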
Abstract
Markov decision processes (MDPs) are used to model stochastic systems in many applications. Several efficient algorithms to compute optimal policies have been studied in the literature, including value iteration (VI) and policy iteration. However, these do not scale well, especially when the discount factor for the infinite-horizon discounted reward, λ, gets close to 1. In particular, the running time of these algorithms scales as O(1/(1-λ)). In this paper, our goal is to design new algorithms that scale better than previous approaches when λ approaches 1. Our main contribution is to present a connection between VI and gradient descent and adapt the ideas of acceleration and momentum in convex optimization to design faster algorithms for MDPs. We prove theoretical guarantees of faster convergence of our algorithms for the computation of the…
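The O(1/(1-λ)) scaling of VI follows from contraction: ‖v_k - v*‖∞ ≤ λ^k ‖v_0 - v*‖∞, and since log(1/λ) ≥ 1-λ, about log(‖v_0 - v*‖∞/ε)/(1-λ) iterations suffice to reach accuracy ε. For the optimal Bellman operator, which takes a max over actions, the sketch below does not assume momentum always helps. Instead it uses a hypothetical safeguard in the spirit of S-AVI, which alternates accelerated and plain VI updates: an accelerated iterate x is kept only if its Bellman residual ‖T(x) - x‖∞ is at most λ times the current residual; otherwise the loop takes a plain VI step and resets the momentum. The acceptance rule, the momentum parameters, and all function names are assumptions for illustration, not the paper's S-AVI.

```python
import numpy as np


def bellman_opt(v, R, P, lam):
    """Optimal Bellman operator: (T v)[s] = max_a R[s, a] + lam * sum_t P[a, s, t] * v[t]."""
    q = R + lam * np.einsum("ast,t->sa", P, v)
    return q.max(axis=1)


def value_iteration(R, P, lam, tol, max_iter=1_000_000):
    """Plain VI, stopped when the Bellman residual ||T v - v||_inf is at most tol."""
    v = np.zeros(R.shape[0])
    for _ in range(max_iter):
        tv = bellman_opt(v, R, P, lam)
        if np.max(np.abs(tv - v)) <= tol:
            return tv
        v = tv
    return v


def safeguarded_avi(R, P, lam, tol, max_iter=100_000):
    """Hypothetical safeguarded accelerated VI; returns (v, number of accepted momentum steps)."""
    alpha = 1.0 / (1.0 + lam)
    gamma = (1.0 - np.sqrt(1.0 - lam**2)) / lam
    v = v_prev = np.zeros(R.shape[0])
    tv = bellman_opt(v, R, P, lam)
    res = np.max(np.abs(tv - v))                  # current Bellman residual
    accepted = 0
    for _ in range(max_iter):
        if res <= tol:
            break
        h = v + gamma * (v - v_prev)              # momentum extrapolation
        x = h - alpha * (h - bellman_opt(h, R, P, lam))
        tx = bellman_opt(x, R, P, lam)
        res_x = np.max(np.abs(tx - x))
        if res_x <= lam * res:                    # accept: matches the VI contraction guarantee
            v_prev, v, tv, res = v, x, tx, res_x
            accepted += 1
        else:                                     # reject: plain VI step from v, momentum reset
            tx = bellman_opt(tv, R, P, lam)
            v_prev, v, tv, res = tv, tv, tx, np.max(np.abs(tx - tv))
    return v, accepted


rng = np.random.default_rng(1)
S, A, lam = 30, 3, 0.95
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)                 # each row P[a, s, :] is a distribution
R = rng.random((S, A))

v_safe, n_accepted = safeguarded_avi(R, P, lam, tol=1e-8)
v_ref = value_iteration(R, P, lam, tol=1e-10)
print(n_accepted, np.max(np.abs(v_safe - v_ref)))
```

Because every accepted or fallback step shrinks the residual by at least a factor λ, this loop keeps VI's worst-case rate, and the returned v lies within tol/(1-λ) of v* in the sup norm.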
Taxonomy
Topics: Age of Information Optimization · Stochastic Gradient Optimization Techniques · Reinforcement Learning in Robotics
