A First-Order Approach To Accelerated Value Iteration

Vineet Goyal; Julien Grand-Clement

arXiv:1905.09963 · math.OC · August 30, 2021 · Oper. Res. · 1 citation

Open Access

TL;DR

This paper introduces a novel accelerated value iteration algorithm for Markov decision processes that leverages optimization techniques like momentum, achieving faster convergence especially as the discount factor nears 1, with both theoretical guarantees and empirical improvements.

Contribution

It establishes a connection between value iteration and gradient descent, and adapts acceleration and momentum techniques from convex optimization to speed up convergence for MDPs whose discount factor is close to 1.

Findings

01

Running time scales as O(1/√(1-λ)) for reversible MDPs, compared with O(1/(1-λ)) for value iteration and policy iteration.

02

Proposed S-AVI algorithm is worst-case optimal and empirically faster.

03

Speedups of up to tenfold on large MDP instances.

Abstract

Markov decision processes (MDPs) are used to model stochastic systems in many applications. Several efficient algorithms to compute optimal policies have been studied in the literature, including value iteration (VI) and policy iteration. However, these do not scale well especially when the discount factor for the infinite horizon discounted reward, $λ$, gets close to $1$. In particular, the running time scales as $O(1/(1 - λ))$ for these algorithms. In this paper, our goal is to design new algorithms that scale better than previous approaches when $λ$ approaches $1$. Our main contribution is to present a connection between VI and gradient descent and adapt the ideas of acceleration and momentum in convex optimization to design faster algorithms for MDPs. We prove theoretical guarantees of a faster convergence of our algorithms for the computation of the…
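The connection between VI and gradient descent can be illustrated with a short sketch. This is not the authors' implementation: it treats v - T(v), where T is the Bellman operator, as if it were a gradient, so that plain VI is a unit-step "gradient" update and a momentum term can be added on top. The function names, the step size 1/(1+λ), the momentum (1-√(1-λ²))/λ (the classical accelerated-gradient constants for a quadratic with curvature in [1-λ, 1+λ]), and the demo instance are all assumptions made for illustration. The demo uses a single action with a symmetric transition matrix, where T is affine and the momentum iteration reduces to accelerated gradient descent on a strongly convex quadratic; no convergence claim is made here for general MDPs.

```python
import numpy as np

def bellman(v, P, r, lam):
    """Bellman operator: T(v)[s] = max_a ( r[a, s] + lam * sum_s' P[a, s, s'] * v[s'] ).

    P has shape (A, S, S), r has shape (A, S), v has shape (S,).
    """
    return np.max(r + lam * (P @ v), axis=0)

def value_iteration(P, r, lam, tol=1e-8, max_iters=100_000):
    """Plain VI: v <- T(v), i.e. v <- v - (v - T(v)), a unit-step update."""
    v = np.zeros(P.shape[1])
    for k in range(max_iters):
        Tv = bellman(v, P, r, lam)
        if np.max(np.abs(Tv - v)) <= tol:  # sup-norm Bellman residual
            return v, k
        v = Tv
    return v, max_iters

def momentum_value_iteration(P, r, lam, tol=1e-8, max_iters=100_000):
    """Momentum variant (illustrative; constants are assumptions, see lead-in)."""
    alpha = 1.0 / (1.0 + lam)                    # step size
    gamma = (1.0 - np.sqrt(1.0 - lam**2)) / lam  # momentum weight
    v_prev = v = np.zeros(P.shape[1])
    for k in range(max_iters):
        h = v + gamma * (v - v_prev)             # extrapolated point
        residual = h - bellman(h, P, r, lam)     # plays the role of a gradient at h
        if np.max(np.abs(residual)) <= tol:
            return h, k
        v_prev, v = v, h - alpha * residual      # "gradient" step taken from h
    return v, max_iters

# Hypothetical demo instance: symmetric random walk on a 5-cycle, one action.
n, lam = 5, 0.99
P = np.zeros((1, n, n))
for s in range(n):
    P[0, s, (s - 1) % n] = 0.5
    P[0, s, (s + 1) % n] = 0.5
r = np.arange(n, dtype=float).reshape(1, n)

v_vi, k_vi = value_iteration(P, r, lam)
v_mom, k_mom = momentum_value_iteration(P, r, lam)

# With a single action the fixed point solves the linear system (I - lam * P) v = r.
v_star = np.linalg.solve(np.eye(n) - lam * P[0], r[0])

print("VI iterations:      ", k_vi)
print("momentum iterations:", k_mom)
print("max error (VI):      ", np.max(np.abs(v_vi - v_star)))
print("max error (momentum):", np.max(np.abs(v_mom - v_star)))
```

Both routines stop once the sup-norm Bellman residual falls below `tol`, which bounds the distance to the fixed point by `tol / (1 - lam)`. On instances with several actions the max makes T nonlinear, and this unguarded momentum iteration should be treated as a heuristic.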

Taxonomy

Topics: Age of Information Optimization · Stochastic Gradient Optimization Techniques · Reinforcement Learning in Robotics