Fast Approximate Dynamic Programming for Infinite-Horizon Markov Decision Processes

M. A. S. Kolarijani; G. F. Max; P. Mohajerin Esfahani

arXiv:2102.08880 · math.OC · March 18, 2022

TL;DR

This paper introduces a numerical scheme for infinite-horizon, discounted-cost stochastic control that runs value iteration in the conjugate domain, cutting the per-iteration time complexity from $O(XU)$ to $O(X + U)$ for state and input discretizations of size $X$ and $U$.

Contribution

It uses the linear-time Legendre transform to implement value iteration in the conjugate domain for stochastic nonlinear systems with separable cost and constraints in the state and input variables.
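
As a rough illustration of the kind of transform involved, the sketch below computes a discrete Legendre transform (convex conjugate) on sorted one-dimensional grids in time linear in the grid sizes. It is not taken from the paper or its repository: the function names `lower_convex_hull` and `discrete_legendre` are hypothetical, and the one-dimensional setting is an assumption made for brevity.

```python
import numpy as np

def lower_convex_hull(x, f):
    """Indices of the vertices of the lower convex envelope of the points
    (x[i], f[i]). Assumes x is sorted in strictly ascending order."""
    hull = []
    for i in range(len(x)):
        # Drop the last vertex while it lies on or above the chord
        # joining the vertex before it to the new point i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (x[b] - x[a]) * (f[i] - f[a]) - (f[b] - f[a]) * (x[i] - x[a])
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return np.array(hull)

def discrete_legendre(x, f, s):
    """Return f*(s[k]) = max_i (s[k] * x[i] - f[i]) for every slope s[k].
    Assumes x and s are both sorted ascending. Each point is pushed and
    popped at most once and the pointer j only moves forward, so the
    total work is O(len(x) + len(s))."""
    h = lower_convex_hull(x, f)
    xh, fh = x[h], f[h]
    out = np.empty(len(s))
    j = 0
    for k, sk in enumerate(s):
        # The maximizing hull vertex moves right as the slope grows.
        while j + 1 < len(xh) and sk * xh[j + 1] - fh[j + 1] >= sk * xh[j] - fh[j]:
            j += 1
        out[k] = sk * xh[j] - fh[j]
    return out

# Hypothetical usage: conjugate of f(x) = x^2 sampled on a grid.
x = np.linspace(-2.0, 2.0, 41)
s = np.linspace(-3.0, 3.0, 25)
f_conj = discrete_legendre(x, x**2, s)
```

The two-step design (convex envelope first, then a single forward merge over the sorted slopes) is what avoids the brute-force comparison of every slope against every grid point.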

Findings

01 Reduces the per-iteration time complexity of value iteration from $O(XU)$ to $O(X + U)$

02 Provides convergence, time complexity, and error analyses of the proposed algorithm

03 Applies to stochastic nonlinear control with constraints in the state and input variables
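
For context on the $O(XU)$ baseline, a conventional primal-domain VI iteration can be sketched as below. This is a toy under stated assumptions, not the paper's setup: the dynamics are deterministic and snapped to the nearest grid point (the paper treats stochastic systems), and the dynamics $x^+ = 0.9x + u$, the quadratic costs, the grids, and all function names are illustrative choices. The only point is that the minimization visits every (state, input) pair in each iteration.

```python
import numpy as np

def primal_vi_step(J, cost, nxt, gamma):
    """One Bellman backup: J_new[x] = min_u cost[x, u] + gamma * J[nxt[x, u]].
    Every (state, input) pair is evaluated, so the work is O(X * U)."""
    return np.min(cost + gamma * J[nxt], axis=1)

def primal_vi(cost, nxt, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Iterate the backup until the sup-norm change drops below tol."""
    J = np.zeros(cost.shape[0])
    for _ in range(max_iter):
        J_new = primal_vi_step(J, cost, nxt, gamma)
        if np.max(np.abs(J_new - J)) < tol:
            return J_new
        J = J_new
    return J

# Toy problem (assumed for illustration): X = 21 states, U = 11 inputs.
xs = np.linspace(-1.0, 1.0, 21)
us = np.linspace(-0.5, 0.5, 11)
# Deterministic next state, clipped to the state constraint set [-1, 1].
x_next = np.clip(0.9 * xs[:, None] + us[None, :], -1.0, 1.0)
# Index of the nearest grid point for each (state, input) pair.
nxt = np.abs(x_next[:, :, None] - xs[None, None, :]).argmin(axis=2)
# Separable stage cost: a state term plus an input term.
cost = xs[:, None] ** 2 + us[None, :] ** 2
J = primal_vi(cost, nxt)
```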

Abstract

In this study, we consider the infinite-horizon, discounted cost, optimal control of stochastic nonlinear systems with separable cost and constraints in the state and input variables. Using the linear-time Legendre transform, we propose a novel numerical scheme for implementation of the corresponding value iteration (VI) algorithm in the conjugate domain. Detailed analyses of the convergence, time complexity, and error of the proposed algorithm are provided. In particular, with a discretization of size $X$ and $U$ for the state and input spaces, respectively, the proposed approach reduces the time complexity of each iteration in the VI algorithm from $O(XU)$ to $O(X + U)$, by replacing the minimization operation in the primal domain with a simple addition in the conjugate domain.
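
The phrase "replacing the minimization operation in the primal domain with a simple addition in the conjugate domain" can be illustrated with a standard convex-analysis identity. The statement below is a general fact about infimal convolutions, offered as a sketch of the mechanism; the functions $f$ and $g$ are placeholders, and the exact operator used in the paper (with dynamics, constraints, and expectation) is not given in this summary. Writing $h^*(y) = \sup_x \{ \langle y, x \rangle - h(x) \}$ for the conjugate,

$$
(f \,\square\, g)(x) := \inf_u \{ f(u) + g(x - u) \}
\quad \Longrightarrow \quad
(f \,\square\, g)^*(y) = f^*(y) + g^*(y),
$$

which follows by substituting $z = x - u$:

$$
\sup_x \sup_u \{ \langle y, x \rangle - f(u) - g(x - u) \}
= \sup_u \{ \langle y, u \rangle - f(u) \} + \sup_z \{ \langle y, z \rangle - g(z) \}.
$$

A minimization over $u$ in the primal domain therefore corresponds to a pointwise sum of two conjugates, which costs one addition per grid point once the conjugates are available.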

Code & Models

Repositories

AminKolarijani/ConjVI (official)

Videos

Fast Approximate Dynamic Programming for Infinite-Horizon Markov Decision Processes · SlidesLive

Taxonomy

Topics: Advanced Control Systems Optimization · Reinforcement Learning in Robotics · Risk and Portfolio Optimization