An Accelerated Fitted Value Iteration Algorithm for MDPs with Finite and Vector-Valued Action Space

Sixiang Zhao; William B. Haskell; Michel-Alexandre Cardin

arXiv:1901.05154 · math.OC · November 30, 2020 · 1 citation

Open Access

TL;DR

This paper introduces an accelerated fitted value iteration (FVI) algorithm for high-dimensional MDPs whose action spaces are finite but vector-valued. It approximates value functions with a two-layer ReLU neural network and solves the action selection problem with a specialized multi-cut decomposition method to improve computational efficiency.

Contribution

The paper develops an accelerated FVI algorithm that combines a two-layer neural network approximation of the value function with a multi-cut decomposition approach, so that action selection stays efficient when the action space is finite but vector-valued.
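The structure described above can be written out as a rough sketch. The symbols below ($s$, $a$, $\xi$, $f$, $r$, $\gamma$, $w_j$, $u_j$, $b_j$, $c$, $m$, $\mathcal{A}(s)$) are notation assumed here for illustration and are not taken from the paper, and the reward-maximization sign convention is likewise an assumption. A two-layer ReLU approximator and the corresponding action selection problem might look like:

```latex
% Two-layer ReLU value function approximator (assumed notation)
\hat{V}(s) = c + \sum_{j=1}^{m} w_j \max\{0,\; u_j^{\top} s + b_j\}

% Action selection at state s: the first-stage decision is the action a,
% and the expected approximate value of the random next state
% f(s, a, \xi) plays the role of the recourse function.
\max_{a \in \mathcal{A}(s)} \; r(s, a)
  + \gamma \, \mathbb{E}_{\xi}\!\left[ \hat{V}\big(f(s, a, \xi)\big) \right]
```

Because each hidden unit is a maximum of linear functions of the next state, the expectation term is piecewise linear whenever $f$ is linear in $a$, which is the kind of structure a cut-based decomposition method can exploit.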

Findings

01. Significant speed-up in FVI without losing much accuracy

02. Effective neural network approximation for value functions

03. Proven convergence and optimality of the proposed method

Abstract

This paper studies an accelerated fitted value iteration (FVI) algorithm to solve high-dimensional Markov decision processes (MDPs). FVI is an approximate dynamic programming algorithm that has desirable theoretical properties. However, it can be intractable when the action space is finite but vector-valued. To solve such MDPs via FVI, we first approximate the value functions by a two-layer neural network (NN) with rectified linear units (ReLU) as activation functions. We then verify that such approximators are strong enough for the MDP. To speed up the FVI, we recast the action selection problem as a two-stage stochastic programming problem, where the resulting recourse function comes from the two-layer NN. Then, the action selection problem is solved with a specialized multi-cut decomposition algorithm. More specifically, we design valid cuts by exploiting the structure of the…
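To make the baseline concrete, here is a minimal sketch of plain (unaccelerated) FVI with a two-layer ReLU approximator on a toy problem. Everything in it is an assumption made for illustration and none of it comes from the paper: the two-item inventory dynamics in `step`, the constants `GAMMA` and `CAP`, the sampled `DEMANDS`, and the `TwoLayerReLU` class. Two simplifications are worth flagging: the hidden layer is fixed at random and only the output layer is fit by least squares, and action selection is done by brute-force enumeration, which is exactly the step the paper replaces with its multi-cut decomposition algorithm.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
GAMMA = 0.9   # discount factor (assumed)
CAP = 4.0     # hypothetical per-item storage cap
# Finite, vector-valued action space: order 0, 1, or 2 units of each item.
ACTIONS = np.array(list(itertools.product(range(3), repeat=2)), dtype=float)
# Sampled demand scenarios used to approximate the expectation.
DEMANDS = rng.integers(0, 3, size=(20, 2)).astype(float)


def step(s, a, D):
    """Toy inventory dynamics and rewards for every demand row in D."""
    stock = np.minimum(s + a, CAP)
    sold = np.minimum(stock, D)
    reward = 2.0 * sold.sum(axis=1) - 1.0 * a.sum() - 0.1 * stock.sum()
    return stock - sold, reward


class TwoLayerReLU:
    """V(s) = relu(s W + b) @ w + c, with W and b fixed at random."""

    def __init__(self, dim, hidden=16):
        self.W = rng.normal(size=(dim, hidden))
        self.b = rng.normal(size=hidden)
        self.out = np.zeros(hidden + 1)  # output weights plus bias c

    def features(self, S):
        S = np.atleast_2d(S)
        H = np.maximum(0.0, S @ self.W + self.b)
        return np.hstack([H, np.ones((len(S), 1))])

    def predict(self, S):
        return self.features(S) @ self.out

    def fit(self, S, y):
        # Least-squares fit of the output layer only (a simplification).
        self.out, *_ = np.linalg.lstsq(self.features(S), y, rcond=None)


def q_values(s, V):
    """Estimate Q(s, a) for every action by enumeration."""
    q = np.empty(len(ACTIONS))
    for i, a in enumerate(ACTIONS):
        nxt, rew = step(s, a, DEMANDS)
        q[i] = np.mean(rew + GAMMA * V.predict(nxt))
    return q


def fitted_value_iteration(n_states=100, n_iters=15):
    V = TwoLayerReLU(dim=2)
    S = rng.uniform(0.0, CAP, size=(n_states, 2))  # sampled states
    for _ in range(n_iters):
        # Bellman targets at the sampled states, then a regression step.
        targets = np.array([q_values(s, V).max() for s in S])
        V.fit(S, targets)
    return V


V = fitted_value_iteration()
greedy = ACTIONS[int(np.argmax(q_values(np.array([0.0, 0.0]), V)))]
```

The enumeration in `q_values` touches every action vector, so its cost grows exponentially with the number of action components (here 3^2 = 9 actions, but 3^n in general). That growth is the intractability the abstract refers to.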

Taxonomy

Topics: Advanced Control Systems Optimization · Reinforcement Learning in Robotics · Optimization and Variational Analysis