A Low-rank Approximation for MDPs via Moment Coupling

Amy B.Z. Zhang; Itai Gurvich

arXiv:2009.08966 · math.OC · April 13, 2021

Open Access

TL;DR

This paper proposes an approximation method for Markov Decision Processes that combines state aggregation with moment matching, shrinking the state space from N to approximately N^{0.5+ε} while retaining optimality guarantees.

Contribution

It introduces a moment coupling framework that approximates MDPs without solving PDEs, enabling efficient state space reduction with theoretical guarantees.

Findings

01 Reduces the state space from N to approximately N^{0.5+ε}.

02 Provides a disciplined mechanism for tuning aggregation probabilities.

03 Achieves computational gains while maintaining optimality guarantees.
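
To give a feel for the scale of the N to N^{0.5+ε} reduction, the following is a small arithmetic sketch. The function name `reduced_size` and the values N = 10^6 and ε = 0.1 are assumptions chosen for illustration; the listing above does not state a specific ε.

```python
import math


def reduced_size(n_states: int, eps: float) -> int:
    """Approximate aggregated state count N^(0.5 + eps), rounded up.

    Hypothetical helper: it only evaluates the exponent quoted in the
    findings and ignores any constants the underlying analysis may carry.
    """
    return math.ceil(n_states ** (0.5 + eps))


# Illustrative values only (assumed, not taken from the paper).
n = 10**6
small = reduced_size(n, 0.1)
ratio = n / small  # how many original states each aggregate covers on average
```

With these assumed values a million-state problem shrinks to a few thousand aggregate states, which is where the computational gain would come from.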

Abstract

We introduce a framework to approximate a Markov Decision Process that stands on two pillars: state aggregation -- as the algorithmic infrastructure; and central-limit-theorem-type approximations -- as the mathematical underpinning of optimality guarantees. The theory is grounded in recent work by Braverman et al. (2020) that relates the solution of the Bellman equation to that of a PDE where, in the spirit of the central limit theorem, the transition matrix is reduced to its local first and second moments. Solving the PDE is not required by our method. Instead, we construct a "sister" (controlled) Markov chain whose two local transition moments are approximately identical with those of the focal chain. Because of this moment matching, the original chain and its "sister" are coupled through the PDE, a coupling that facilitates optimality guarantees. Embedded into…
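
The moment-matching idea in the abstract can be illustrated with a minimal one-dimensional sketch. This is not the authors' construction: the helper names `local_moments` and `sister_row`, the coarse grid spacing `h`, and the three-point jump rule are all assumptions made for illustration. The sketch takes one row of a focal chain's transition matrix, computes its local first and second moments, and builds a row for a chain on a coarser grid that reproduces both.

```python
def local_moments(row, x):
    """Return (first, second) moments of the one-step jump y - x.

    `row` maps next-state y to its transition probability from state x.
    """
    first = sum(p * (y - x) for y, p in row.items())
    second = sum(p * (y - x) ** 2 for y, p in row.items())
    return first, second


def sister_row(x, first, second, h):
    """Three-point row on a grid of spacing h matching both moments.

    Solves  h * (up - down) = first  and  h^2 * (up + down) = second.
    Hypothetical rule for illustration; it is only valid when the
    resulting probabilities are non-negative.
    """
    up = 0.5 * (second / h**2 + first / h)
    down = 0.5 * (second / h**2 - first / h)
    stay = 1.0 - up - down
    if min(up, down, stay) < 0:
        raise ValueError("moments cannot be matched on this grid spacing")
    return {x - h: down, x: stay, x + h: up}


# Focal chain: from state 10, move to 9, 10, or 11 (assumed example row).
focal = {9: 0.3, 10: 0.3, 11: 0.4}
m1, m2 = local_moments(focal, 10)

# Sister chain on a grid twice as coarse, with the same two local moments.
sister = sister_row(10, m1, m2, h=2)
s1, s2 = local_moments(sister, 10)
```

The sister row jumps only between coarse grid points, yet `s1` and `s2` agree with `m1` and `m2` up to floating-point error. In the abstract's framing, it is this agreement of local moments that couples the two chains through the PDE.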

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Advanced Bandit Algorithms Research