A Low-rank Approximation for MDPs via Moment Coupling
Amy B.Z. Zhang, Itai Gurvich

TL;DR
This paper proposes a novel approximation method for Markov Decision Processes that combines state aggregation with moment matching, leading to significant computational reductions while maintaining optimality guarantees.
Contribution
It introduces a moment coupling framework that approximates MDPs without solving PDEs, enabling efficient state space reduction with theoretical guarantees.
Findings
Reduces state space from N to approximately N^{0.5+ε}.
Provides a disciplined mechanism for tuning aggregation probabilities.
Achieves computational gains while maintaining optimality guarantees.
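To give a sense of scale for the N to N^{0.5+ε} reduction listed above, the short sketch below evaluates the exponent for a couple of state-space sizes. The function name `reduced_size` and the value ε = 0.05 are illustrative assumptions, not quantities taken from the paper.

```python
def reduced_size(n, eps):
    """Approximate aggregated state-space size N^(0.5 + eps).

    `eps` is a hypothetical small constant chosen for illustration only.
    """
    return n ** (0.5 + eps)


if __name__ == "__main__":
    for n in (10**4, 10**6):
        # Compare the original size with the approximate reduced size.
        print(n, round(reduced_size(n, 0.05)))
```

With ε = 0 the reduction is exactly a square root, so a chain with 10,000 states would shrink to about 100 aggregated states; any positive ε gives a somewhat larger count.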
Abstract
We introduce a framework to approximate a Markov Decision Process that stands on two pillars: state aggregation -- as the algorithmic infrastructure; and central-limit-theorem-type approximations -- as the mathematical underpinning of optimality guarantees. The theory is grounded in recent work (Braverman et al., 2020) that relates the solution of the Bellman equation to that of a PDE where, in the spirit of the central limit theorem, the transition matrix is reduced to its local first and second moments. Solving the PDE is not required by our method. Instead, we construct a "sister" (controlled) Markov chain whose two local transition moments are approximately identical with those of the focal chain. Because of this, the original chain and its "sister" are coupled through the PDE, a coupling that facilitates optimality guarantees. Embedded into…
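The "local first and second moments" of a transition matrix mentioned in the abstract can be made concrete with a small sketch. The code below computes, for each state x, the mean and the second moment of the one-step displacement y - x under a transition matrix P. The helper names `local_moments` and `lazy_walk`, and the toy reflecting random walk, are assumptions made for illustration; this is not the paper's construction of the sister chain, only the quantities that such a chain would be asked to match approximately.

```python
import numpy as np


def local_moments(P, states):
    """Per-state first and second moments of the one-step displacement.

    drift[x]  = sum_y P[x, y] * (y - x)
    second[x] = sum_y P[x, y] * (y - x)**2
    """
    P = np.asarray(P, dtype=float)
    states = np.asarray(states, dtype=float)
    disp = states[None, :] - states[:, None]  # disp[x, y] = y - x
    drift = (P * disp).sum(axis=1)
    second = (P * disp**2).sum(axis=1)
    return drift, second


def lazy_walk(n, p_up, p_down):
    """Toy chain on {0, ..., n-1}: step up/down, else stay; reflects at the ends."""
    P = np.zeros((n, n))
    for x in range(n):
        P[x, min(x + 1, n - 1)] += p_up
        P[x, max(x - 1, 0)] += p_down
        P[x, x] += 1.0 - p_up - p_down
    return P


if __name__ == "__main__":
    P = lazy_walk(5, 0.25, 0.25)
    drift, second = local_moments(P, np.arange(5))
    print("drift :", drift)
    print("second:", second)
```

Two chains whose `drift` and `second` arrays are close at corresponding states have approximately matching local moments in the sense the abstract describes.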
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Advanced Bandit Algorithms Research
