Efficient Global Planning in Large MDPs via Stochastic Primal-Dual Optimization

Gergely Neu; Nneka Okolo

arXiv:2210.12057·cs.LG·February 1, 2023

Open Access

TL;DR

This paper introduces a stochastic primal-dual optimization algorithm for large discounted MDPs that efficiently produces near-optimal, compact softmax policies with polynomial query complexity, avoiding costly local planning.

Contribution

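It presents a stochastic primal-dual method that outputs a near-optimal policy for large discounted MDPs under linear function approximation, assuming that the feature vector of every state-action pair can be written as a convex combination of the features of a small core set of state-action pairs.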
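The core-set assumption can be illustrated with a small sketch. The function name `combine_core_features`, the toy core features, and the weights below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def combine_core_features(core_features, weights, tol=1e-9):
    """Return sum_i weights[i] * core_features[i].

    Raises ValueError unless the weights form a convex combination
    (nonnegative entries that sum to one).
    """
    if len(core_features) != len(weights):
        raise ValueError("need exactly one weight per core feature vector")
    if any(w < -tol for w in weights) or abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must be nonnegative and sum to 1")
    dim = len(core_features[0])
    return [
        sum(w * phi[j] for w, phi in zip(weights, core_features))
        for j in range(dim)
    ]


# Toy example (assumed values): three core state-action pairs with 2-d features.
core = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Feature vector of some other state-action pair, expressed through the core set.
phi = combine_core_features(core, [0.5, 0.25, 0.25])
```

Under this assumption, quantities that are linear in the features only need to be handled on the core set, which is small compared with the full state-action space.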

Findings

01. Outputs a near-optimal policy after a polynomial number of queries to the generative model

02. Produces a single softmax policy compactly represented by a low-dimensional parameter vector

03. Avoids computationally expensive local planning subroutines at runtime

Abstract

We propose a new stochastic primal-dual optimization algorithm for planning in a large discounted Markov decision process with a generative model and linear function approximation. Assuming that the feature map approximately satisfies standard realizability and Bellman-closedness conditions and also that the feature vectors of all state-action pairs are representable as convex combinations of a small core set of state-action pairs, we show that our method outputs a near-optimal policy after a polynomial number of queries to the generative model. Our method is computationally efficient and comes with the major advantage that it outputs a single softmax policy that is compactly represented by a low-dimensional parameter vector, and does not need to execute computationally expensive local planning subroutines in runtime.
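To make "a single softmax policy that is compactly represented by a low-dimensional parameter vector" concrete, here is a minimal sketch of a generic linear softmax policy. It assumes the common form pi(a|s) proportional to exp(eta * <phi(s,a), theta>); the exact parametrization used in the paper may differ, and the names `softmax_policy`, `theta`, `eta`, and the toy feature values are illustrative assumptions.

```python
import math


def softmax_policy(theta, action_features, eta=1.0):
    """Action probabilities at one state for a linear softmax policy.

    theta:           parameter vector of length d
    action_features: list of feature vectors phi(s, a), one per action
    eta:             scaling of the scores (assumed hyperparameter)
    """
    scores = [
        eta * sum(t * f for t, f in zip(theta, phi))
        for phi in action_features
    ]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example (assumed values): 2-d features for three actions at one state.
theta = [1.0, -1.0]
features_at_s = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
probs = softmax_policy(theta, features_at_s)
```

Evaluating such a policy at a state costs one inner product per action, so acting with it requires no planning subroutine at runtime; only the d-dimensional vector `theta` has to be stored.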

Taxonomy

Topics: Machine Learning and Algorithms · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning

Methods: Softmax