Primal-Dual $\pi$ Learning: Sample Complexity and Sublinear Run Time for Ergodic Markov Decision Problems

Mengdi Wang

arXiv:1710.06100·cs.LG·October 18, 2017·35 cites

TL;DR

This paper introduces primal-dual $\pi$ learning, an efficient model-free method for approximating optimal policies in ergodic MDPs with improved sample complexity and sublinear runtime.

Contribution

The paper proposes a novel primal-dual $\pi$ learning algorithm that achieves near-optimal sample complexity and sublinear runtime for ergodic MDPs, extending applicability to explicit transition models.

Findings

01

Achieves $\tilde{O}\left(\frac{(\tau \cdot t_{mix}^{*})^{2} |S| |A|}{\epsilon^{2}}\right)$ sample complexity for $\epsilon$-optimal policies.

02

Provides a sublinear-time algorithm for solving average-reward MDPs with efficient sampling.

03

Applicable both in the model-free setting, where transitions are only sampled, and when the transition model is given explicitly.
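To get a feel for how the sample-complexity bound in the first finding scales, the short Python sketch below evaluates the leading term $(\tau \cdot t_{mix}^{*})^{2} |S| |A| / \epsilon^{2}$. The function name `pi_learning_sample_bound` and all numeric inputs are hypothetical, chosen only for illustration; the $\tilde{O}$ notation hides constants and logarithmic factors, so the result indicates scaling and is not an actual sample count.

```python
def pi_learning_sample_bound(tau, t_mix, num_states, num_actions, eps):
    """Leading term (tau * t_mix)^2 * |S| * |A| / eps^2 of the stated bound.

    Hypothetical helper: constants and log factors hidden by the
    O-tilde notation are ignored, so only ratios are meaningful.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    return (tau * t_mix) ** 2 * num_states * num_actions / eps ** 2


# Illustrative (made-up) problem sizes.
base = pi_learning_sample_bound(tau=2.0, t_mix=10.0, num_states=100, num_actions=5, eps=0.5)
tighter = pi_learning_sample_bound(tau=2.0, t_mix=10.0, num_states=100, num_actions=5, eps=0.25)

# Halving eps multiplies the bound by four; doubling |S| would double it.
print(base, tighter / base)
```

The bound grows linearly in $|S||A|$ but quadratically in $\tau \cdot t_{mix}^{*}$ and in $1/\epsilon$, so slow mixing or a tight accuracy target dominates the cost long before the size of the state-action space does.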

Abstract

Consider the problem of approximating the optimal policy of a Markov decision process (MDP) by sampling state transitions. In contrast to existing reinforcement learning methods that are based on successive approximations to the nonlinear Bellman equation, we propose a Primal-Dual $\pi$ Learning method in light of the linear duality between the value and policy. The $\pi$ learning method is model-free and makes primal-dual updates to the policy and value vectors as new data are revealed. For an infinite-horizon undiscounted Markov decision process with finite state space $S$ and finite action space $A$, the $\pi$ learning method finds an $\epsilon$-optimal policy using the following number of sample transitions: $\tilde{O}\left(\frac{(\tau \cdot t_{mix}^{*})^{2} |S| |A|}{\epsilon^{2}}\right)$, where $t_{mix}^{*}$ is an upper bound on the mixing times across all policies and $\tau$ is a parameter…
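The "linear duality between the value and policy" mentioned in the abstract can be illustrated with the standard textbook linear program for average-reward MDPs. The block below is a sketch of that textbook pair, not a transcription of the paper's formulation, which may differ in notation or normalization. The symbols $p_{ij}(a)$ (transition probability), $r_i(a)$ (reward), $\bar{v}$ (average reward), $h$ (difference-of-value vector), and $\mu$ (stationary state-action distribution) are assumptions introduced here.

```latex
% Primal (value side): smallest average reward \bar{v} consistent with
% the Bellman inequalities for some difference-of-value vector h.
\begin{aligned}
\min_{\bar{v},\, h} \quad & \bar{v} \\
\text{s.t.} \quad & \bar{v} + h_i - \sum_{j \in S} p_{ij}(a)\, h_j \;\ge\; r_i(a)
  \qquad \forall\, i \in S,\ a \in A .
\end{aligned}

% Dual (policy side): the best stationary state-action distribution \mu.
\begin{aligned}
\max_{\mu \ge 0} \quad & \sum_{i \in S} \sum_{a \in A} \mu_{i,a}\, r_i(a) \\
\text{s.t.} \quad & \sum_{a \in A} \mu_{j,a} \;=\; \sum_{i \in S} \sum_{a \in A} \mu_{i,a}\, p_{ij}(a)
  \qquad \forall\, j \in S, \\
& \sum_{i \in S} \sum_{a \in A} \mu_{i,a} \;=\; 1 .
\end{aligned}

% A randomized policy is read off the dual solution:
\pi(a \mid i) \;=\; \frac{\mu_{i,a}}{\sum_{a' \in A} \mu_{i,a'}} .
```

Under this reading, the primal variables carry the value information and the dual variables carry the policy, which is why a primal-dual method can update both at once from sampled transitions.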

Taxonomy

Topics: Reinforcement Learning in Robotics · Simulation Techniques and Applications · Formal Methods in Verification