On Sample Complexity of Projection-Free Primal-Dual Methods for Learning Mixture Policies in Markov Decision Processes

Masoud Badiei Khuzani; Varun Vasudevan; Hongyi Ren; Lei Xing

arXiv:1903.06727 · cs.LG · September 2, 2019 · 1 cites

TL;DR

This paper introduces a projection-free stochastic primal-dual method for learning mixture policies in large-state MDPs, providing PAC sample complexity analysis and demonstrating improved efficiency over penalty methods.

Contribution

It develops a novel projection-free primal-dual algorithm for ALP in MDPs and analyzes its PAC sample complexity, with practical modifications and empirical validation.

Findings

01

The proposed algorithm achieves near-optimal policy performance.

02

It requires fewer samples than penalty methods.

03

Numerical results show lower variance and higher efficiency.

Abstract

We study the problem of learning a policy for an infinite-horizon, discounted-cost Markov decision process (MDP) with a large number of states. We compute the actions of a policy that is nearly as good as a policy chosen by a suitable oracle from a given mixture policy class, characterized by the convex hull of a set of known base policies. To learn the coefficients of the mixture model, we recast the problem as an approximate linear programming (ALP) formulation for MDPs, where the feature vectors correspond to the occupation measures of the base policies defined on the state-action space. We then propose a projection-free stochastic primal-dual method with the Bregman divergence to solve the characterized ALP. Furthermore, we analyze the probably approximately correct (PAC) sample complexity of the proposed stochastic algorithm, namely the number of queries required to achieve near…
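To make the phrase "projection-free ... with the Bregman divergence" concrete, here is a minimal sketch of an entropic mirror-descent step on the probability simplex, which is one standard way to update mixture coefficients without a Euclidean projection. It is not the paper's algorithm: it shows only a primal update (no dual variables, no ALP constraints), and the function names, the toy cost vector, the step size, and the Gaussian gradient noise are all assumptions made for illustration.

```python
import math
import random


def entropic_mirror_step(theta, grad, eta):
    """One mirror-descent step on the simplex with the KL Bregman divergence.

    The multiplicative update theta_i * exp(-eta * grad_i), followed by
    renormalization, stays on the simplex by construction, so no
    Euclidean projection is required.
    """
    # Work in log space and subtract the max for numerical stability.
    logits = [math.log(t) - eta * g for t, g in zip(theta, grad)]
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def learn_mixture(costs, steps=2000, eta=0.05, noise=0.5, seed=0):
    """Toy stand-in for learning mixture coefficients (hypothetical setup).

    costs[i] plays the role of the expected cost of base policy i; each
    iteration sees only a noisy estimate of it, mimicking stochastic queries.
    Returns the running average of the iterates.
    """
    rng = random.Random(seed)
    n = len(costs)
    theta = [1.0 / n] * n  # start from the uniform mixture
    avg = [0.0] * n
    for t in range(1, steps + 1):
        # Noisy gradient of the linear objective sum_i theta_i * costs[i].
        grad = [c + rng.gauss(0.0, noise) for c in costs]
        theta = entropic_mirror_step(theta, grad, eta)
        # Incremental running average of the iterates.
        avg = [a + (th - a) / t for a, th in zip(avg, theta)]
    return avg


weights = learn_mixture([1.0, 0.4, 0.7])
```

In this toy setting the averaged weights remain a valid probability vector and concentrate on the base policy with the lowest cost. A primal-dual scheme of the kind the abstract describes would additionally maintain dual variables for the ALP constraints, which this sketch leaves out.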

Taxonomy

Topics: Control Systems and Identification · Advanced Control Systems Optimization · Fuzzy Systems and Optimization