On Sample Complexity of Projection-Free Primal-Dual Methods for Learning Mixture Policies in Markov Decision Processes
Masoud Badiei Khuzani, Varun Vasudevan, Hongyi Ren, Lei Xing

TL;DR
This paper introduces a projection-free stochastic primal-dual method for learning mixture policies in large-state MDPs, providing PAC sample complexity analysis and demonstrating improved efficiency over penalty methods.
Contribution
It develops a projection-free stochastic primal-dual algorithm for the approximate linear programming (ALP) formulation of MDPs and analyzes its PAC sample complexity, with practical modifications and empirical validation.
Findings
The learned policy is nearly as good as the policy an oracle would choose from the mixture class, that is, the convex hull of the known base policies.
It requires fewer samples than penalty methods.
Numerical results show lower variance and higher efficiency.
Abstract
We study the problem of learning a policy for an infinite-horizon, discounted-cost Markov decision process (MDP) with a large number of states. We compute the actions of a policy that is nearly as good as a policy chosen by a suitable oracle from a given mixture policy class, characterized by the convex hull of a set of known base policies. To learn the coefficients of the mixture model, we recast the problem as an approximate linear programming (ALP) formulation for MDPs, where the feature vectors correspond to the occupation measures of the base policies defined on the state-action space. We then propose a projection-free stochastic primal-dual method with the Bregman divergence to solve the characterized ALP. Furthermore, we analyze the probably approximately correct (PAC) sample complexity of the proposed stochastic algorithm, namely the number of queries required to achieve near…
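As a rough illustration of two ideas in the abstract, the sketch below builds the occupation measures of a few base policies on a small random MDP and then updates mixture coefficients with an entropic (KL-divergence) mirror step, which stays on the probability simplex without a projection. It is not the paper's algorithm: the ALP constraints, the dual variables, and the stochastic sampling are all omitted, and every name in it (`occupation_measure`, `entropic_step`, the MDP sizes, the step size) is a hypothetical choice made for this example.

```python
import numpy as np

def occupation_measure(P, policy, nu0, gamma):
    """Normalized discounted state-action occupation measure of a stationary policy.

    P: (S, A, S) transition tensor, policy: (S, A) with rows summing to 1,
    nu0: (S,) initial state distribution, gamma: discount factor in (0, 1).
    """
    S = P.shape[0]
    # State-to-state kernel under the policy: P_pi[s, t] = sum_a pi(a|s) P[s, a, t]
    P_pi = np.einsum("sa,sat->st", policy, P)
    # Discounted state visitation: d = (1 - gamma) (I - gamma P_pi^T)^{-1} nu0
    d = (1.0 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, nu0)
    # mu(s, a) = d(s) * pi(a|s)
    return d[:, None] * policy

def entropic_step(theta, grad, step):
    """One exponentiated-gradient (KL mirror descent) step on the simplex.

    The multiplicative update plus renormalization keeps theta a probability
    vector, so no Euclidean projection is needed.
    """
    w = theta * np.exp(-step * (grad - grad.min()))  # shift for numerical stability
    return w / w.sum()

# Hypothetical toy problem: 5 states, 3 actions, 4 random base policies.
rng = np.random.default_rng(0)
S, A, K, gamma = 5, 3, 4, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)
cost = rng.random((S, A))
nu0 = np.full(S, 1.0 / S)
base_policies = rng.random((K, S, A))
base_policies /= base_policies.sum(axis=2, keepdims=True)

# Feature vectors: one occupation measure per base policy.
mus = np.stack([occupation_measure(P, pi, nu0, gamma) for pi in base_policies])
# Expected cost under each base occupation measure.
base_costs = (mus * cost).sum(axis=(1, 2))

# The cost of a mixture of occupation measures is linear in theta, so its
# gradient with respect to theta is simply base_costs.
theta = np.full(K, 1.0 / K)
for _ in range(200):
    theta = entropic_step(theta, base_costs, step=0.5)

mixture_cost = float(theta @ base_costs)
print("mixture coefficients:", np.round(theta, 3))
print("mixture cost:", round(mixture_cost, 4))
```

In this unconstrained toy setting the coefficients simply drift toward the cheapest base policy; the point is only that the multiplicative update never leaves the simplex, which is the sense in which a Bregman (here, KL) step can avoid projections.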
Taxonomy
Topics: Control Systems and Identification · Advanced Control Systems Optimization · Fuzzy Systems and Optimization
