Primal-Dual $\pi$ Learning: Sample Complexity and Sublinear Run Time for Ergodic Markov Decision Problems
Mengdi Wang

TL;DR
This paper introduces a primal-dual $\pi$ learning algorithm, an efficient model-free method for approximating optimal policies in ergodic MDPs with improved sample complexity and sublinear run time.
Contribution
The paper proposes a novel primal-dual $\pi$ learning algorithm that achieves near-optimal sample complexity and sublinear run time for ergodic MDPs, and that also applies when the transition model is given explicitly.
Findings
Achieves $\tilde{O}\left((\tau \cdot t_{mix}^*)^2 |S| |A| / \epsilon^2\right)$ sample complexity for $\epsilon$-optimal policies.
Provides a sublinear-time algorithm for solving average-reward MDPs when state transitions can be sampled efficiently.
Applies both in the model-free setting and when the transition model is given explicitly.
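To get a feel for how the sample-complexity bound scales, the leading term can be evaluated directly. This is a minimal sketch, not the paper's code: the function name `sample_bound` and the numeric inputs are hypothetical, and the constants and logarithmic factors hidden by the $\tilde{O}$ notation are ignored, so only ratios between values are meaningful.

```python
def sample_bound(tau, t_mix, n_states, n_actions, eps):
    """Leading term (tau * t_mix)^2 * |S| * |A| / eps^2.

    Hypothetical helper: constants and log factors hidden by the
    O-tilde notation are omitted, so only relative scaling is meaningful.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    return (tau * t_mix) ** 2 * n_states * n_actions / eps ** 2


# Illustrative (made-up) problem sizes.
base = sample_bound(tau=2.0, t_mix=10.0, n_states=100, n_actions=5, eps=0.1)
halved = sample_bound(tau=2.0, t_mix=10.0, n_states=100, n_actions=5, eps=0.05)

# Halving eps multiplies the leading term by four.
print(halved / base)

# Assumption: a dense explicit model has |S|^2 * |A| transition entries.
# The leading term grows linearly in |S|, so its ratio to that input size
# shrinks as the state space grows.
for n in (100, 1_000, 10_000):
    ratio = sample_bound(2.0, 10.0, n, 5, 0.1) / (n * n * 5)
    print(n, ratio)
```

The loop illustrates one reading of "sublinear": under the dense-model assumption, the number of transitions touched is a vanishing fraction of the model's entries as $|S|$ grows with the other parameters held fixed.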
Abstract
Consider the problem of approximating the optimal policy of a Markov decision process (MDP) by sampling state transitions. In contrast to existing reinforcement learning methods that are based on successive approximations to the nonlinear Bellman equation, we propose a Primal-Dual $\pi$ Learning method in light of the linear duality between the value and policy. The learning method is model-free and makes primal-dual updates to the policy and value vectors as new data are revealed. For infinite-horizon undiscounted Markov decision processes with finite state space $S$ and finite action space $A$, the learning method finds an $\epsilon$-optimal policy using the following number of sample transitions: $\tilde{O}\left((\tau \cdot t_{mix}^*)^2 |S| |A| / \epsilon^2\right)$, where $t_{mix}^*$ is an upper bound of mixing times across all policies and $\tau$ is a parameter…
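The "linear duality between the value and policy" can be made concrete with the standard linear-programming formulation of the average-reward MDP. The block below is a sketch in notation assumed here rather than taken from the text above: $P_a$ is the transition matrix under action $a$, $r_a$ the expected one-step reward vector, $\bar{v}$ the average reward, $h$ a difference-of-value vector, $\mu_a$ the stationary state-action distribution restricted to action $a$, and $e$ the all-ones vector.

```latex
% Primal LP (value side): variables \bar{v} \in \mathbb{R}, h \in \mathbb{R}^{|S|}
\begin{aligned}
\min_{\bar{v},\, h} \quad & \bar{v} \\
\text{s.t.} \quad & \bar{v}\, e + (I - P_a)\, h - r_a \ge 0 \quad \text{for all } a \in A .
\end{aligned}

% Dual LP (policy side): variables \mu_a \in \mathbb{R}^{|S|} for each a \in A
\begin{aligned}
\max_{\mu} \quad & \sum_{a \in A} \mu_a^\top r_a \\
\text{s.t.} \quad & \sum_{a \in A} (I - P_a)^\top \mu_a = 0, \qquad
\sum_{a \in A} \|\mu_a\|_1 = 1, \qquad \mu_a \ge 0 .
\end{aligned}

% Equivalent saddle-point form over the simplex \Delta of state-action distributions
\min_{h} \; \max_{\mu \in \Delta} \; \sum_{a \in A} \mu_a^\top \big( (P_a - I)\, h + r_a \big)
```

In this formulation the primal variables play the role of the value vector and the dual variables encode a policy through $\pi(a \mid i) \propto \mu_a(i)$, which is the sense in which a primal-dual method can update value and policy together.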
Taxonomy
Topics: Reinforcement Learning in Robotics · Simulation Techniques and Applications · Formal Methods in Verification
