Near Sample-Optimal Reduction-based Policy Learning for Average Reward MDP

Jinghan Wang; Mengdi Wang; Lin F. Yang

arXiv:2212.00603 · cs.LG · December 2, 2022 · 1 cites

TL;DR

This paper establishes near sample-optimal bounds for policy learning in average reward MDPs via a reduction to discounted MDPs, improving on previous mixing-time-based results and nearly matching the lower bound.

Contribution

It introduces a reduction from average reward MDPs to discounted MDPs, yielding nearly optimal sample complexity bounds that improve upon prior results relying on mixing-time assumptions.

Findings

01

Upper bound of $O(H ε^{-3} \ln \frac{1}{δ})$ samples per state-action pair.

02

Lower bound of $Ω(|S||A| H)$ total samples, matching the upper bound in its dependence on $|S|$, $|A|$, and $H$.

03

Reduction technique from AMDP to discounted MDPs enables application of DMDP algorithms.
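
The reduction idea in the findings above can be illustrated with a small sketch: solve a discounted MDP with a discount factor close to 1, then evaluate the resulting policy under the average reward criterion. Everything here is an assumption made for illustration: the toy two-state MDP, the function names, the choice $γ = 1 - ε/H$ (the paper's exact constants may differ), and the use of exact value iteration on a known model in place of a sample-based DMDP solver.

```python
def discounted_value_iteration(P, R, gamma, tol=1e-10, max_iter=100_000):
    """Return a greedy policy from value iteration on a discounted MDP.

    P[s][a][t] is the probability of moving from state s to t under action a;
    R[s][a] is the immediate reward. Both are plain nested lists.
    """
    n_states = len(P)
    V = [0.0] * n_states
    Q = [[0.0] * len(P[s]) for s in range(n_states)]
    for _ in range(max_iter):
        Q = [[R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
              for a in range(len(P[s]))]
             for s in range(n_states)]
        V_new = [max(q) for q in Q]
        converged = max(abs(x - y) for x, y in zip(V, V_new)) < tol
        V = V_new
        if converged:
            break
    # Greedy action per state with respect to the final Q-values.
    return [max(range(len(q)), key=q.__getitem__) for q in Q]


def average_reward(P, R, policy, n_steps=1000):
    """Long-run average reward of a fixed policy.

    Uses power iteration on the state distribution, so it assumes the
    policy induces a unichain, aperiodic Markov chain (an assumption of
    this sketch, not a condition taken from the paper).
    """
    n_states = len(P)
    mu = [1.0 / n_states] * n_states
    for _ in range(n_steps):
        mu = [sum(mu[s] * P[s][policy[s]][t] for s in range(n_states))
              for t in range(n_states)]
    return sum(mu[s] * R[s][policy[s]] for s in range(n_states))


# Hypothetical toy MDP: 2 states, 2 actions.
P = [[[0.9, 0.1], [0.1, 0.9]],
     [[0.9, 0.1], [0.1, 0.9]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]

eps, H = 0.2, 2.0          # target accuracy and an assumed bias span
gamma = 1.0 - eps / H      # illustrative discount choice for the reduction
policy = discounted_value_iteration(P, R, gamma)
gain = average_reward(P, R, policy)
```

On this toy instance the policy returned by the discounted solver can be compared against the average reward of every deterministic policy by calling `average_reward` on each one; with a generative model instead of a known `P`, the value-iteration step would be replaced by a sample-based DMDP algorithm.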

Abstract

This work considers the sample complexity of obtaining an $ε$-optimal policy in an average reward Markov Decision Process (AMDP), given access to a generative model (simulator). When the ground-truth MDP is weakly communicating, we prove an upper bound of $O(H ε^{-3} \ln \frac{1}{δ})$ samples per state-action pair, where $H := \mathrm{sp}(h^{*})$ is the span of bias of any optimal policy, $ε$ is the accuracy and $δ$ is the failure probability. This bound improves the best-known mixing-time-based approaches in [Jin & Sidford 2021], which assume the mixing time of every deterministic policy is bounded. The core of our analysis is a proper reduction bound from AMDP problems to discounted MDP (DMDP) problems, which may be of independent interest since it allows the application of DMDP algorithms for AMDP in other settings. We complement our upper…
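
To get a feel for how the stated upper bound scales, the following sketch evaluates $H ε^{-3} \ln \frac{1}{δ}$ directly. The function names are hypothetical, and the constant hidden by the $O(\cdot)$ notation is set to 1, so the numbers indicate scaling only, not an actual sample budget.

```python
import math


def samples_per_pair(H, eps, delta):
    """Scaling of the per state-action sample bound, H * eps^-3 * ln(1/delta).

    The O(.) constant is taken to be 1 here, which is an assumption of
    this sketch.
    """
    return H * eps ** -3 * math.log(1.0 / delta)


def total_samples(n_states, n_actions, H, eps, delta):
    """Total scaling when every state-action pair is sampled equally."""
    return n_states * n_actions * samples_per_pair(H, eps, delta)


# Halving eps multiplies the count by 8; doubling H multiplies it by 2.
ratio_eps = samples_per_pair(2.0, 0.05, 0.05) / samples_per_pair(2.0, 0.1, 0.05)
ratio_H = samples_per_pair(4.0, 0.1, 0.05) / samples_per_pair(2.0, 0.1, 0.05)
```

The cubic dependence on $1/ε$ dominates in practice: tightening the accuracy is far more expensive than handling an MDP with a larger bias span.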

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms