Robust Markov Decision Processes without Model Estimation
Wenhao Yang, Han Wang, Tadashi Kozuno, Scott M. Jordan, Zhihua Zhang

TL;DR
This paper introduces a model-free, sample-efficient approach to robust Markov Decision Processes that needs neither the large memory of an estimated transition model nor an optimization oracle, making it practical to apply while retaining theoretical guarantees.
Contribution
It transforms robust MDPs into an alternative form suitable for stochastic gradient methods, removing the need for an oracle and reducing memory requirements.
Findings
Proposes a new formulation for robust MDPs compatible with stochastic gradient methods.
Develops a sample-efficient, model-free algorithm for robust MDPs.
Numerical experiments confirm the effectiveness of the proposed approach.
Abstract
Robust Markov Decision Processes (MDPs) are receiving much attention for learning robust policies that are less sensitive to environment changes. A growing number of works analyze the sample efficiency of robust MDPs. However, there are two major barriers to applying robust MDPs in practice. First, most works study robust MDPs in a model-based regime, where the transition probability needs to be estimated, which requires a large amount of memory. Second, prior work typically assumes a strong oracle to obtain the optimal solution as an intermediate step to solve robust MDPs. In practice, such an oracle usually does not exist. To remove the oracle, we transform the original robust MDPs into an alternative form, which allows us to use stochastic gradient methods to solve the robust MDPs. Moreover, we prove the alternative form…
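As an illustration of how a worst-case expectation can be recast as a smooth optimization problem, the sketch below uses the standard convex dual of a KL-constrained robust expectation: inf over P with KL(P || P0) <= rho of E_P[V] equals sup over lam > 0 of -lam * log E_P0[exp(-V / lam)] - lam * rho. This is an assumption chosen for illustration, not necessarily the alternative form or uncertainty set used in the paper. The names `kl_dual_objective` and `robust_expectation`, the toy values, and the ternary search over `lam` are all hypothetical.

```python
import math


def kl_dual_objective(values, probs, lam, rho):
    """Dual objective g(lam) = -lam * log E_p0[exp(-V / lam)] - lam * rho."""
    # Shift by the smallest supported value so every exponent is <= 0 (no overflow).
    m = min(v for v, p in zip(values, probs) if p > 0)
    s = sum(p * math.exp(-(v - m) / lam) for v, p in zip(values, probs) if p > 0)
    return m - lam * math.log(s) - lam * rho


def robust_expectation(values, probs, rho, lam_lo=1e-8, lam_hi=1e3, iters=200):
    """Maximize the concave dual over lam with a ternary search (illustrative only)."""
    lo, hi = lam_lo, lam_hi
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if kl_dual_objective(values, probs, a, rho) < kl_dual_objective(values, probs, b, rho):
            lo = a
        else:
            hi = b
    return kl_dual_objective(values, probs, 0.5 * (lo + hi), rho)


# Toy next-state values and nominal transition probabilities (made up for the example).
values = [1.0, 2.0, 3.0]
probs = [0.2, 0.5, 0.3]
nominal = sum(p * v for p, v in zip(probs, values))
r_small = robust_expectation(values, probs, rho=0.1)
r_large = robust_expectation(values, probs, rho=0.5)
```

In this toy setting a larger radius `rho` gives a more pessimistic value, and both robust values sit between the smallest next-state value and the nominal expectation. The dual depends on the nominal model only through an expectation, which is the kind of structure that sample-based methods can exploit, although a naive plug-in estimate of the log-expectation term from samples is biased.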
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Machine Learning and Algorithms · Gaussian Processes and Bayesian Inference
