Policy Learning for Robust Markov Decision Process with a Mismatched Generative Model

Jialian Li; Tongzheng Ren; Dong Yan; Hang Su; Jun Zhu

arXiv:2203.06587·cs.LG·March 16, 2022

TL;DR

This paper introduces a method for learning robust policies in Markov Decision Processes using generative models, addressing environment mismatches and perturbations through a game-theoretic approach with theoretical guarantees.

Contribution

It proposes a novel game-theoretic framework for robust policy learning in RMDPs with provable sample complexity and extends to complex scenarios like POMDPs.

Findings

01. Algorithm finds near-optimal robust policies with polynomial samples.

02. Method handles general environment perturbations.

03. Framework extends to robust POMDPs.

Abstract

In high-stakes scenarios like medical treatment and auto-piloting, it is risky or even infeasible to collect online experimental data to train the agent. Simulation-based training can alleviate this issue, but may suffer from inherent mismatches between the simulator and the real environment. It is therefore imperative to use the simulator to learn a policy that is robust for real-world deployment. In this work, we consider policy learning for Robust Markov Decision Processes (RMDP), where the agent seeks a policy that is robust to unexpected perturbations of the environment. Specifically, we focus on the setting where the training environment can be characterized as a generative model and a constrained perturbation can be added to the model during testing. Our goal is to identify a near-optimal robust policy for the perturbed testing environment, which introduces additional…
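
To make the setting concrete, here is a minimal sketch of how a robust policy could be computed from generative-model samples. It is not the paper's algorithm: it is textbook robust value iteration, and it assumes the test-time perturbation is an L1 ball of radius `radius` around each estimated transition distribution. The toy kernel `true_p`, the reward table, and all function names are hypothetical and chosen only for illustration.

```python
import numpy as np


def estimate_model(sample_next_state, n_states, n_actions, n_samples, rng):
    """Empirical transition kernel from n_samples generative-model calls per (s, a)."""
    p_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                p_hat[s, a, sample_next_state(s, a, rng)] += 1.0
    return p_hat / n_samples


def worst_case_expectation(p, v, radius):
    """Minimum of q @ v over distributions q with ||q - p||_1 <= radius.

    Assumed perturbation set (L1 ball). The minimizer moves up to radius / 2
    of probability mass from the highest-value states to the lowest-value one.
    """
    q = p.astype(float).copy()
    lowest = int(np.argmin(v))
    budget = min(radius / 2.0, 1.0 - q[lowest])
    q[lowest] += budget
    for s in np.argsort(v)[::-1]:  # highest-value states first
        if budget <= 0.0:
            break
        if s == lowest:
            continue
        moved = min(q[s], budget)
        q[s] -= moved
        budget -= moved
    return float(q @ v)


def robust_value_iteration(p_hat, reward, gamma, radius, n_iters=200):
    """Iterate V(s) = max_a [ r(s, a) + gamma * min_q q @ V ] on the estimated model."""
    n_states, n_actions, _ = p_hat.shape
    v = np.zeros(n_states)
    q_values = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        for s in range(n_states):
            for a in range(n_actions):
                worst = worst_case_expectation(p_hat[s, a], v, radius)
                q_values[s, a] = reward[s, a] + gamma * worst
        v = q_values.max(axis=1)
    return v, q_values.argmax(axis=1)


# Hypothetical 2-state, 2-action simulator standing in for the generative model.
true_p = np.array([[[0.9, 0.1], [0.2, 0.8]],
                   [[0.5, 0.5], [0.1, 0.9]]])
reward = np.array([[0.0, 0.1], [1.0, 0.8]])


def sample_next_state(s, a, rng):
    return rng.choice(2, p=true_p[s, a])


rng = np.random.default_rng(0)
p_hat = estimate_model(sample_next_state, 2, 2, n_samples=500, rng=rng)
v_nominal, _ = robust_value_iteration(p_hat, reward, gamma=0.9, radius=0.0)
v_robust, robust_policy = robust_value_iteration(p_hat, reward, gamma=0.9, radius=0.2)
```

With `radius=0.0` the routine reduces to ordinary value iteration on the estimated model, so comparing `v_nominal` with `v_robust` shows how much value is given up to guard against the assumed perturbation.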
