TL;DR
This paper introduces a method for learning robust policies in Markov Decision Processes using generative models, addressing environment mismatches and perturbations through a game-theoretic approach with theoretical guarantees.
Contribution
It proposes a novel game-theoretic framework for robust policy learning in Robust Markov Decision Processes (RMDPs) with provable sample complexity, and extends the framework to more complex settings such as robust POMDPs.
Findings
Algorithm finds near-optimal robust policies with a polynomial number of samples.
Method handles general environment perturbations.
Framework extends to robust POMDPs.
Abstract
In high-stakes scenarios like medical treatment and auto-piloting, it is risky or even infeasible to collect online experimental data to train the agent. Simulation-based training can alleviate this issue, but may suffer from inherent mismatches between the simulator and the real environment. It is therefore imperative to use the simulator to learn a robust policy for real-world deployment. In this work, we consider policy learning for Robust Markov Decision Processes (RMDPs), where the agent seeks a policy that is robust to unexpected perturbations of the environment. Specifically, we focus on the setting where the training environment can be characterized as a generative model and a constrained perturbation can be added to the model during testing. Our goal is to identify a near-optimal robust policy for the perturbed testing environment, which introduces additional…
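For reference, the robust objective that RMDP formulations typically optimize can be written in the max-min form below. This is the standard textbook form and not necessarily the exact formulation used in the paper. The symbols are assumed notation: $P_0$ is the nominal (training) model, $\mathcal{U}(P_0)$ is the constrained set of perturbed models, $r$ is the reward, and $\gamma$ is the discount factor.

```latex
\pi^{\star} \in \arg\max_{\pi} \; \min_{P \in \mathcal{U}(P_0)} \;
\mathbb{E}_{P,\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \right]
```

Read this way, the agent chooses the policy $\pi$ while an adversary chooses the perturbed model $P$ within the constraint set, which is the two-player view that a game-theoretic treatment would build on.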