Online Policy Optimization for Robust MDP

Jing Dong; Jingwei Li; Baoxiang Wang; Jingzhao Zhang

arXiv:2209.13841 · cs.LG · September 29, 2022 · 1 citation


TL;DR

This paper introduces an efficient online robust policy optimization algorithm for Markov decision processes, addressing environmental uncertainties and providing the first regret bounds in this setting.
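For readers new to the setting, the regret being bounded can be written out explicitly. The block below is a sketch of the usual episodic formulation (horizon H, K episodes, robust value taken as the worst case over an uncertainty set 𝒫 of transition kernels). The notation is ours and is assumed rather than copied from the paper.

```latex
\[
  V_1^{\pi}(s) = \inf_{P \in \mathcal{P}} V_1^{\pi, P}(s),
  \qquad
  \mathrm{Regret}(K) = \sum_{k=1}^{K} \Big( V_1^{\pi^\star}(s_1^k) - V_1^{\pi^k}(s_1^k) \Big)
\]
```

Here π^⋆ maximizes the robust value and π^k is the policy played in episode k. A bound that is sublinear in K means the average per-episode gap to the best robust policy vanishes as K grows.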

Contribution

The paper proposes a novel optimistic policy optimization method, with theoretical guarantees, for online robust MDPs; the method incorporates a new update rule derived via Fenchel conjugates.
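The summary does not spell out the update rule. The sketch below only illustrates the general idea behind conjugate-based reformulations: the inner worst case over transition kernels is replaced by a one-dimensional dual problem. It assumes a KL-divergence ball of radius rho around the nominal next-state distribution. This is a standard choice whose dual involves the Fenchel conjugate of KL (a log-sum-exp), but it may differ from the uncertainty sets the paper actually studies. The function name `kl_robust_expectation` is hypothetical.

```python
import math


def kl_robust_expectation(p, v, rho, iters=200):
    """Worst-case E_q[v] over {q : KL(q || p) <= rho}, computed via the dual
    sup_{lam > 0}  -lam * log E_p[exp(-v / lam)] - lam * rho.

    Illustrative sketch only: the KL ball is an assumed uncertainty set,
    not necessarily the one used in the paper.
    """
    # Only states with p_i > 0 matter, since KL(q || p) < inf forces q << p.
    v_min = min(vi for pi, vi in zip(p, v) if pi > 0)

    def dual(log_lam):
        lam = math.exp(log_lam)
        # Shift by v_min so every exponent is <= 0 (stable log-sum-exp);
        # the term at v_min keeps the sum strictly positive.
        s = sum(pi * math.exp(-(vi - v_min) / lam) for pi, vi in zip(p, v) if pi > 0)
        return v_min - lam * math.log(s) - lam * rho

    # The dual is concave in lam, hence unimodal in log(lam): ternary search
    # over log(lam) in [-10, 10].
    lo, hi = -10.0, 10.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if dual(m1) < dual(m2):
            lo = m1
        else:
            hi = m2
    return dual((lo + hi) / 2)
```

For example, `kl_robust_expectation([0.5, 0.5], [0.0, 1.0], 0.1)` returns about 0.28, below the nominal expectation of 0.5, because the adversary shifts probability toward the low-value state. A library scalar optimizer could replace the hand-written ternary search.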

Findings

01. First regret bound established for online robust MDPs.

02. The algorithm is provably efficient in uncertain environments.

03. Addresses the exploration-exploitation trade-off under adversarial conditions.
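One common way to handle the exploration-exploitation trade-off when the nominal system is unknown is optimism. The nominal kernel is estimated from visit counts, and a confidence bonus that shrinks with the number of visits is added to the value estimate. The sketch below assumes a Hoeffding-style bonus of order H·sqrt(log(1/δ)/n) and clips values at the horizon H. The function names, the bonus form, and the parameter `delta` are hypothetical, and the paper's actual bonus may differ.

```python
import math


def empirical_kernel(counts):
    """Estimate the nominal next-state distribution for one (s, a) pair
    from next-state visit counts; falls back to uniform before any visit."""
    n = sum(counts)
    if n == 0:
        return [1.0 / len(counts)] * len(counts)
    return [c / n for c in counts]


def hoeffding_bonus(n, horizon, delta=0.05):
    """Hypothetical optimism bonus of order H * sqrt(log(1/delta) / n)."""
    return horizon * math.sqrt(math.log(1.0 / delta) / max(n, 1))


def optimistic_q(reward, counts, robust_next_value, horizon, delta=0.05):
    """Optimistic Q estimate: reward + bonus + worst-case next value,
    clipped at H (an upper bound on any value with rewards in [0, 1])."""
    bonus = hoeffding_bonus(sum(counts), horizon, delta)
    return min(horizon, reward + bonus + robust_next_value)
```

Rarely visited state-action pairs receive large bonuses and therefore look attractive, which drives exploration. As counts grow, the bonus decays and the estimate approaches the robust value under the estimated nominal model.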

Abstract

Reinforcement learning (RL) has exceeded human performance in many synthetic settings such as video games and Go. However, real-world deployment of end-to-end RL models is less common, as RL models can be very sensitive to slight perturbations of the environment. The robust Markov decision process (MDP) framework -- in which the transition probabilities belong to an uncertainty set around a nominal model -- provides one way to develop robust models. While previous analysis shows RL algorithms are effective assuming access to a generative model, it remains unclear whether RL can be efficient under a more realistic online setting, which requires a careful balance between exploration and exploitation. In this work, we consider online robust MDPs by interacting with an unknown nominal system. We propose a robust optimistic policy optimization algorithm that is provably efficient. To address…
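To make the robust MDP framework concrete, the sketch below runs finite-horizon robust value iteration on a known nominal model. It is planning, not the paper's online algorithm. It assumes an (s, a)-rectangular uncertainty set given by an ℓ1 (total-variation) ball of radius rho around each nominal next-state distribution. For that set, the worst case moves up to rho/2 probability mass from the highest-value states onto the lowest-value state. All function names are hypothetical.

```python
def l1_worst_case(p, v, rho):
    """min_q E_q[v] subject to ||q - p||_1 <= rho and q in the simplex
    (assumed l1 / total-variation uncertainty set)."""
    q = list(p)
    i_min = min(range(len(v)), key=lambda i: v[i])
    # Moving mass eps changes the l1 distance by 2 * eps.
    budget = min(rho / 2.0, 1.0 - q[i_min])
    q[i_min] += budget
    # Remove the same mass from the highest-value states first.
    for i in sorted(range(len(v)), key=lambda i: v[i], reverse=True):
        if budget <= 0:
            break
        if i == i_min:
            continue
        take = min(q[i], budget)
        q[i] -= take
        budget -= take
    return sum(qi * vi for qi, vi in zip(q, v))


def robust_value_iteration(P, R, horizon, rho):
    """Backward induction with robust Bellman backups.

    P[s][a]: nominal next-state distribution (list of floats).
    R[s][a]: reward for taking action a in state s.
    """
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(horizon):
        V = [
            max(R[s][a] + l1_worst_case(P[s][a], V, rho) for a in range(len(P[s])))
            for s in range(n_states)
        ]
    return V
```

As a usage example, take two absorbing states with one action, where only state 0 pays reward 1 (`P = [[[1.0, 0.0]], [[0.0, 1.0]]]`, `R = [[1.0], [0.0]]`). With horizon 2, the nominal value of state 0 is 2.0 (`rho=0.0`). The robust value drops to 1.75 with `rho=0.5`, because the adversary may divert a quarter of the mass to the zero-reward state.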

