Online Robust Reinforcement Learning with Model Uncertainty

Yue Wang; Shaofeng Zou

arXiv:2109.14523 · cs.LG · October 29, 2021 · 6 cites

TL;DR

This paper introduces online, model-free robust reinforcement learning algorithms that estimate the uncertainty set from data and guarantee convergence and robustness without requiring additional conditions on the discount factor.

Contribution

The paper develops robust Q-learning and robust TDC algorithms with proven convergence guarantees and finite-time error bounds, and its approach can be extended to make other RL algorithms robust.

Findings

01

Robust Q-learning converges to the optimal robust Q-function, and robust TDC converges to stationary points.

02

The finite-time error bounds are comparable to those of the corresponding vanilla (non-robust) algorithms.

03

Numerical experiments confirm the robustness of the proposed methods.

Abstract

Robust reinforcement learning (RL) aims to find a policy that optimizes the worst-case performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust RL, where the uncertainty set is defined to be centered at a misspecified MDP that generates a single sample trajectory sequentially and is assumed to be unknown. We develop a sample-based approach to estimate the unknown uncertainty set and design a robust Q-learning algorithm (tabular case) and a robust TDC algorithm (function approximation setting), both of which can be implemented in an online and incremental fashion. For the robust Q-learning algorithm, we prove that it converges to the optimal robust Q-function, and for the robust TDC algorithm, we prove that it converges asymptotically to some stationary points. Unlike the results in [Roy et al., 2017], our algorithms do not need any additional conditions on the…
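The abstract does not say which uncertainty set is used. The sketch below assumes an R-contamination set, in which each nominal transition distribution p may be mixed with an arbitrary distribution q with weight R. Under that assumption the worst-case expectation of a value function V is (1 - R) E_p[V] + R min_s V(s), so a robust target can be formed from a single sampled next state, which is what makes an online, incremental update possible. The function names robust_q_learning and robust_value_iteration, the epsilon-greedy behaviour policy, and the step-size schedule are illustrative assumptions, not the authors' code.

```python
import numpy as np


def robust_value_iteration(P, r, gamma, R, iters=2000):
    """Model-based reference solver (hypothetical helper, used only for checking).

    Iterates the robust Bellman operator for an ASSUMED R-contamination set
    {(1 - R) * p + R * q : q any distribution}; the worst case puts the
    contaminating mass R on the lowest-value state.
    P has shape (S, A, S); r has shape (S, A).
    """
    S, A = r.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)
        # (P @ V)[s, a] = sum_{s'} P[s, a, s'] * V[s']
        Q = r + gamma * ((1 - R) * (P @ V) + R * V.min())
    return Q


def robust_q_learning(P, r, gamma, R, steps, eps=0.2, seed=0):
    """Model-free robust Q-learning along one trajectory (sketch).

    P is used only to simulate next states; the update itself never reads it.
    """
    rng = np.random.default_rng(seed)
    S, A = r.shape
    Q = np.zeros((S, A))
    visits = np.zeros((S, A))
    s = 0
    for _ in range(steps):
        # epsilon-greedy behaviour policy keeps every (s, a) pair visited
        a = int(rng.integers(A)) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next = int(rng.choice(S, p=P[s, a]))
        V = Q.max(axis=1)
        # sampled nominal term plus worst-case term under R-contamination
        target = r[s, a] + gamma * ((1 - R) * V[s_next] + R * V.min())
        visits[s, a] += 1
        alpha = visits[s, a] ** -0.7  # decaying per-pair step size (assumed schedule)
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
    return Q
```

The worst-case term depends only on the current Q-table, so each update needs nothing beyond the observed transition (s, a, r, s'). With R = 0 the update reduces to ordinary Q-learning, and for R > 0 the robust fixed point is never larger than the nominal one, since min_s V(s) never exceeds E_p[V].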

Videos

Online Robust Reinforcement Learning with Model Uncertainty · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Auction Theory and Applications

Methods: Q-Learning