Online Robust Reinforcement Learning with Model Uncertainty
Yue Wang, Shaofeng Zou

TL;DR
This paper introduces online, model-free robust reinforcement learning algorithms that estimate the uncertainty set from data and come with convergence guarantees that require no additional condition on the discount factor.
Contribution
It develops novel robust Q-learning and TDC algorithms with proven convergence and finite-time error bounds, extending robustness to various RL algorithms.
Findings
Robust Q-learning converges to the optimal robust Q function; robust TDC converges asymptotically to stationary points.
The finite-time error bounds are comparable to those of the vanilla (non-robust) algorithms.
Numerical experiments confirm the robustness of the proposed methods.
Abstract
Robust reinforcement learning (RL) aims to find a policy that optimizes the worst-case performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust RL, where the uncertainty set is centered at a misspecified MDP that generates a single sample trajectory sequentially and is assumed to be unknown. We develop a sample-based approach to estimate the unknown uncertainty set and design a robust Q-learning algorithm (tabular case) and a robust TDC algorithm (function approximation setting), both of which can be implemented in an online and incremental fashion. For the robust Q-learning algorithm, we prove that it converges to the optimal robust Q function, and for the robust TDC algorithm, we prove that it converges asymptotically to some stationary points. Unlike the results in [Roy et al., 2017], our algorithms do not need any additional conditions on the…
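To make the idea of an online, incremental robust Q-learning step concrete, here is a minimal sketch of one tabular update. It is not taken from the text above: it assumes a cost-minimizing convention and an R-contamination style uncertainty set, in which the sampled transition is trusted with weight 1 - R and an adversary picks the worst next state with weight R. The function name `robust_q_update` and all of its parameters are hypothetical, chosen for illustration only.

```python
import numpy as np

def robust_q_update(Q, s, a, cost, s_next, alpha, gamma, R):
    """One online robust Q-learning step (illustrative sketch).

    Assumptions (not stated in the summary above):
      * costs are minimized, so the greedy value is a min over actions;
      * the uncertainty set is an R-contamination set: weight (1 - R) on the
        sampled next state, weight R on an adversarially chosen state.
    Q is a (num_states, num_actions) array; a new array is returned.
    """
    V = Q.min(axis=1)          # greedy cost-to-go of every state under Q
    worst = V.max()            # adversary moves to the costliest state
    target = cost + gamma * ((1.0 - R) * V[s_next] + R * worst)
    Q_new = Q.copy()
    Q_new[s, a] = (1.0 - alpha) * Q[s, a] + alpha * target
    return Q_new

# Usage: a two-state, two-action table updated from a single sample.
Q0 = np.array([[0.0, 0.0],
               [2.0, 4.0]])
Q1 = robust_q_update(Q0, s=0, a=0, cost=1.0, s_next=0,
                     alpha=0.5, gamma=0.5, R=0.5)
```

Setting `R = 0` reduces the target to the ordinary Q-learning target for this convention, so the robustness term only adds a pessimistic correction weighted by `R`.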
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Auction Theory and Applications
Methods: Q-Learning
