Model-Free Robust $\phi$-Divergence Reinforcement Learning Using Both Offline and Online Data

Kishan Panaganti; Adam Wierman; Eric Mazumdar

arXiv:2405.05468·cs.LG·May 10, 2024

Open Access

TL;DR

This paper introduces model-free algorithms for robust reinforcement learning that leverage both offline and online data, providing theoretical guarantees for high-dimensional systems with function approximation.

Contribution

The paper provides what the authors describe as the first unified analysis of $\phi$-divergence-based robust policies, and introduces a hybrid framework that combines offline and online data with new theoretical guarantees.

Findings

01

First unified analysis for $\phi$-divergences in high-dimensional systems.

02

Introduction of hybrid offline-online robust RL framework.

03

Theoretical guarantees on policy performance in large state spaces.

Abstract

The robust $\phi$-regularized Markov Decision Process (RRMDP) framework focuses on designing control policies that are robust against parameter uncertainties due to mismatches between the simulator (nominal) model and real-world settings. This work makes two important contributions. First, we propose a model-free algorithm called Robust $\phi$-regularized fitted Q-iteration (RPQ) for learning an $\epsilon$-optimal robust policy that uses only the historical data collected by rolling out a behavior policy (with robust exploratory requirement) on the nominal model. To the best of our knowledge, we provide the first unified analysis for a class of $\phi$-divergences achieving robust optimal policies in high-dimensional systems with general function approximation. Second, we introduce the hybrid robust $\phi$-regularized reinforcement learning framework to learn an optimal robust policy…
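To make the regularized robust objective concrete, the sketch below evaluates it for one member of the $\phi$-divergence family, the KL divergence, where the inner minimization $\inf_P \{ E_P[V] + \lambda\,\mathrm{KL}(P \,\|\, P_0) \}$ has the well-known closed form $-\lambda \log E_{P_0}[\exp(-V/\lambda)]$. This is an illustration under stated assumptions, not the paper's RPQ algorithm: the function name, the sample values, and the use of an empirical nominal distribution built from next-state value samples are all hypothetical.

```python
import numpy as np


def kl_regularized_robust_value(next_values, lam):
    """Evaluate inf_P { E_P[V] + lam * KL(P || P0) } in closed form.

    P0 is taken to be the empirical (uniform) distribution over the
    samples in `next_values`; this is an assumption made for the sketch.
    The closed form is -lam * log( mean( exp(-V / lam) ) ).
    """
    v = np.asarray(next_values, dtype=float)
    z = -v / lam
    # Shift by the largest exponent so np.exp cannot overflow.
    z_max = z.max()
    log_mean_exp = z_max + np.log(np.mean(np.exp(z - z_max)))
    return -lam * log_mean_exp


# Hypothetical next-state values sampled from the nominal model.
samples = np.array([1.0, 2.0, 3.0, 10.0])

# Small lam: deviating from P0 is cheap, so the value approaches min(V).
small_lam = kl_regularized_robust_value(samples, lam=0.01)

# Large lam: deviating from P0 is expensive, so the value approaches mean(V).
large_lam = kl_regularized_robust_value(samples, lam=1e4)

print(small_lam, large_lam)
```

The regularization weight interpolates between worst-case and nominal evaluation: as `lam` shrinks the result moves toward the smallest sampled value, and as `lam` grows it moves toward the sample mean.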

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Traffic control and management