Online Robust Reinforcement Learning with General Function Approximation
Debamita Ghosh, George K. Atia, Yue Wang

TL;DR
This paper introduces a fully online robust reinforcement learning algorithm that uses general function approximation to learn policies resilient to environment uncertainties without prior data, backed by theoretical regret guarantees.
Contribution
The paper presents the first online distributionally robust RL (DR-RL) method with general function approximation and regret guarantees, removing the need for prior knowledge or offline data.
Findings
Regret bounds are sublinear and do not depend on the size of the state or action space.
The approach is practical and scalable for structured problem classes.
The method effectively learns robust policies through interaction alone.
Abstract
In many real-world settings, reinforcement learning systems suffer performance degradation when the environment encountered at deployment differs from that observed during training. Distributionally robust reinforcement learning (DR-RL) mitigates this issue by seeking policies that maximize performance under the most adverse transition dynamics within a prescribed uncertainty set. Most existing DR-RL approaches, however, rely on strong data availability assumptions, such as access to a generative model or large offline datasets, and are largely restricted to tabular settings. In this work, we propose a fully online DR-RL algorithm with general function approximation that learns robust policies solely through interaction, without requiring prior knowledge or pre-collected data. Our approach is based on a dual-driven fitted robust Bellman procedure that simultaneously estimates the…
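To make "the most adverse transition dynamics within a prescribed uncertainty set" concrete, here is a minimal tabular sketch of a worst-case expectation and the robust Bellman backup built on it. It is an illustration under stated assumptions, not the paper's algorithm: the excerpt does not say which uncertainty set or dual form the authors use, so the total-variation ball of radius `rho` (defined as half the L1 distance), the function names, and the toy numbers are all hypothetical. The paper works with general function approximation, whereas this sketch enumerates states explicitly.

```python
import numpy as np

def worst_case_primal(p0, v, rho):
    """Worst-case E_P[v] over {P : 0.5 * ||P - p0||_1 <= rho} (assumed TV ball).

    Greedy primal solution: move up to rho probability mass from the
    highest-value states onto the lowest-value state.
    """
    p = np.asarray(p0, dtype=float).copy()
    v = np.asarray(v, dtype=float)
    lowest = int(np.argmin(v))
    budget = float(rho)
    for i in np.argsort(-v):  # visit states from highest to lowest value
        if i == lowest or budget <= 0.0:
            continue
        moved = min(p[i], budget)
        p[i] -= moved
        p[lowest] += moved
        budget -= moved
    return float(p @ v)

def worst_case_dual(p0, v, rho):
    """Same quantity via a scalar dual: max_eta E_p0[min(v, eta)] - rho * (eta - min v).

    The objective is concave and piecewise linear in eta with kinks at the
    entries of v, so checking those entries is enough.
    """
    p0 = np.asarray(p0, dtype=float)
    v = np.asarray(v, dtype=float)
    v_min = v.min()
    return float(max(p0 @ np.minimum(v, eta) - rho * (eta - v_min)
                     for eta in np.unique(v)))

def robust_backup(reward, gamma, p0, v, rho):
    """One robust Bellman backup for a single (state, action) pair."""
    return reward + gamma * worst_case_dual(p0, v, rho)

# Toy nominal next-state distribution and value estimates (hypothetical numbers).
p0 = [0.5, 0.3, 0.2]
v = [1.0, 2.0, 4.0]
print(worst_case_primal(p0, v, 0.1), worst_case_dual(p0, v, 0.1))  # both about 1.6
print(robust_backup(reward=1.0, gamma=0.9, p0=p0, v=v, rho=0.1))
```

The dual form reduces the inner minimization over distributions to a search over one scalar, which is the kind of structure a dual-based fitted procedure can exploit; how the paper estimates that dual variable with function approximation is not recoverable from this excerpt.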
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning
