Online Robust Reinforcement Learning with General Function Approximation

Debamita Ghosh; George K. Atia; Yue Wang

arXiv:2512.18957 · cs.LG · March 5, 2026

Open Access

TL;DR

This paper introduces a fully online robust reinforcement learning algorithm that uses general function approximation to learn policies resilient to environment uncertainties without prior data, backed by theoretical regret guarantees.

Contribution

The paper presents the first online distributionally robust RL (DR-RL) method with general function approximation and regret guarantees, removing the need for prior knowledge or offline data.

Findings

01. Regret bounds are sublinear and scale independently of state/action space size.

02. The approach is practical and scalable for structured problem classes.

03. The method effectively learns robust policies through interaction alone.

Abstract

In many real-world settings, reinforcement learning systems suffer performance degradation when the environment encountered at deployment differs from that observed during training. Distributionally robust reinforcement learning (DR-RL) mitigates this issue by seeking policies that maximize performance under the most adverse transition dynamics within a prescribed uncertainty set. Most existing DR-RL approaches, however, rely on strong data availability assumptions, such as access to a generative model or large offline datasets, and are largely restricted to tabular settings. In this work, we propose a fully online DR-RL algorithm with general function approximation that learns robust policies solely through interaction, without requiring prior knowledge or pre-collected data. Our approach is based on a dual-driven fitted robust Bellman procedure that simultaneously estimates the…
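
To make the "most adverse transition dynamics within a prescribed uncertainty set" idea concrete, here is a minimal tabular sketch of a robust Bellman backup. It is not the paper's algorithm, which uses general function approximation and a dual-driven fitted procedure. The total-variation ball of radius `rho` around a nominal kernel, the (s, a)-rectangular structure, and the names `worst_case_expectation_tv` and `robust_value_iteration` are all assumptions made for illustration.

```python
import numpy as np


def worst_case_expectation_tv(p, v, rho):
    """Minimise E_q[v] over distributions q with TV(q, p) <= rho.

    TV is taken as half the L1 distance (an assumption of this sketch).
    The minimiser moves up to `rho` probability mass from the
    highest-value states onto the lowest-value state.
    """
    q = np.asarray(p, dtype=float).copy()
    v = np.asarray(v, dtype=float)
    lowest = int(np.argmin(v))
    budget = float(rho)
    for i in np.argsort(v)[::-1]:  # visit states from highest value down
        if budget <= 0.0:
            break
        if i == lowest:
            continue
        moved = min(q[i], budget)
        q[i] -= moved
        q[lowest] += moved
        budget -= moved
    return float(q @ v)


def robust_value_iteration(P, R, rho, gamma=0.9, iters=200):
    """Tabular robust value iteration for an (s, a)-rectangular TV ball.

    P: nominal transitions, shape (S, A, S). R: rewards, shape (S, A).
    Returns the robust value estimate and a greedy robust policy.
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                # Reward plus discounted worst-case next-state value.
                Q[s, a] = R[s, a] + gamma * worst_case_expectation_tv(P[s, a], V, rho)
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)
```

With `rho = 0` the backup reduces to ordinary value iteration on the nominal model; increasing `rho` can only lower the value estimate, which is the price paid for robustness to a shift in dynamics.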

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning