ORVIT: Near-Optimal Online Distributionally Robust Reinforcement Learning

Debamita Ghosh; George K. Atia; Yue Wang

arXiv:2508.03768 · cs.LG · November 12, 2025

TL;DR

This paper introduces ORVIT, an online distributionally robust reinforcement learning algorithm that achieves near-optimal performance guarantees in unknown environments without relying on offline data or generative models.

Contribution

The paper proposes a practical, computationally efficient online RL method for distributional robustness with $f$-divergence uncertainty sets, supported by theoretical regret bounds and empirical validation.
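
To make an $f$-divergence uncertainty set concrete, the sketch below computes the worst-case expected value over a Kullback-Leibler (KL) ball, KL being the member of the $f$-divergence family with $f(t) = t\log t$. It relies on the standard convex dual $\inf_{P:\,\mathrm{KL}(P\|P_0)\le\sigma}\mathbb{E}_P[V] = \sup_{\lambda>0}\{-\lambda\log\mathbb{E}_{P_0}[e^{-V/\lambda}] - \lambda\sigma\}$ and approximates the supremum with a grid search over $\lambda$. The function name, the grid bounds (`lam_max`, `num_grid`), and the choice of KL are illustrative assumptions; this is a generic building block, not ORVIT's own implementation, which this page does not describe.

```python
import math
from typing import Sequence


def kl_worst_case_expectation(
    values: Sequence[float],
    nominal: Sequence[float],
    sigma: float,
    lam_max: float = 100.0,
    num_grid: int = 2000,
) -> float:
    """Approximate inf_{P: KL(P || nominal) <= sigma} E_P[values] from below.

    Hypothetical helper. Evaluates the dual objective
        -lam * log E_nominal[exp(-values / lam)] - lam * sigma
    on the grid lam in (0, lam_max] and returns the best value found.
    """
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    # Only outcomes with positive nominal probability matter for a KL ball.
    support = [(v, p) for v, p in zip(values, nominal) if p > 0]
    v_min = min(v for v, _ in support)
    # As lam -> 0+ the dual objective tends to the minimum value on the support.
    best = v_min
    for i in range(1, num_grid + 1):
        lam = lam_max * i / num_grid
        # Shift by v_min: every exponent is <= 0 (no overflow) and the
        # v_min outcome keeps the sum strictly positive for the log.
        shifted = sum(p * math.exp(-(v - v_min) / lam) for v, p in support)
        dual = v_min - lam * math.log(shifted) - lam * sigma
        best = max(best, dual)
    return best


if __name__ == "__main__":
    # Two next states with values 1 and 0 under a uniform nominal model.
    print(kl_worst_case_expectation([1.0, 0.0], [0.5, 0.5], sigma=0.1))  # roughly 0.28
```

By weak duality every dual value is a lower bound on the true worst case, so the grid search errs on the conservative side. The dual objective is concave in $\lambda$, so a scalar optimizer could replace the grid when precision matters.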

Findings

01. Achieves sublinear regret in unknown environments.
02. Improves worst-case performance over baseline methods.
03. Matches theoretical lower bounds on regret.
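
For readers unfamiliar with the regret criterion, one common way to define it in episodic online robust RL is given below. The notation ($K$ episodes, policy $\pi^k$ played in episode $k$, initial state $s_1^k$, uncertainty set $\mathcal{P}^{\sigma}(P^0)$ of radius $\sigma$ around the nominal kernel $P^0$) is an assumption for illustration and may differ from the paper's.

$$
V_1^{\pi,\sigma}(s) = \inf_{P \in \mathcal{P}^{\sigma}(P^0)} V_1^{\pi,P}(s), \qquad
V_1^{\star,\sigma}(s) = \sup_{\pi} V_1^{\pi,\sigma}(s), \qquad
\mathrm{Regret}(K) = \sum_{k=1}^{K} \Big( V_1^{\star,\sigma}(s_1^k) - V_1^{\pi^k,\sigma}(s_1^k) \Big)
$$

Sublinear regret means $\mathrm{Regret}(K) = o(K)$, so the average robust suboptimality $\mathrm{Regret}(K)/K$ vanishes as the number of episodes grows.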

Abstract

We investigate reinforcement learning (RL) under distributional mismatch between training and deployment: policies trained in simulators often underperform in practice because deployment conditions differ from training conditions, so reliable guarantees on real-world performance are essential. Distributionally robust RL addresses this issue by optimizing worst-case performance over an uncertainty set of environments, which yields an optimized lower bound on deployment performance. However, existing studies typically assume access to either a generative model or offline datasets with broad coverage of the deployment environment, assumptions that limit their practicality in unknown environments without prior knowledge. In this work, we study a more practical and challenging setting: online distributionally robust RL, where the agent interacts only with a single…
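
The core objective described in the abstract, maximizing worst-case value over an uncertainty set, can be illustrated with robust backward induction on a tiny finite-horizon MDP. This sketch assumes a *known* nominal transition kernel, which is exactly what the online setting takes away, so it shows the target being learned rather than how ORVIT learns it. It uses a total-variation (TV) ball, another $f$-divergence ($f(t) = |t-1|/2$), because its worst case has a simple greedy solution: move up to $\sigma$ probability mass from the highest-value next states onto the lowest-value one. The function names, the toy MDP, and the TV choice are hypothetical.

```python
from typing import List, Sequence


def tv_worst_case(values: Sequence[float], nominal: Sequence[float], sigma: float) -> float:
    """Exact inf over {P : TV(P, nominal) <= sigma} of E_P[values] (hypothetical helper).

    Moving mass m between two outcomes changes the TV distance by m, so the
    worst case shifts up to sigma mass from the best outcomes to the worst one.
    Note: this TV ball may place mass on outcomes with zero nominal probability.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    p = list(nominal)
    lowest = order[0]
    budget = sigma
    for i in reversed(order[1:]):  # highest-value outcomes first
        if budget <= 0:
            break
        moved = min(p[i], budget)
        p[i] -= moved
        p[lowest] += moved
        budget -= moved
    return sum(pi * vi for pi, vi in zip(p, values))


def robust_backward_induction(
    P: List[List[List[float]]], R: List[List[float]], H: int, sigma: float
) -> List[float]:
    """Robust optimal values V_1 for a finite-horizon MDP with a stationary nominal kernel.

    P[s][a] is the nominal next-state distribution, R[s][a] the reward.
    """
    n = len(P)
    V = [0.0] * n  # terminal values V_{H+1} = 0
    for _ in range(H):
        # The comprehension reads the previous-step V for every state.
        V = [
            max(R[s][a] + tv_worst_case(V, P[s][a], sigma) for a in range(len(P[s])))
            for s in range(n)
        ]
    return V


if __name__ == "__main__":
    # Two states, one action; state 0 pays reward 1, state 1 pays 0.
    P = [[[0.5, 0.5]], [[0.5, 0.5]]]
    R = [[1.0], [0.0]]
    print(robust_backward_induction(P, R, H=2, sigma=0.2))  # approx [1.3, 0.3]
```

With `sigma=0.0` the same call reduces to ordinary (non-robust) backward induction and returns approximately `[1.5, 0.5]`; the gap between the two results is the price of robustness in this toy model.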

Videos

ORVIT: Near-Optimal Online Distributionally Robust Reinforcement Learning (video on Underline)

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques