Optimistic Policy Learning under Pessimistic Adversaries with Regret and Violation Guarantees

Sourav Ganguly; Kartik Pandit; Arnob Ghosh

arXiv:2604.14243 · cs.LG · April 20, 2026

TL;DR

This paper introduces a novel reinforcement learning framework that models exogenous factors as adversaries, ensuring safety and optimality guarantees in environments with strategic external influences.

Contribution

The paper proposes RHC-UCRL, a model-based algorithm that explicitly accounts for adversarial dynamics and separates different types of uncertainty, achieving both regret and constraint-violation guarantees.

Findings

01. RHC-UCRL achieves sub-linear regret.

02. The algorithm guarantees bounded constraint violations.

03. Explicit adversarial modeling improves safety in RL.

Abstract

Real-world decision-making systems operate in environments where state transitions depend not only on the agent's actions but also on exogenous factors outside its control, such as competing agents, environmental disturbances, or strategic adversaries. Formally, $s_{h+1} = f(s_h, a_h, \bar{a}_h) + \omega_h$, where $\bar{a}_h$ is the adversary/external action, $a_h$ is the agent's action, and $\omega_h$ is additive noise. Ignoring such factors can yield policies that are optimal in isolation but fail catastrophically in deployment, particularly when safety constraints must be satisfied. Standard Constrained MDP formulations assume the agent is the sole driver of state evolution, an assumption that breaks down in safety-critical settings. Existing robust RL approaches address this via distributional robustness over transition kernels, but do not explicitly model the…
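To make the transition model in the abstract concrete, the following is a minimal sketch of a rollout under dynamics of the form $s_{h+1} = f(s_h, a_h, \bar{a}_h) + \omega_h$. It is not the paper's RHC-UCRL algorithm. The scalar dynamics `f`, the clipped `policy`, the two adversaries, and the safety limit are all hypothetical choices made only for illustration.

```python
import random

def f(s, a, a_bar):
    """Hypothetical nominal dynamics: agent and adversary both shift the state."""
    return s + a + a_bar

def step(s, a, a_bar, noise_std, rng):
    """One transition s_{h+1} = f(s_h, a_h, a_bar_h) + omega_h with Gaussian omega_h."""
    omega = rng.gauss(0.0, noise_std)
    return f(s, a, a_bar) + omega

def rollout(policy, adversary, s0, horizon, noise_std=0.0, seed=0):
    """Simulate `horizon` steps and return the visited states, including s0."""
    rng = random.Random(seed)
    s, states = s0, [s0]
    for _ in range(horizon):
        a = policy(s)              # agent's action a_h
        a_bar = adversary(s, a)    # adversary/external action a_bar_h
        s = step(s, a, a_bar, noise_std, rng)
        states.append(s)
    return states

def count_violations(states, limit):
    """Number of visited states that break the (assumed) safety constraint |s| <= limit."""
    return sum(1 for s in states if abs(s) > limit)

def policy(s):
    """Assumed agent policy: steer toward 0 with actions clipped to [-1, 1]."""
    return max(-1.0, min(1.0, -s))

def no_adversary(s, a):
    """Baseline in which the agent is the sole driver of the state."""
    return 0.0

def pushing_adversary(s, a):
    """Assumed adversary whose push (1.5) exceeds the agent's action bound (1.0)."""
    return 1.5

if __name__ == "__main__":
    for name, adv in [("none", no_adversary), ("pushing", pushing_adversary)]:
        states = rollout(policy, adv, s0=0.0, horizon=5)
        print(name, states, count_violations(states, limit=2.0))
```

In this toy setting the same policy keeps the state inside the limit when no external action is present, but drifts out of it once the adversary acts, which is the kind of deployment failure the abstract describes.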
