On Corruption-Robustness in Performative Reinforcement Learning

Vasilis Pollatos; Debmalya Mandal; Goran Radanovic

arXiv:2505.05609·cs.LG·May 12, 2025

On Corruption-Robustness in Performative Reinforcement Learning

Vasilis Pollatos, Debmalya Mandal, Goran Radanovic

PDF

Open Access 1 Video

TL;DR

None

Contribution

None

Abstract

In performative Reinforcement Learning (RL), an agent faces a policy-dependent environment: the reward and transition functions depend on the agent's policy. Prior work on performative RL has studied the convergence of repeated retraining approaches to a performatively stable policy. In the finite sample regime, these approaches repeatedly solve for a saddle point of a convex-concave objective, which estimates the Lagrangian of a regularized version of the reinforcement learning problem. In this paper, we aim to extend such repeated retraining approaches, enabling them to operate under corrupted data. More specifically, we consider Huber's $ϵ$ -contamination model, where an $ϵ$ fraction of data points is corrupted by arbitrary adversarial noise. We propose a repeated retraining approach based on convex-concave optimization under corrupted gradients and a novel…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

On Corruption-Robustness in Performative Reinforcement Learning· underline

Taxonomy

TopicsStochastic Gradient Optimization Techniques · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research