Reward-Preserving Attacks For Robust Reinforcement Learning

Lucas Schott; Elies Gherbi; Hatem Hajri; Sylvain Lamprier

arXiv:2601.07118 · cs.LG · January 30, 2026

Open Access

TL;DR

This paper introduces reward-preserving adversarial attacks for reinforcement learning. The attacks adapt perturbation strength at each state so that an $\alpha$ fraction of the nominal-to-worst-case return gap remains achievable, and training against them yields more robust policies.

Contribution

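It proposes an adaptive attack that selects the perturbation magnitude $\eta$ dynamically, using a learned critic $Q((s, a), \eta)$ that estimates the expected return of $\alpha$-reward-preserving rollouts. Training against this attack improves robustness over fixed-radius and uniformly sampled-radius adversarial training.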

Findings

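01 Adaptive reward-preserving adversarial training outperforms fixed-radius and uniformly sampled-radius adversarial training.

02 For intermediate values of $\alpha$, policies trained with adaptive attacks are robust across a wide range of perturbation magnitudes.

03 The robustness gains are obtained while preserving nominal performance.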

Abstract

Adversarial training in reinforcement learning (RL) is challenging because perturbations cascade through trajectories and compound over time, making fixed-strength attacks either overly destructive or too conservative. We propose reward-preserving attacks, which adapt adversarial strength so that an $\alpha$ fraction of the nominal-to-worst-case return gap remains achievable at each state. In deep RL, perturbation magnitudes $\eta$ are selected dynamically, using a learned critic $Q((s, a), \eta)$ that estimates the expected return of $\alpha$-reward-preserving rollouts. For intermediate values of $\alpha$, this adaptive training yields policies that are robust across a wide range of perturbation magnitudes while preserving nominal performance, outperforming fixed-radius and uniformly sampled-radius adversarial training.
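
One way to read the selection of $\eta$ described in the abstract is sketched below. It is not taken from the paper: the function name `select_radius`, the discrete candidate grid, and the threshold rule are all assumptions made for illustration. The sketch takes a hypothetical critic already evaluated at a fixed state-action pair, treats the smallest candidate radius as the nominal case and the largest as the worst case, and returns the largest radius whose estimated return still keeps an $\alpha$ fraction of the gap between them.

```python
from typing import Callable, Sequence


def select_radius(
    q_of_eta: Callable[[float], float],
    etas: Sequence[float],
    alpha: float,
) -> float:
    """Return the largest candidate radius that preserves an alpha fraction
    of the nominal-to-worst-case return gap (illustrative rule, an assumption).

    q_of_eta: hypothetical critic with (s, a) fixed, mapping eta -> estimated return.
    etas:     candidate perturbation magnitudes (assumed grid, not from the paper).
    alpha:    fraction of the return gap that must remain achievable, in [0, 1].
    """
    candidates = sorted(etas)
    q_nominal = q_of_eta(candidates[0])   # assumption: smallest radius ~ nominal return
    q_worst = q_of_eta(candidates[-1])    # assumption: largest radius ~ worst-case return
    threshold = q_worst + alpha * (q_nominal - q_worst)

    chosen = candidates[0]
    for eta in candidates:
        # Keep the largest radius whose estimated return meets the threshold.
        if q_of_eta(eta) >= threshold:
            chosen = eta
    return chosen


# Toy critic for illustration only: estimated return decreases linearly with eta.
toy_critic = lambda eta: 10.0 - 8.0 * eta
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
radius = select_radius(toy_critic, grid, alpha=0.5)
```

Under this reading, $\alpha = 1$ forces the nominal (weakest) attack and $\alpha = 0$ allows the strongest candidate, which is consistent with the abstract's emphasis on intermediate values of $\alpha$. The paper's actual selection rule may differ.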

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Smart Grid Security and Resilience