Enhancing Robustness of Offline Reinforcement Learning Under Data Corruption via Sharpness-Aware Minimization

Le Xu; Jiayu Chen

arXiv:2511.17568·cs.LG·April 8, 2026

TL;DR

This paper applies Sharpness-Aware Minimization (SAM) to make offline reinforcement learning algorithms more robust to data corruption by steering training toward flatter minima of the loss landscape.

Contribution

The authors state that they are the first to apply SAM as a general-purpose, plug-and-play optimizer in offline RL, and they report improved robustness on corrupted D4RL benchmarks.

Findings

01. SAM-enhanced IQL and RIQL outperform their original versions on D4RL benchmarks under both random and adversarial data corruption.

02. Visualizations show SAM finds smoother, more robust solutions.

03. SAM improves generalization in offline RL with corrupted data.

Abstract

Offline reinforcement learning (RL) is vulnerable to real-world data corruption, with even robust algorithms failing under challenging observation and mixture corruptions. We posit this failure stems from data corruption creating sharp minima in the loss landscape, leading to poor generalization. To address this, we are the first to apply Sharpness-Aware Minimization (SAM) as a general-purpose, plug-and-play optimizer for offline RL. SAM seeks flatter minima, guiding models to more robust parameter regions. We integrate SAM into strong baselines for data corruption: IQL, a top-performing offline RL algorithm in this setting, and RIQL, an algorithm designed specifically for data-corruption robustness. We evaluate them on D4RL benchmarks with both random and adversarial corruption. Our SAM-enhanced methods consistently and significantly outperform the original baselines. Visualizations of…
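The abstract describes SAM as an optimizer that seeks flatter minima. A minimal sketch of the generic two-pass SAM update may help make that concrete. It is not the authors' implementation: the function names (`sam_step`, `toy_grad`), the toy quadratic loss, and the values of `lr` and `rho` are all assumptions chosen for illustration, and the paper applies SAM to the networks of IQL and RIQL rather than to a two-parameter toy problem.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.05, rho=0.05, eps=1e-12):
    """One generic SAM update (illustrative sketch, hypothetical names).

    1. Compute the gradient at the current weights.
    2. Move to the approximate worst-case point within an L2 ball of radius rho.
    3. Compute the gradient there and apply it to the ORIGINAL weights.
    """
    g = grad_fn(w)
    # Ascent direction scaled to the neighborhood radius rho.
    perturbation = rho * g / (np.linalg.norm(g) + eps)
    # Gradient evaluated at the perturbed weights.
    g_sam = grad_fn(w + perturbation)
    return w - lr * g_sam

# Toy quadratic loss 0.5 * w^T A w with one sharp and one flat direction
# (an assumption for illustration, unrelated to any offline RL objective).
A = np.diag([10.0, 0.1])

def toy_loss(w):
    return 0.5 * w @ A @ w

def toy_grad(w):
    return A @ w

w0 = np.array([1.0, 1.0])
w = w0.copy()
for _ in range(200):
    w = sam_step(w, toy_grad)

print("initial loss:", toy_loss(w0))
print("final loss:", toy_loss(w))
```

With `rho=0` the update reduces to plain gradient descent, which is one way to see why SAM can be dropped into an existing training loop: only the point at which the gradient is evaluated changes, at the cost of a second gradient computation per step.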
