Enhancing Robustness of Offline Reinforcement Learning Under Data Corruption via Sharpness-Aware Minimization
Le Xu, Jiayu Chen

TL;DR
This paper introduces the use of Sharpness-Aware Minimization (SAM) to improve the robustness of offline reinforcement learning algorithms against data corruption by seeking flatter minima in the loss landscape.
Contribution
The paper claims to be the first to apply SAM as a plug-and-play optimizer in offline RL, and it reports improved robustness on corrupted-data benchmarks.
Findings
SAM-enhanced IQL and RIQL outperform their original baselines on D4RL benchmarks under both random and adversarial data corruption.
Visualizations show SAM finds smoother, more robust solutions.
SAM improves generalization in offline RL with corrupted data.
Abstract
Offline reinforcement learning (RL) is vulnerable to real-world data corruption, with even robust algorithms failing under challenging observation and mixture corruptions. We posit this failure stems from data corruption creating sharp minima in the loss landscape, leading to poor generalization. To address this, we are the first to apply Sharpness-Aware Minimization (SAM) as a general-purpose, plug-and-play optimizer for offline RL. SAM seeks flatter minima, guiding models to more robust parameter regions. We integrate SAM into strong baselines for data corruption: IQL, a top-performing offline RL algorithm in this setting, and RIQL, an algorithm designed specifically for data-corruption robustness. We evaluate them on D4RL benchmarks with both random and adversarial corruption. Our SAM-enhanced methods consistently and significantly outperform the original baselines. Visualizations of…
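To make the "seeks flatter minima" idea concrete, here is a minimal sketch of a generic SAM update in plain Python. It is not the authors' implementation: the function names (`sam_step`, `toy_grad`, `toy_loss`), the toy quadratic loss, and the values of `lr` and `rho` are all assumptions chosen for illustration. The sketch shows the two-gradient structure of SAM: first move the parameters a distance `rho` along the normalized gradient to approximate the worst-case nearby point, then descend using the gradient evaluated at that perturbed point.

```python
import math
from typing import Callable, List

Vector = List[float]


def sam_step(
    params: Vector,
    grad_fn: Callable[[Vector], Vector],
    lr: float = 0.05,   # assumed learning rate, for illustration only
    rho: float = 0.05,  # assumed neighborhood radius, for illustration only
) -> Vector:
    """One generic SAM update: ascend to a nearby perturbed point, then descend."""
    # First gradient: evaluated at the current parameters.
    g = grad_fn(params)
    norm = math.sqrt(sum(x * x for x in g))
    if norm == 0.0:
        # No ascent direction is defined at a stationary point; leave params as-is.
        return list(params)

    # Ascent step: perturb parameters by rho along the normalized gradient.
    perturbed = [p + rho * gi / norm for p, gi in zip(params, g)]

    # Second gradient: evaluated at the perturbed parameters.
    g_sam = grad_fn(perturbed)

    # Descent step: update the ORIGINAL parameters with the perturbed-point gradient.
    return [p - lr * gs for p, gs in zip(params, g_sam)]


# Hypothetical toy loss L(w) = 0.5 * sum_i c_i * w_i^2, with one sharp
# direction (c = 10) and one flat direction (c = 0.1).
CURVATURE = [10.0, 0.1]


def toy_loss(w: Vector) -> float:
    return 0.5 * sum(c * x * x for c, x in zip(CURVATURE, w))


def toy_grad(w: Vector) -> Vector:
    return [c * x for c, x in zip(CURVATURE, w)]


initial = [1.0, 1.0]
w = list(initial)
for _ in range(50):
    w = sam_step(w, toy_grad)

print(toy_loss(initial), toy_loss(w))
```

In an offline RL setting, `grad_fn` would stand in for the gradient of whatever loss the base algorithm (for example, a critic or policy loss) already computes on a minibatch, which is what makes the optimizer plug-and-play; how the paper wires this into IQL and RIQL is not specified in the text above.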
