Sparse Offline Reinforcement Learning with Corruption Robustness
Nam Phuong Tran, Andi Nika, Goran Radanovic, Long Tran-Thanh, Debmalya Mandal

TL;DR
This paper develops robust methods for offline sparse reinforcement learning under data corruption in high-dimensional settings, providing the first non-vacuous guarantees in this regime.
Contribution
It introduces actor-critic algorithms with sparse robust estimators. These overcome the limitations of standard approaches such as Least Square Value Iteration (LSVI) and remain robust to contamination in high-dimensional sparse MDPs.
Findings
Proposes actor-critic methods with sparse robust estimators.
Provides the first non-vacuous guarantees for sparse offline RL under contamination.
Extends results to settings with strong data corruption.
Abstract
We investigate robustness to strong data corruption in offline sparse reinforcement learning (RL). In our setting, an adversary may arbitrarily perturb a fraction of the collected trajectories from a high-dimensional but sparse Markov decision process, and our goal is to estimate a near-optimal policy. The main challenge is that, in the high-dimensional regime where the number of samples is smaller than the feature dimension, exploiting sparsity is essential for obtaining non-vacuous guarantees but has not been systematically studied in offline RL. We analyse the problem under uniform coverage and sparse single-concentrability assumptions. While Least Square Value Iteration (LSVI), a standard approach for robust offline RL, performs well under uniform coverage, we show that integrating sparsity into LSVI is unnatural, and its analysis may break down due to overly pessimistic…
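The setting in the abstract combines three ingredients: a corrupted fraction eps of the data, fewer samples than features, and a sparse target. A generic robust sparse regression subroutine illustrates how these interact. The sketch below is not the authors' estimator. It uses trimmed iterative hard thresholding, a standard heuristic swapped in purely for illustration. The names `trimmed_iht` and `hard_threshold`, the step size, and the toy data are all hypothetical. It also assumes that only the regression responses are corrupted, whereas the paper's adversary may perturb entire trajectories.

```python
import random

def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v and zero out the rest."""
    top = set(sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)[:s])
    return [v[j] if j in top else 0.0 for j in range(len(v))]

def trimmed_iht(X, y, s, eps, step=0.5, iters=100):
    """Hypothetical sketch: trimmed iterative hard thresholding (not the paper's method).

    Each round keeps the (1 - eps) fraction of samples with the smallest absolute
    residuals, takes a gradient step on their mean squared loss, and projects
    the result back onto s-sparse vectors.
    """
    n, d = len(X), len(X[0])
    m = int((1 - eps) * n)  # number of samples kept per round
    theta = [0.0] * d
    for _ in range(iters):
        res = [sum(X[i][j] * theta[j] for j in range(d)) - y[i] for i in range(n)]
        kept = sorted(range(n), key=lambda i: abs(res[i]))[:m]  # drop suspected outliers
        grad = [0.0] * d
        for i in kept:
            for j in range(d):
                grad[j] += res[i] * X[i][j] / m
        theta = hard_threshold([theta[j] - step * grad[j] for j in range(d)], s)
    return theta

# Toy instance (assumed, noiseless): n = 80 samples, d = 100 features, 3-sparse target.
rng = random.Random(0)
n, d, s, eps = 80, 100, 3, 0.1
theta_star = [0.0] * d
theta_star[3], theta_star[17], theta_star[42] = 3.0, -3.0, 3.0
X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
y = [sum(a * b for a, b in zip(row, theta_star)) for row in X]
# Assumption: the adversary only overwrites responses of an eps fraction of samples.
for i in rng.sample(range(n), int(eps * n)):
    y[i] = 50.0
theta_hat = trimmed_iht(X, y, s, eps)
print(sorted(j for j, v in enumerate(theta_hat) if v != 0.0))
```

Trimming by residual works here only because the corrupted responses are far from the clean ones. An adversary that also perturbs features can plant points with small residuals, which is one reason estimators designed for the full contamination model need more careful analysis.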
