Distributionally Robust Optimization with Adversarial Data Contamination

Shuyao Li; Ilias Diakonikolas; Jelena Diakonikolas

arXiv:2507.10718 · cs.LG · November 4, 2025

TL;DR

This paper proposes a new approach to distributionally robust optimization that effectively handles adversarial data contamination and distributional shifts, providing theoretical guarantees and efficient algorithms.

Contribution

It introduces a novel modeling framework and algorithm for Wasserstein-1 DRO with contaminated data, achieving provable error bounds under adversarial corruption.

Findings

01. Achieves an estimation error of O(√ε) under contamination

02. Provides the first rigorous guarantees for combined robustness against data contamination and distributional shifts

03. Develops an efficient algorithm inspired by robust statistics

Abstract

Distributionally Robust Optimization (DRO) provides a framework for decision-making under distributional uncertainty, yet its effectiveness can be compromised by outliers in the training data. This paper introduces a principled approach to simultaneously address both challenges. We focus on optimizing Wasserstein-1 DRO objectives for generalized linear models with convex Lipschitz loss functions, where an $\epsilon$-fraction of the training data is adversarially corrupted. Our primary contribution lies in a novel modeling framework that integrates robustness against training data contamination with robustness against distributional shifts, alongside an efficient algorithm inspired by robust statistics to solve the resulting optimization problem. We prove that our method achieves an estimation error of $O(\sqrt{\epsilon})$ for the true DRO objective value using only the contaminated data…
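To make the setting in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It assumes the hinge loss (convex and 1-Lipschitz), feature-only perturbations measured in the Euclidean norm on an unbounded feature space, and a Wasserstein-1 radius `rho`; under those assumptions the worst-case expected loss is known to reduce to the empirical loss plus `rho * ||w||_2`. The function names, the synthetic data, and the loss-trimming step are all hypothetical stand-ins chosen for illustration; trimming is a naive substitute for the robust-statistics machinery the abstract refers to.

```python
import numpy as np

def hinge_losses(w, X, y):
    """Per-sample hinge loss, 1-Lipschitz in the margin y * <w, x>."""
    return np.maximum(0.0, 1.0 - y * (X @ w))

def w1_dro_objective(w, X, y, rho):
    """Empirical hinge loss plus rho * ||w||_2.

    Assumption: feature-only Euclidean perturbations on an unbounded
    feature space, where the Wasserstein-1 worst case of a 1-Lipschitz
    convex loss takes this norm-regularized form.
    """
    return hinge_losses(w, X, y).mean() + rho * np.linalg.norm(w)

def trimmed_dro_objective(w, X, y, rho, eps):
    """Same objective after dropping the eps-fraction of largest losses.

    Illustrative stand-in only; not the method proposed in the paper.
    """
    losses = np.sort(hinge_losses(w, X, y))
    keep = len(losses) - int(round(eps * len(losses)))
    return losses[:keep].mean() + rho * np.linalg.norm(w)

# Synthetic, linearly separable data (hypothetical).
rng = np.random.default_rng(0)
n, d, eps, rho = 1000, 5, 0.1, 0.05
w_true = np.ones(d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_true)

# Adversarially corrupt an eps-fraction: inflate features and flip labels.
m = int(round(eps * n))
X_bad, y_bad = X.copy(), y.copy()
X_bad[:m] = 100.0 * X[:m]
y_bad[:m] = -y[:m]

clean = w1_dro_objective(w_true, X, y, rho)
naive = w1_dro_objective(w_true, X_bad, y_bad, rho)
trimmed = trimmed_dro_objective(w_true, X_bad, y_bad, rho, eps)
print(f"clean={clean:.3f}  naive={naive:.3f}  trimmed={trimmed:.3f}")
```

In this toy setup the naive objective on contaminated data is inflated by the corrupted points, while the trimmed estimate stays close to the clean value. Trimming by loss is only safe here because the corrupted points happen to have the largest losses; a real adversary need not cooperate, which is why a more careful robust estimator is needed in general.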

Taxonomy

Topics: Advanced Statistical Process Monitoring