Stress-Testing ML Pipelines with Adversarial Data Corruption
Jiongli Zhu, Geyang Xu, Felipe Lorenzi, Boris Glavic, Babak Salimi

TL;DR
This paper presents SAVAGE, a framework for systematically identifying worst-case structured data corruptions that can severely degrade machine learning pipeline performance, aiding robustness evaluation and resilient design.
Contribution
SAVAGE introduces a causally inspired, bi-level optimization approach for modeling and discovering high-impact data corruptions in ML pipelines, including pipelines with non-differentiable components.
Findings
Small corruptions (around 5%) can significantly degrade performance.
Structured corruptions degrade performance more than random or manually designed errors.
The corruptions SAVAGE discovers violate assumptions that existing robustness techniques rely on.
Abstract
Structured data-quality issues, such as missing values correlated with demographics, culturally biased labels, or systemic selection biases, routinely degrade the reliability of machine-learning pipelines. Regulators now increasingly demand evidence that high-stakes systems can withstand these realistic, interdependent errors, yet current robustness evaluations typically use random or overly simplistic corruptions, leaving worst-case scenarios unexplored. We introduce SAVAGE, a causally inspired framework that (i) formally models realistic data-quality issues through dependency graphs and flexible corruption templates, and (ii) systematically discovers corruption patterns that maximally degrade a target performance metric. SAVAGE employs a bi-level optimization approach to efficiently identify vulnerable data subpopulations and fine-tune corruption severity, treating the full ML…
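The two-level search described in the abstract can be illustrated with a small sketch. This is not the SAVAGE implementation: the synthetic data, the `group` attribute, the mean-imputation pipeline, the missing-value template, and the exhaustive grid search are all assumptions made for illustration, and the grid search stands in for whatever optimizer the paper actually uses. The outer loop picks a subpopulation to target, the inner loop tunes corruption severity, and the pipeline is only ever called as a black box.

```python
import numpy as np

def make_data(n, seed):
    """Synthetic data (hypothetical): a group attribute, one feature, a binary label."""
    rng = np.random.default_rng(seed)
    group = rng.integers(0, 2, size=n)          # stand-in demographic attribute
    y = rng.integers(0, 2, size=n)              # binary label
    x = y * 2.0 + rng.normal(0.0, 0.5, size=n)  # feature that depends on the label
    return group, x, y

def pipeline_accuracy(x_train, y_train, x_test, y_test):
    """Black-box pipeline: mean imputation, then a midpoint-threshold classifier."""
    x_train = np.where(np.isnan(x_train), np.nanmean(x_train), x_train)
    mu0 = x_train[y_train == 0].mean()
    mu1 = x_train[y_train == 1].mean()
    pred = (x_test > (mu0 + mu1) / 2.0).astype(int)
    return float((pred == y_test).mean())

def corrupt(x, mask, severity, budget_rows, rng):
    """Corruption template: blank out a fraction `severity` of the rows in `mask`,
    never touching more than `budget_rows` rows."""
    idx = np.flatnonzero(mask)
    k = min(int(round(severity * idx.size)), budget_rows)
    x_bad = x.copy()
    if k > 0:
        x_bad[rng.choice(idx, size=k, replace=False)] = np.nan
    return x_bad, k

def worst_case_search(group, x, y, x_test, y_test, budget=0.05,
                      severities=(0.0, 0.1, 0.25, 0.5, 1.0), seed=0):
    """Grid search over (subpopulation, severity); returns the most damaging setting."""
    budget_rows = int(budget * x.size)
    best = None
    # Outer level: choose which subpopulation the missingness depends on.
    for g in (None, 0, 1):
        for c in (None, 0, 1):
            mask = np.ones(x.size, dtype=bool)
            if g is not None:
                mask &= group == g
            if c is not None:
                mask &= y == c
            # Inner level: tune how severely that subpopulation is corrupted.
            for s in severities:
                rng = np.random.default_rng(seed)
                x_bad, k = corrupt(x, mask, s, budget_rows, rng)
                acc = pipeline_accuracy(x_bad, y, x_test, y_test)
                if best is None or acc < best["accuracy"]:
                    best = {"group": g, "label": c, "severity": s,
                            "rows": k, "accuracy": acc}
    return best

group, x, y = make_data(400, seed=0)
_, x_test, y_test = make_data(400, seed=1)
baseline = pipeline_accuracy(x, y, x_test, y_test)
best = worst_case_search(group, x, y, x_test, y_test)
print(f"clean accuracy {baseline:.3f}; worst case found: {best}")
```

Because severity 0.0 is part of the grid, the reported worst case can never score above the clean baseline. The 5% row budget mirrors the corruption level mentioned in the findings; how much damage this toy template does at that budget depends on the data and pipeline chosen here, not on anything reported in the paper.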
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Radiation Effects in Electronics · Network Packet Processing and Optimization
