DRO-Augment Framework: Robustness by Synergizing Wasserstein Distributionally Robust Optimization and Data Augmentation
Jiaming Hu, Debarghya Mukherjee, Ioannis Ch. Paschalidis

TL;DR
This paper introduces DRO-Augment, a framework that combines Wasserstein Distributionally Robust Optimization (W-DRO) with data augmentation to improve neural network robustness against common corruptions and adversarial attacks while maintaining accuracy on clean data.
Contribution
The paper presents a novel integration of W-DRO with data augmentation, providing improved robustness and new theoretical generalization bounds for neural networks.
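As a rough illustration of what such an integration can look like, the sketch below adds a gradient-norm penalty to a loss computed on mixup-augmented samples. It is not the authors' implementation: the logistic model, the function names (`mixup`, `loss_and_input_grad`, `penalized_objective`), the Beta(`alpha`, `alpha`) mixing weight, and the toy data are all assumptions made for illustration. The reviews quoted on this page mention mixup data and an L2-based gradient penalty, which is why those two choices are used here.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def mixup(x1, y1, x2, y2, lam):
    """Convex combination of two examples and of their labels (mixup)."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = lam * y1 + (1.0 - lam) * y2
    return x, y

def loss_and_input_grad(w, b, x, y):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    eps = 1e-12  # guards against log(0)
    loss = -(y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps))
    grad_x = [(p - y) * wi for wi in w]  # closed form for logistic regression
    return loss, grad_x

def penalized_objective(w, b, data, rho, alpha=1.0, seed=0):
    """Mean loss on mixup samples plus rho times the mean L2 norm of the input gradient.

    Hypothetical surrogate for a W-DRO objective; rho plays the role of the
    robustness radius.
    """
    rng = random.Random(seed)
    total_loss, total_penalty = 0.0, 0.0
    for x1, y1 in data:
        x2, y2 = rng.choice(data)            # random mixing partner
        lam = rng.betavariate(alpha, alpha)  # mixup weight in (0, 1)
        x, y = mixup(x1, y1, x2, y2, lam)
        loss, grad_x = loss_and_input_grad(w, b, x, y)
        total_loss += loss
        total_penalty += math.sqrt(sum(g * g for g in grad_x))
    n = len(data)
    return total_loss / n + rho * total_penalty / n

# Toy usage: three 2-D points with binary labels.
toy_data = [([0.0, 1.0], 0.0), ([1.0, 0.0], 1.0), ([2.0, -1.0], 1.0)]
print(penalized_objective([1.0, -2.0], 0.0, toy_data, rho=0.1))
```

For a deep network the input gradient has no closed form and would be obtained by automatic differentiation for each sample, which is the source of the extra cost that the reviewers ask the authors to measure.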
Findings
Outperforms existing augmentation methods under severe perturbations
Maintains accuracy on clean datasets across multiple benchmarks
Establishes new theoretical generalization error bounds
Abstract
In many real-world applications, ensuring the robustness and stability of deep neural networks (DNNs) is crucial, particularly for image classification tasks that encounter various input perturbations. While data augmentation techniques have been widely adopted to enhance the resilience of a trained model against such perturbations, there remains significant room for improvement in robustness against corrupted data and adversarial attacks simultaneously. To address this challenge, we introduce DRO-Augment, a novel framework that integrates Wasserstein Distributionally Robust Optimization (W-DRO) with various data augmentation strategies to improve the robustness of the models significantly across a broad spectrum of corruptions. Our method outperforms existing augmentation methods under severe data perturbations and adversarial attack scenarios while maintaining the accuracy on the…
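For orientation, a generic W-DRO training problem and its usual first-order surrogate can be written as follows. This is the textbook form, stated under smoothness assumptions on the loss, and is not claimed to match the paper's own Eq. (2.1). The symbols are assumptions for illustration: \widehat{P}_n is the empirical distribution of the (augmented) training data, \ell is the loss, W_p is the order-p Wasserstein distance, \rho is the radius of the ambiguity set, and \|\cdot\|_* is the dual of the norm used to measure perturbations.

```latex
% Generic W-DRO training problem over a Wasserstein ball of radius \rho
\min_{\theta} \; \sup_{Q \,:\, W_p(Q, \widehat{P}_n) \le \rho} \; \mathbb{E}_{Z \sim Q}\big[\ell(\theta; Z)\big]

% First-order expansion in \rho (smooth loss, 1/p + 1/q = 1)
\sup_{Q \,:\, W_p(Q, \widehat{P}_n) \le \rho} \mathbb{E}_{Q}\big[\ell(\theta; Z)\big]
  = \mathbb{E}_{\widehat{P}_n}\big[\ell(\theta; Z)\big]
  + \rho \Big( \mathbb{E}_{\widehat{P}_n}\big[\|\nabla_z \ell(\theta; Z)\|_*^{q}\big] \Big)^{1/q}
  + o(\rho)
```

The second line is why W-DRO is often implemented as a gradient-norm penalty, and why the choice of norm in the penalty is tied to the choice of perturbation geometry in the formulation.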
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. Combines two complementary robustness paradigms, Wasserstein DRO in training and data augmentation before optimization, within a single unified and implementable framework. Its effectiveness is validated by consistent empirical gains across multiple common datasets. 2. The generalization bound includes explicit ρ-dependence and recovers the expected nonparametric rate under sparse ReQU networks, improving interpretability of robustness–sample trade-offs. 3. The proposed refinement of CIFAR-
1. The claimed L∞-Wasserstein DRO formulation conflicts with the L2-based implementation for the gradient penalty (P. 8, L.400-402). This inconsistency weakens the claim that the model optimizes L∞ W-DRO. 2. The theoretical contribution on adversarial risk bounds is largely incremental: it mainly differs in applying mixup data and sparse ReQU architectures rather than introducing a new bounding method. Also, the network class smoothness bounds for norm of gradient and (operator) norm of the He
1. The paper tackles both natural corruptions and adversarial attacks simultaneously, which is a practical consideration often overlooked in papers that focus on only one type of robustness. 2. The paper provides generalization error bounds for neural networks trained with W-DRO and augmented data (Theorem 4.1), achieving an improved convergence rate compared to previous work. 3. The authors identify and address a real issue with CIFAR-C severity calibration, proposing a more consistent eval
1. The main contribution is essentially combining two existing techniques (W-DRO and data augmentation) without fundamental algorithmic innovation. 2. The paper admits DRO-Augment adds overhead due to gradient-norm evaluation but dismisses it as small. However, no measurements (FLOPs, time comparison) are given. Given that W-DRO involves per-sample gradients, cost may scale poorly with model size. 3. Only PreActResNet-18 is tested. Without scaling to transformers, larger CNNs, or ImageNet-leve
- The combination of W-DRO and data augmentation is well-motivated and technically sound, effectively merging two complementary robustness strategies. - The paper establishes generalization error bounds for neural networks trained with W-DRO on augmented data, achieving a faster convergence rate compared to prior work. - Extensive experiments across multiple benchmark datasets (CIFAR-10-C, CIFAR-100-C, Tiny-ImageNet-C, Fashion-MNIST) with various attack types (PGD, AutoAttack, C&W, FAB-T, Square
- While the paper mentions small additional time costs, there is no systematic analysis of computational overhead compared to baselines, memory requirements, or scalability to larger datasets/models. This is critical in practice. - The ablation studies, mainly in Table 3, only examine CIFAR-100-C and Fashion-MNIST. They should cover more datasets and analyze the sensitivity to key hyperparameters (for example, the mixing ratios \frac{\alpha}{\beta}) more thoroughly. - The experimen
The proposed unified framework aims to enhance robustness against both common corruptions and adversarial attacks.
* The overall contribution of this work appears to be marginal. The objective function defined in Eq. (2.1) is adopted from prior work, and the data augmentation strategies employed are common and well-established. * According to the results reported in RobustBench [1], accuracies against adversarial examples and corrupted data are evaluated on two distinct leaderboards. Methods such as NoisyMix and AugMix have already achieved strong performance on corruption benchmarks. Simply combining these
Taxonomy
Topics: Risk and Portfolio Optimization
