DRO-Augment Framework: Robustness by Synergizing Wasserstein Distributionally Robust Optimization and Data Augmentation

Jiaming Hu; Debarghya Mukherjee; Ioannis Ch. Paschalidis

arXiv:2506.17874·stat.ML·June 26, 2025


Open Access·4 Reviews

TL;DR

This paper introduces DRO-Augment, a framework combining Wasserstein Distributionally Robust Optimization with data augmentation to significantly enhance neural network robustness against various corruptions and adversarial attacks, while maintaining accuracy.

Contribution

The paper presents a novel integration of W-DRO with data augmentation, providing improved robustness and new theoretical generalization bounds for neural networks.

Findings

01 Outperforms existing augmentation methods under severe perturbations

02 Maintains accuracy on clean datasets across multiple benchmarks

03 Establishes new theoretical generalization error bounds

Abstract

In many real-world applications, ensuring the robustness and stability of deep neural networks (DNNs) is crucial, particularly for image classification tasks that encounter various input perturbations. While data augmentation techniques have been widely adopted to enhance the resilience of a trained model against such perturbations, there remains significant room for improvement in robustness against corrupted data and adversarial attacks simultaneously. To address this challenge, we introduce DRO-Augment, a novel framework that integrates Wasserstein Distributionally Robust Optimization (W-DRO) with various data augmentation strategies to improve the robustness of the models significantly across a broad spectrum of corruptions. Our method outperforms existing augmentation methods under severe data perturbations and adversarial attack scenarios while maintaining the accuracy on the…
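For orientation, W-DRO objectives are usually written in the generic form below. This is the standard formulation, not a transcription of the paper's Eq. (2.1); the symbols $\theta$, $f_\theta$, $\ell$, $W$, and $\widehat{P}_n$ are notation assumed here, with $\rho$ denoting the radius of the Wasserstein ball. In a DRO-Augment-style setup, $\widehat{P}_n$ would presumably be the empirical distribution of the augmented training samples rather than of the raw data.

```latex
\min_{\theta} \; \sup_{Q \,:\, W(Q,\, \widehat{P}_n) \le \rho} \;
\mathbb{E}_{(x, y) \sim Q} \big[ \ell\big(f_\theta(x),\, y\big) \big]
```

Setting $\rho = 0$ collapses the inner supremum to $Q = \widehat{P}_n$, which recovers ordinary empirical risk minimization on the augmented data.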

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 4 · Confidence 3

Strengths

1. Combines two complementary robustness paradigms, Wasserstein DRO during training and data augmentation before optimization, within a single unified and implementable framework. Its effectiveness is validated by consistent empirical gains across multiple common datasets.
2. The generalization bound includes explicit ρ-dependence and recovers the expected nonparametric rate under sparse ReQU networks, improving interpretability of robustness–sample trade-offs.
3. The proposed refinement of CIFAR-

Weaknesses

1. The claimed L∞-Wasserstein DRO formulation conflicts with the L2-based implementation of the gradient penalty (P. 8, L.400-402). This inconsistency weakens the claim that the model optimizes L∞ W-DRO.
2. The theoretical contribution on adversarial risk bounds is largely incremental: it mainly differs in applying mixup data and sparse ReQU architectures rather than introducing a new bounding method. Also, the network class smoothness bounds for norm of gradient and (operator) norm of the He
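One possible reading of the first weakness (an assumption, since the review does not spell it out) is that the norm used in a gradient penalty should be the dual of the norm that bounds the input perturbation. The sketch below illustrates only that general fact for a linearized loss: over an ℓ∞ ball the worst-case gain is ρ times the ℓ1 norm of the gradient, while over an ℓ2 ball it is ρ times the ℓ2 norm. The function name and the example gradient are hypothetical and not taken from the paper.

```python
import math

def worst_case_linear_gain(grad, rho, ball):
    """Maximum of grad . delta over a norm ball of radius rho (hypothetical helper).

    ball="linf": constraint max_i |delta_i| <= rho
    ball="l2":   constraint sqrt(sum_i delta_i^2) <= rho
    """
    if ball == "linf":
        # Maximizer pushes every coordinate to +/- rho along the gradient sign.
        delta = [rho * math.copysign(1.0, g) if g != 0 else 0.0 for g in grad]
    elif ball == "l2":
        # Maximizer is the gradient direction rescaled to length rho.
        norm = math.sqrt(sum(g * g for g in grad))
        delta = [rho * g / norm for g in grad] if norm > 0 else [0.0] * len(grad)
    else:
        raise ValueError("ball must be 'linf' or 'l2'")
    return sum(g * d for g, d in zip(grad, delta))

# Illustrative input gradient and radius (made-up values).
grad = [3.0, -4.0]
rho = 0.1

l1_norm = sum(abs(g) for g in grad)            # dual norm for the l-infinity ball
l2_norm = math.sqrt(sum(g * g for g in grad))  # dual norm for the l2 ball

gain_linf = worst_case_linear_gain(grad, rho, "linf")
gain_l2 = worst_case_linear_gain(grad, rho, "l2")
```

Under this reading, an ℓ2 gradient penalty matches an ℓ2 perturbation budget, and an ℓ∞ budget would call for an ℓ1 penalty. Whether this is the inconsistency the reviewer has in mind cannot be confirmed from the truncated review.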

Reviewer 02 · Rating 4 · Confidence 3

Strengths

1. The paper tackles both natural corruptions and adversarial attacks simultaneously, which is a practical consideration often overlooked in papers that focus on only one type of robustness.
2. The paper provides generalization error bounds for neural networks trained with W-DRO and augmented data (Theorem 4.1), achieving an improved convergence rate compared to previous work.
3. The authors identify and address a real issue with CIFAR-C severity calibration, proposing a more consistent eval

Weaknesses

1. The main contribution is essentially combining two existing techniques (W-DRO and data augmentation) without fundamental algorithmic innovation.
2. The paper admits DRO-Augment adds overhead due to gradient-norm evaluation but dismisses it as small. However, no measurements (FLOPs, time comparison) are given. Given that W-DRO involves per-sample gradients, cost may scale poorly with model size.
3. Only PreActResNet-18 is tested. Without scaling to transformers, larger CNNs, or ImageNet-leve
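To make the overhead concern in the second weakness concrete, here is a minimal sketch that assumes a first-order gradient-norm surrogate for W-DRO applied to a mixup sample. It is not the paper's implementation. It uses a toy logistic model, for which the per-sample input gradient has the closed form (p - y)·w, so no automatic differentiation is needed. The function names, the weights, and the ℓ2 choice of norm are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mixup(x1, y1, x2, y2, lam):
    """Convex combination of two samples and their labels (standard mixup)."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = lam * y1 + (1.0 - lam) * y2
    return x, y

def penalized_loss(w, b, x, y, rho):
    """Cross-entropy plus rho times the l2 norm of the input gradient.

    Returns (total, cross_entropy, grad_norm). The l2 norm is an assumption.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = sigmoid(z)
    eps = 1e-12  # guards log(0); ignored in the gradient formula below
    ce = -(y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps))
    # For logistic regression, d(ce)/dx = (p - y) * w, also for soft labels y.
    grad_norm = abs(p - y) * math.sqrt(sum(wi * wi for wi in w))
    return ce + rho * grad_norm, ce, grad_norm

# Illustrative values only.
w, b = [3.0, 4.0], 0.0
x_mix, y_mix = mixup([1.0, 0.0], 1.0, [-1.0, 0.0], 0.0, lam=0.5)
total, ce, grad_norm = penalized_loss(w, b, x_mix, y_mix, rho=0.1)
```

In a deep network the input gradient has no closed form: it requires a backward pass per batch, and differentiating the penalty with respect to the weights requires a second pass through that gradient computation. That is the source of the extra cost the reviewer asks the authors to measure.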

Reviewer 03 · Rating 4 · Confidence 2

Strengths

- The combination of W-DRO and data augmentation is well-motivated and technically sound, effectively merging two complementary robustness strategies.
- The paper establishes generalization error bounds for neural networks trained with W-DRO on augmented data, achieving a faster convergence rate compared to prior work.
- Extensive experiments across multiple benchmark datasets (CIFAR-10-C, CIFAR-100-C, Tiny-ImageNet-C, Fashion-MNIST) with various attack types (PGD, AutoAttack, C&W, FAB-T, Square

Weaknesses

- While the paper mentions small additional time costs, there is no systematic analysis of computational overhead compared to baselines, memory requirements, or scalability to larger datasets/models. This is critical in practice.
- The ablation studies, mainly in Table 3, only examine CIFAR-100-C and Fashion-MNIST. They should cover more datasets and analyze the sensitivity to key hyperparameters (for example, the mixing ratios α/β) more thoroughly.
- The experimen

Reviewer 04 · Rating 2 · Confidence 4

Strengths

The proposed unified framework aims to enhance robustness against both common corruptions and adversarial attacks.

Weaknesses

* The overall contribution of this work appears to be marginal. The objective function defined in Eq. (2.1) is adopted from prior work, and the data augmentation strategies employed are common and well-established.
* According to the results reported in RobustBench [1], accuracies against adversarial examples and corrupted data are evaluated on two distinct leaderboards. Methods such as NoisyMix and AugMix have already achieved strong performance on corruption benchmarks. Simply combining these


Taxonomy

Topics: Risk and Portfolio Optimization