Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness
Tejas Gokhale, Swaroop Mishra, Man Luo, Bhavdeep Singh Sachdeva, and Chitta Baral

TL;DR
This paper empirically investigates how various data modification strategies affect out-of-domain generalization and adversarial robustness across NLP and computer vision tasks, revealing that more data generally improves both, while data filtering can be detrimental.
Contribution
It provides a comprehensive empirical analysis of data modification methods, highlighting their impacts on OOD performance and adversarial robustness, and visualizes these effects on synthetic data.
Findings
More data improves OOD accuracy and adversarial robustness.
Data filtering can harm OOD accuracy on certain tasks.
Synthetic visualization clarifies effects of data modifications.
Abstract
Data modification, whether via additional training datasets, data augmentation, debiasing, or dataset filtering, has been proposed as an effective solution for generalizing to out-of-domain (OOD) inputs, in both natural language processing and computer vision literature. However, the effect of data modification on adversarial robustness remains unclear. In this work, we conduct a comprehensive study of common data modification strategies and evaluate not only their in-domain and OOD performance, but also their adversarial robustness (AR). We also present results on a two-dimensional synthetic dataset to visualize the effect of each method on the training distribution. This work serves as an empirical study towards understanding the relationship between generalizing to unseen domains and defending against adversarial perturbations. Our findings suggest that more data (either via…
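To make the three evaluation axes named in the abstract concrete (in-domain accuracy, OOD accuracy, and adversarial robustness), here is a minimal sketch on a two-dimensional synthetic dataset. It is not the paper's experimental setup: the Gaussian classes, the horizontal shift used as the "unseen domain", the logistic-regression model, the L-infinity budget `eps`, and every function name are assumptions chosen for illustration.

```python
import numpy as np


def make_blobs(rng, n, shift=0.0):
    """Two Gaussian classes in 2-D (hypothetical data); `shift` moves the
    first coordinate to mimic a domain shift."""
    y = rng.integers(0, 2, size=n)
    centers = np.where(y[:, None] == 1, 1.5, -1.5)  # means at (+1.5,+1.5) and (-1.5,-1.5)
    x = centers + rng.normal(scale=1.0, size=(n, 2))
    x[:, 0] += shift
    return x, y


def train_logreg(x, y, steps=500, lr=0.1):
    """Plain gradient descent on the logistic loss."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        grad = p - y
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def accuracy(w, b, x, y):
    return float(((x @ w + b > 0).astype(int) == y).mean())


def adversarial_accuracy(w, b, x, y, eps):
    """Accuracy under the worst-case L-infinity perturbation of size eps.
    For a linear model this perturbation has a closed form: move each
    point against its own label along sign(w)."""
    signs = 2 * y - 1  # map labels {0, 1} to {-1, +1}
    x_adv = x - eps * signs[:, None] * np.sign(w)[None, :]
    return accuracy(w, b, x_adv, y)


def evaluate(n_train, seed=0, shift=1.0, eps=0.5):
    """Train on n_train points, then report the three metrics."""
    rng = np.random.default_rng(seed)
    x_tr, y_tr = make_blobs(rng, n_train)
    x_id, y_id = make_blobs(rng, 2000)               # same distribution as training
    x_ood, y_ood = make_blobs(rng, 2000, shift=shift)  # shifted distribution
    w, b = train_logreg(x_tr, y_tr)
    return {
        "in_domain": accuracy(w, b, x_id, y_id),
        "ood": accuracy(w, b, x_ood, y_ood),
        "adversarial": adversarial_accuracy(w, b, x_id, y_id, eps),
    }


if __name__ == "__main__":
    # Compare a small and a large training set (sizes are arbitrary).
    for n in (20, 2000):
        print(n, evaluate(n_train=n))
```

Changing `n_train`, or replacing `make_blobs` for the training split with a filtered or augmented variant, is one way to probe how a data modification moves each of the three numbers. The closed-form attack only holds for linear models; a nonlinear classifier would need an iterative attack instead.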
