Affinity and Diversity: Quantifying Mechanisms of Data Augmentation
Raphael Gontijo-Lopes, Sylvia J. Smullin, Ekin D. Cubuk, Ethan Dyer

TL;DR
This paper introduces interpretable measures called Affinity and Diversity to quantify how data augmentation improves neural network generalization, revealing that optimal augmentation balances both factors.
Contribution
The paper proposes two metrics, Affinity and Diversity, for understanding the effects of data augmentation, and shows that the two together, rather than either alone, predict whether an augmentation improves performance.
Findings
Augmentation performance correlates with joint affinity and diversity.
Optimal augmentation balances affinity and diversity.
Proposed measures are easy to compute and interpret.
Abstract
Though data augmentation has become a standard component of deep neural network training, the underlying mechanism behind the effectiveness of these techniques remains poorly understood. In practice, augmentation policies are often chosen using heuristics of either distribution shift or augmentation diversity. Inspired by these, we seek to quantify how data augmentation improves model generalization. To this end, we introduce interpretable and easy-to-compute measures: Affinity and Diversity. We find that augmentation performance is predicted not by either of these alone but by jointly optimizing the two.
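The abstract names the two measures without giving their formulas. As a rough sketch only, the code below assumes one common reading: Affinity is the ratio of a clean-trained model's validation accuracy on augmented data to its accuracy on clean data, and Diversity is the ratio of the final training loss with augmentation to the final training loss without it. Both definitions, and the function and parameter names, are assumptions made for illustration and are not taken from this page.

```python
def affinity(acc_on_augmented_val: float, acc_on_clean_val: float) -> float:
    """Assumed definition: accuracy of a model trained on clean data,
    evaluated on augmented validation data, divided by its accuracy on
    clean validation data. Values near 1 suggest little distribution shift."""
    if acc_on_clean_val <= 0:
        raise ValueError("clean validation accuracy must be positive")
    return acc_on_augmented_val / acc_on_clean_val


def diversity(final_train_loss_augmented: float, final_train_loss_clean: float) -> float:
    """Assumed definition: final training loss when training with the
    augmentation, divided by final training loss when training on clean data.
    Larger values suggest the augmented data is harder to fit."""
    if final_train_loss_clean <= 0:
        raise ValueError("clean training loss must be positive")
    return final_train_loss_augmented / final_train_loss_clean


# Hypothetical numbers, chosen only to show the calls.
example_affinity = affinity(acc_on_augmented_val=0.72, acc_on_clean_val=0.90)
example_diversity = diversity(final_train_loss_augmented=0.30, final_train_loss_clean=0.10)
print(f"affinity={example_affinity:.2f}, diversity={example_diversity:.2f}")
```

Under these assumed definitions, an augmentation would be judged on the pair of values together, in line with the abstract's claim that neither measure alone predicts performance.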
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning
