DivAug: Plug-in Automated Data Augmentation with Explicit Diversity Maximization
Zirui Liu, Haifeng Jin, Ting-Hsiang Wang, Kaixiong Zhou, Xia Hu

TL;DR
DivAug introduces an explicit diversity measure called Variance Diversity, theoretically links it to the regularization effect of data augmentation, and uses an unsupervised, search-free framework to maximize it, which also improves semi-supervised learning performance efficiently.
Contribution
Proposes Variance Diversity as a measurable and theoretically justified diversity metric, and develops DivAug, an unsupervised method to maximize it without search, boosting augmentation regularization effects.
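To make the idea of a measurable diversity metric concrete, here is a minimal sketch. This summary does not give the paper's formula, so the definition used below is an assumption: diversity is taken to be the total variance of a model's predicted class probabilities across several augmented copies of the same input. The function name `variance_diversity` and the example predictions are hypothetical.

```python
from statistics import fmean


def variance_diversity(probs):
    """Total variance of prediction vectors across augmented copies.

    probs: list of K probability vectors, one per augmented copy of the
    same input, each a list of C floats.

    NOTE: an assumed, illustrative definition, not the paper's exact formula.
    """
    if not probs:
        raise ValueError("need at least one prediction vector")
    num_classes = len(probs[0])
    # Mean prediction over the K augmented copies, computed per class.
    mean = [fmean(p[c] for p in probs) for c in range(num_classes)]
    # Average squared distance of each prediction to the mean prediction
    # (equivalently, the trace of the covariance of the predictions).
    return fmean(
        sum((p[c] - mean[c]) ** 2 for c in range(num_classes)) for p in probs
    )


# Hypothetical model outputs for three augmented copies of one image.
identical = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
varied = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]

print(variance_diversity(identical))
print(variance_diversity(varied))
```

Under this assumed definition, augmentations that leave the model's prediction unchanged score near zero, while augmentations that move the prediction around score higher. A method that maximizes such a score without search could, for instance, draw several candidate augmentations per input and keep the subset that spreads predictions the most.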
Findings
Variance Diversity correlates with test accuracy gains.
DivAug achieves performance comparable to state-of-the-art methods.
Enhances semi-supervised learning with better efficiency.
Abstract
Human-designed data augmentation strategies have been replaced by automatically learned augmentation policies in the past two years. Specifically, recent work has empirically shown that the superior performance of automated data augmentation methods stems from increasing the diversity of augmented data \cite{autoaug, randaug}. However, two factors regarding the diversity of augmented data are still missing: 1) an explicit definition (and thus measurement) of diversity and 2) a quantifiable relationship between diversity and its regularization effects. To bridge this gap, we propose a diversity measure called Variance Diversity and theoretically show that the regularization effect of data augmentation is ensured by Variance Diversity. We validate in experiments that the relative gain in test accuracy from automated data augmentation is highly correlated with Variance Diversity. An…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Indoor and Outdoor Localization Technologies
Methods: RandAugment
