MaxUp: A Simple Way to Improve Generalization of Neural Network Training
Chengyue Gong, Tongzheng Ren, Mao Ye, Qiang Liu

TL;DR
MaxUp is a simple yet effective data augmentation technique that enhances neural network generalization by minimizing the worst-case loss over augmented data, leading to improved performance across various tasks.
Contribution
The paper introduces MaxUp, a novel method that improves neural network generalization by optimizing for the worst-case loss over augmented data, with minimal computational overhead.
Findings
MaxUp improves ImageNet top-1 accuracy from 85.5% to 85.8%.
MaxUp outperforms the best existing baseline methods on image classification, language modeling, and adversarial certification.
MaxUp introduces a regularization effect similar to gradient norm penalties.
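The gradient-norm connection can be seen from a first-order expansion. The following is a heuristic sketch rather than the paper's exact statement; the notation (m augmented copies, noise scale \sigma, constant c_m) is assumed here for illustration.

```latex
% Assumed setup: x'_i = x + \sigma z_i, with z_i \sim \mathcal{N}(0, I), i = 1, \dots, m.
\begin{align*}
L(x + \sigma z_i, \theta)
  &\approx L(x, \theta) + \sigma\, z_i^\top \nabla_x L(x, \theta), \\
\mathbb{E}\Big[\max_{i \in [m]} L(x + \sigma z_i, \theta)\Big]
  &\approx L(x, \theta) + \sigma\, c_m\, \lVert \nabla_x L(x, \theta) \rVert_2,
  \qquad c_m = \mathbb{E}\Big[\max_{i \in [m]} \xi_i\Big],\quad
  \xi_i \overset{\text{iid}}{\sim} \mathcal{N}(0, 1).
\end{align*}
```

The second line uses the fact that each z_i^\top \nabla_x L is a zero-mean Gaussian with standard deviation \lVert \nabla_x L \rVert_2. The constant c_m grows roughly like \sqrt{2 \log m}, so in this approximation more augmented copies correspond to a stronger penalty.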
Abstract
We propose MaxUp, an embarrassingly simple, highly effective technique for improving the generalization performance of machine learning models, especially deep neural networks. The idea is to generate a set of augmented data with some random perturbations or transforms and minimize the maximum, or worst-case, loss over the augmented data. By doing so, we implicitly introduce a smoothness or robustness regularization against the random perturbations, and hence improve the generalization performance. For example, in the case of Gaussian perturbation, MaxUp is asymptotically equivalent to using the gradient norm of the loss as a penalty to encourage smoothness. We test MaxUp on a range of tasks, including image classification, language modeling, and adversarial certification, on which MaxUp consistently outperforms the existing best baseline methods, without…
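The procedure described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a linear logistic-regression model and Gaussian input noise as the augmentation, and the names `logistic_loss`, `gaussian_augment`, and `maxup_step` are hypothetical.

```python
import numpy as np

def logistic_loss(w, x, y):
    """Per-example logistic loss; x has shape (n, d), y holds labels in {-1, +1}."""
    margins = y * (x @ w)
    return np.logaddexp(0.0, -margins)  # log(1 + exp(-margin)), numerically stable

def gaussian_augment(x, sigma, rng):
    """Assumed augmentation: add isotropic Gaussian noise to every input."""
    return x + sigma * rng.standard_normal(x.shape)

def maxup_step(w, x, y, m, sigma, lr, rng):
    """One gradient step on the mean worst-case loss over m augmented copies."""
    n = x.shape[0]
    copies = np.stack([gaussian_augment(x, sigma, rng) for _ in range(m)])  # (m, n, d)
    losses = np.stack([logistic_loss(w, c, y) for c in copies])             # (m, n)

    # For each example, keep only the augmented copy with the largest loss.
    worst = losses.argmax(axis=0)            # (n,)
    x_worst = copies[worst, np.arange(n)]    # (n, d)

    # Gradient of the logistic loss w.r.t. w, evaluated at the worst copies only.
    margins = y * (x_worst @ w)
    coeff = -y / (1.0 + np.exp(margins))     # d loss / d (x @ w)
    grad = (coeff[:, None] * x_worst).mean(axis=0)

    maxup_loss = losses.max(axis=0).mean()
    return w - lr * grad, maxup_loss

# Usage on toy data (values are illustrative only).
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 3))
y = np.where(x[:, 0] > 0, 1.0, -1.0)
w = np.zeros(3)
for _ in range(20):
    w, loss = maxup_step(w, x, y, m=4, sigma=0.1, lr=0.5, rng=rng)
```

With `m=1` this reduces to ordinary training on randomly augmented data; larger `m` costs `m` forward passes per example, but the gradient is taken through only one copy per example.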
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Topic Modeling
Methods: Test · RMSProp · Tanh Activation · Depthwise Convolution · Pointwise Convolution · Bottleneck Residual Block · Residual Block · Depthwise Separable Convolution · Kaiming Initialization · Sigmoid Activation
