When does mixup promote local linearity in learned representations?
Arslan Chaudhry, Aditya Krishna Menon, Andreas Veit, Sadeep Jayasumana, Srikumar Ramalingam, Sanjiv Kumar

TL;DR
This paper investigates how Mixup regularization influences the linearity of learned representations across different network layers in semi-supervised learning, revealing that supervised Mixup increases non-linearity in intermediate layers, while unsupervised Mixup promotes overall linearity and faster convergence.
Contribution
The paper provides a layer-wise analysis of how Mixup affects the linearity of representations in neural networks trained with semi-supervised learning, contrasting supervised and unsupervised applications of Mixup.
Findings
Supervised Mixup increases non-linearity in intermediate layers.
Unsupervised Mixup promotes overall linearity in network representations.
Unsupervised Mixup leads to faster training convergence.
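To make "linearity at a layer" concrete, here is a minimal sketch of one possible measure: the gap between a layer's output on an interpolated input and the same interpolation of its outputs. This is an illustrative assumption, not necessarily the metric used in the paper; the function name `linearity_gap` and the toy layers are hypothetical.

```python
import numpy as np

def linearity_gap(f, x1, x2, lam=0.5):
    """Mean norm of f(lam*x1 + (1-lam)*x2) - (lam*f(x1) + (1-lam)*f(x2)).

    A value of 0 means f behaves linearly along the segment between x1 and x2.
    Illustrative measure only (assumption), not the paper's definition.
    """
    out_of_mix = f(lam * x1 + (1.0 - lam) * x2)       # representation of the mixed input
    mix_of_out = lam * f(x1) + (1.0 - lam) * f(x2)    # mixed representations
    return float(np.mean(np.linalg.norm(out_of_mix - mix_of_out, axis=-1)))

# Toy "layers" (hypothetical): a linear map and the same map followed by ReLU.
W = np.array([[1.0, 2.0], [0.5, -1.0]])
linear_layer = lambda x: x @ W
relu_layer = lambda x: np.maximum(x @ W, 0.0)

x1 = np.array([[1.0, 0.0]])
x2 = np.array([[-1.0, 0.0]])
gap_linear = linearity_gap(linear_layer, x1, x2)  # zero up to floating-point error
gap_relu = linearity_gap(relu_layer, x1, x2)      # positive: ReLU breaks linearity here
```

Applying such a measure to the activations of each layer in turn would give a layer-wise profile of the kind the findings describe.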
Abstract
Mixup is a regularization technique that artificially produces new samples using convex combinations of original training points. This simple technique has shown strong empirical performance and has been heavily used as part of semi-supervised learning techniques such as MixMatch (Berthelot et al., 2019) and interpolation consistency training (ICT) (Verma et al., 2019). In this paper, we look at Mixup through a representation learning lens in a semi-supervised learning setup. In particular, we study the role of Mixup in promoting linearity in the learned network representations. Towards this, we study two questions: (1) how does the Mixup loss that enforces linearity in the last network layer propagate the linearity to the earlier layers; and (2) how does the enforcement of stronger Mixup loss on more than two data points affect the convergence of…
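The convex-combination step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `mixup_batch` is hypothetical, and drawing the mixing weight from a Beta(alpha, alpha) distribution and pairing each sample with a shuffled copy of the batch are assumptions taken from common Mixup practice rather than from this page.

```python
import numpy as np

def mixup_batch(x, y, alpha=1.0, rng=None):
    """Mix a batch with a shuffled copy of itself (sketch, assumptions noted above).

    x: (n, d) array of inputs; y: (n, k) array of one-hot or soft labels.
    Returns the mixed inputs, mixed labels, and the mixing weight lam.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)           # mixing weight in [0, 1] (assumed Beta prior)
    perm = rng.permutation(len(x))         # random partner for each sample
    x_mix = lam * x + (1.0 - lam) * x[perm]  # convex combination of inputs
    y_mix = lam * y + (1.0 - lam) * y[perm]  # same combination applied to labels
    return x_mix, y_mix, lam

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
y = np.eye(2)[[0, 1, 0, 1]]                # one-hot labels for two classes
x_mix, y_mix, lam = mixup_batch(x, y, alpha=1.0, rng=rng)
```

In an unsupervised or semi-supervised variant, `y` would be replaced by model predictions on unlabeled points, so that the loss asks the network's output on the mixed input to match the mixed predictions.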
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques
Methods: Mixup
