When does mixup promote local linearity in learned representations?

Arslan Chaudhry; Aditya Krishna Menon; Andreas Veit; Sadeep Jayasumana; Srikumar Ramalingam; Sanjiv Kumar

arXiv:2210.16413·cs.LG·November 1, 2022


Open Access

TL;DR

This paper investigates how Mixup regularization influences the linearity of learned representations across different network layers in semi-supervised learning, revealing that supervised Mixup increases non-linearity in intermediate layers, while unsupervised Mixup promotes overall linearity and faster convergence.

Contribution

The paper analyzes how Mixup affects layer-wise linearity in neural networks under semi-supervised learning, contrasting its supervised and unsupervised uses.

Findings

01. Supervised Mixup increases non-linearity in intermediate layers.

02. Unsupervised Mixup promotes overall linearity in network representations.

03. Unsupervised Mixup leads to faster training convergence.
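
One way to make "linearity of a layer" concrete is to compare the representation of a mixed input with the same mix of the individual representations. The sketch below is only an illustration of that idea and is not claimed to be the measurement used in the paper; `linearity_gap`, `linear_layer`, and `tanh_layer` are hypothetical names, and the toy layers stand in for a real network's intermediate outputs.

```python
import math

def linearity_gap(f, x1, x2, lam):
    """Euclidean distance between f(mix of inputs) and the mix of f(inputs).

    A gap of zero means f behaves linearly along the segment between x1 and x2.
    This metric is an assumption made for illustration.
    """
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    h_mix = f(x_mix)
    h_lin = [lam * a + (1.0 - lam) * b for a, b in zip(f(x1), f(x2))]
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(h_mix, h_lin)))

def linear_layer(x):
    # Toy linear map standing in for a layer's pre-activation output.
    return [2.0 * x[0] - x[1], 0.5 * x[1]]

def tanh_layer(x):
    # The same map followed by a non-linearity.
    return [math.tanh(v) for v in linear_layer(x)]

x1, x2 = [1.0, 0.0], [0.0, 3.0]
gap_linear = linearity_gap(linear_layer, x1, x2, 0.5)
gap_tanh = linearity_gap(tanh_layer, x1, x2, 0.5)
print(gap_linear, gap_tanh)
```

Applied to a real model, `f` would be the function from the input to a chosen layer's activations, and the gap would typically be averaged over many pairs and mixing coefficients.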

Abstract

Mixup is a regularization technique that artificially produces new samples using convex combinations of original training points. This simple technique has shown strong empirical performance, and has been heavily used as part of semi-supervised learning techniques such as MixMatch (Berthelot et al., 2019) and interpolation consistent training (ICT) (Verma et al., 2019). In this paper, we look at Mixup through a representation learning lens in a semi-supervised learning setup. In particular, we study the role of Mixup in promoting linearity in the learned network representations. Towards this, we study two questions: (1) how does the Mixup loss that enforces linearity in the last network layer propagate the linearity to the earlier layers?; and (2) how does the enforcement of stronger Mixup loss on more than two data points affect the convergence of…
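
The convex-combination step described in the abstract can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the Beta-distributed mixing coefficient follows common Mixup practice, and the Dirichlet weighting for more than two points is an assumption about how such a mix could be formed.

```python
import random

def mix(points, weights):
    """Convex combination of equal-length vectors (lists of floats)."""
    assert all(w >= 0 for w in weights) and abs(sum(weights) - 1.0) < 1e-9
    dim = len(points[0])
    return [sum(w * p[i] for w, p in zip(weights, points)) for i in range(dim)]

def mixup_pair(x1, y1, x2, y2, alpha=1.0, rng=random):
    """Mix two (input, one-hot label) pairs with lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    weights = [lam, 1.0 - lam]
    return mix([x1, x2], weights), mix([y1, y2], weights), lam

def mixup_k(xs, ys, alpha=1.0, rng=random):
    """Mix k points with Dirichlet(alpha, ..., alpha) weights (an assumption).

    The Dirichlet draw is obtained by normalising independent Gamma draws.
    """
    gammas = [rng.gammavariate(alpha, 1.0) for _ in xs]
    total = sum(gammas)
    weights = [g / total for g in gammas]
    return mix(xs, weights), mix(ys, weights), weights

rng = random.Random(0)
x_mix, y_mix, lam = mixup_pair([0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [0.0, 1.0], rng=rng)
print(lam, x_mix, y_mix)
```

In a supervised setting the mixed label `y_mix` is the training target; in an unsupervised consistency setting the mixed target would instead come from the model's own predictions on the unmixed inputs.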


Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques

Methods: Mixup