How Data Augmentation affects Optimization for Linear Regression
Boris Hanin, Yi Sun

TL;DR
This paper analyzes how data augmentation schedules affect optimization in linear regression, showing that augmentation interacts in complex ways with hyperparameters such as the learning rate, and gives convergence guarantees for augmented gradient descent.
Contribution
It gives a theoretical analysis of augmented gradient descent for linear regression, characterizing when it converges and the minimum it converges to, for arbitrary augmentation schemes.
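To fix ideas, one way to write augmented gradient descent for linear regression with MSE loss is the following. The notation is our own and may differ from the paper's: (X, Y) is the training set of size n, D_t is a hypothetical name for the (possibly random) augmentation applied at step t, and eta_t is the learning rate. With no augmentation (X_t = X, Y_t = Y) the update reduces to ordinary full-batch gradient descent.

```latex
% Illustrative notation (not necessarily the paper's):
% (X, Y): training data with n rows; \mathcal{D}_t: augmentation at step t; \eta_t: learning rate.
\begin{aligned}
(X_t, Y_t) &\sim \mathcal{D}_t(X, Y), \\
\mathcal{L}(w; X_t, Y_t) &= \tfrac{1}{n}\,\lVert Y_t - X_t w \rVert_2^2, \\
w_{t+1} &= w_t - \eta_t \nabla_w \mathcal{L}(w_t; X_t, Y_t)
         = w_t + \tfrac{2\eta_t}{n}\, X_t^{\top}(Y_t - X_t w_t).
\end{aligned}
```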
Findings
Suitable joint schedules for the learning rate and the augmentation scheme provably ensure convergence of augmented gradient descent.
Augmentation interacts in complex ways with the learning rate, even in this convex setting.
The analysis yields convergence rates and conditions for convergence of augmented gradient descent.
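A minimal Python/NumPy sketch of augmented gradient descent with a joint schedule follows. It assumes additive Gaussian input noise as the augmentation and uses polynomially decaying learning-rate and noise-strength schedules; the function name augmented_gd, the synthetic data, and both schedules are hypothetical choices for illustration, not the conditions proved in the paper.

```python
import numpy as np


def augmented_gd(X, y, steps=5000, eta0=0.1, sigma0=1.0, seed=0):
    """Full-batch augmented gradient descent for linear regression with MSE loss.

    Illustrative assumptions: the augmentation adds fresh Gaussian noise of
    standard deviation sigma_t to the inputs at every step, and both the
    learning rate eta_t and sigma_t follow hypothetical polynomial decays.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(steps):
        eta_t = eta0 / (1 + t) ** 0.6      # hypothetical learning-rate schedule
        sigma_t = sigma0 / (1 + t) ** 0.5  # hypothetical augmentation-strength schedule
        X_aug = X + sigma_t * rng.standard_normal((n, d))  # fresh augmented copy of X
        residual = X_aug @ w - y
        grad = (2.0 / n) * X_aug.T @ residual  # gradient of (1/n)||X_aug w - y||^2
        w = w - eta_t * grad
    return w


# Usage on synthetic data (assumed example, not from the paper).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.1 * rng.standard_normal(200)
w_ls = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least-squares solution
w_hat = augmented_gd(X, y)
print("distance to least squares:", np.linalg.norm(w_hat - w_ls))
```

Because the noise strength decays to zero while the learning rate decays slowly enough, the iterates in this sketch end up near the ordinary least-squares solution. Holding the noise level fixed instead would be expected to pull them toward a shrunken, ridge-like solution, one simple instance of how the augmentation schedule shapes where optimization ends up.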
Abstract
Though data augmentation has rapidly emerged as a key tool for optimization in modern machine learning, a clear picture of how augmentation schedules affect optimization and interact with optimization hyperparameters such as learning rate is nascent. In the spirit of classical convex optimization and recent work on implicit bias, the present work analyzes the effect of augmentation on optimization in the simple convex setting of linear regression with MSE loss. We find joint schedules for learning rate and data augmentation scheme under which augmented gradient descent provably converges and characterize the resulting minimum. Our results apply to arbitrary augmentation schemes, revealing complex interactions between learning rates and augmentations even in the convex setting. Our approach interprets augmented (S)GD as a stochastic optimization method for a time-varying sequence of…
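For one concrete augmentation, the expected augmented loss can be written down exactly. Assuming additive Gaussian input noise x -> x + sigma * eps with eps ~ N(0, I) (our illustrative choice, not necessarily one studied in the paper), the expected squared error of a linear model is (y - x^T w)^2 + sigma^2 ||w||^2. The objective that augmented GD is effectively optimizing at noise level sigma is therefore a ridge objective whose minimizer solves (X^T X + n sigma^2 I) w = X^T y, and it changes as sigma changes over training. The sketch below uses hypothetical helper names to compute that minimizer and to check the identity by Monte Carlo.

```python
import numpy as np


def proxy_minimizer(X, y, sigma):
    """Minimizer of the expected augmented MSE for Gaussian input noise of std sigma.

    Proxy objective (illustrative assumption): (1/n)||y - X w||^2 + sigma^2 ||w||^2,
    whose minimizer solves (X^T X + n sigma^2 I) w = X^T y.
    """
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * sigma**2 * np.eye(d), X.T @ y)


def mc_augmented_loss(X, y, w, sigma, draws=5000, seed=0):
    """Monte Carlo estimate of E[(1/n)||y - (X + sigma*E) w||^2] with E ~ N(0, I)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    total = 0.0
    for _ in range(draws):
        X_aug = X + sigma * rng.standard_normal((n, d))  # one random augmentation
        total += np.mean((y - X_aug @ w) ** 2)
    return total / draws
```

Letting sigma shrink toward zero recovers the ordinary least-squares solution when X has full column rank, whereas a fixed sigma leaves the ridge minimizer as the target.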
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning
Methods: Stochastic Gradient Descent
