How do simple rotations affect the implicit bias of Adam?
Adela DePavia, Vasileios Charisopoulos, and Rebecca Willett

TL;DR
This paper investigates how simple rotations of the data affect Adam's implicit bias in binary classification. It shows that even small rotations can erase Adam's advantage over gradient descent, and it proposes a reparameterization that restores Adam's bias towards rich decision boundaries.
Contribution
The paper demonstrates that Adam's bias is sensitive to data rotations and introduces a reparameterization method to make Adam rotation-equivariant, restoring its ability to learn rich decision boundaries.
Findings
Small rotations can reverse Adam's bias towards nonlinear decision boundaries.
Reparameterization restores Adam's bias towards rich, nonlinear decision boundaries.
Adam's sensitivity to data rotations can be mitigated by a reparameterization based on an orthogonal transformation, which makes the method rotation-equivariant.
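The rotation sensitivity described in the findings above can be illustrated with a small numerical sketch. This is not the paper's experiment: the synthetic data, the logistic-regression model, the helper names (`logistic_grad`, `run_gd`, `run_adam`, `rotation`), the step sizes, and the 30-degree rotation are all assumptions chosen for illustration. The sketch rotates the features by an orthogonal matrix R and checks whether each optimizer's iterate on the rotated data equals R applied to its iterate on the original data. Gradient descent should satisfy this up to floating-point error, while Adam's coordinate-wise preconditioning generally should not.

```python
import numpy as np

def logistic_grad(w, X, y):
    # Gradient of the mean logistic loss log(1 + exp(-y * <x, w>)), labels in {-1, +1}.
    margins = y * (X @ w)
    coeff = -y / (1.0 + np.exp(margins))
    return X.T @ coeff / len(y)

def run_gd(X, y, w0, lr=0.1, steps=200):
    # Plain gradient descent.
    w = w0.copy()
    for _ in range(steps):
        w = w - lr * logistic_grad(w, X, y)
    return w

def run_adam(X, y, w0, lr=0.01, steps=200, b1=0.9, b2=0.999, eps=1e-8):
    # Standard Adam with bias correction; the division by sqrt(v_hat) is coordinate-wise.
    w = w0.copy()
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, steps + 1):
        g = logistic_grad(w, X, y)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

def rotation(theta):
    # 2-D rotation matrix (orthogonal).
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Hypothetical synthetic data: anisotropic features, noisy linear labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) * np.array([3.0, 0.3])
scores = X @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=200)
y = np.where(scores >= 0, 1.0, -1.0)

R = rotation(np.pi / 6)
X_rot = X @ R.T          # each sample x is mapped to R x
w0 = np.zeros(2)

# Equivariance check: does training on rotated data give R times the original solution?
gd_gap = np.linalg.norm(run_gd(X_rot, y, R @ w0) - R @ run_gd(X, y, w0))
adam_gap = np.linalg.norm(run_adam(X_rot, y, R @ w0) - R @ run_adam(X, y, w0))

print("GD equivariance gap:  ", gd_gap)
print("Adam equivariance gap:", adam_gap)
```

The gap for Adam already appears at the first step, where the update is approximately `lr * sign(g)`: taking the sign of a rotated gradient is not the same as rotating the sign vector. This sketch only shows the lack of equivariance; it does not reproduce the paper's claims about decision boundaries or the Bayes-optimal classifier.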
Abstract
Adaptive gradient methods such as Adam and Adagrad are widely used in machine learning, yet their effect on the generalization of learned models, relative to methods like gradient descent, remains poorly understood. Prior work on binary classification suggests that Adam exhibits a "richness bias," which can help it learn nonlinear decision boundaries closer to the Bayes-optimal decision boundary relative to gradient descent. However, the coordinate-wise preconditioning scheme employed by Adam renders the overall method sensitive to orthogonal transformations of feature space. We show that this sensitivity can manifest as a reversal of Adam's competitive advantage: even small rotations of the underlying data distribution can make Adam forfeit its richness bias and converge to a linear decision boundary that is farther from the Bayes-optimal decision boundary than the one learned by…
