Flatness and Gradient Alignment Are Both Necessary: Spectral-Aware Gradient-Aligned Exploration for Multi-Distribution Learning
Aristotelis Ballas, Christos Diou

TL;DR
This paper introduces SAGE, a method that considers both flatness and gradient alignment for multi-distribution learning, leading to improved generalization and state-of-the-art results.
Contribution
The paper derives a new excess-risk decomposition highlighting the importance of both flatness and gradient alignment, and proposes SAGE to optimize both properties.
Findings
SAGE achieves state-of-the-art performance on DomainBed.
SAGE improves performance on multi-task learning benchmarks.
Both flatness and gradient alignment are necessary for better generalization.
Abstract
Sharpness-aware and gradient-alignment methods have both been shown to improve generalization; however, each family targets a single geometric property of the loss landscape while ignoring the other. In this paper, we show that this omission is structurally unavoidable and that both flatness and gradient alignment should be considered in multi-distribution learning settings. Specifically, we derive an excess-risk decomposition that yields two additive leading-order terms: (i) an alignment term and (ii) a curvature term, expressed through the average Hessian and the covariance of the gradient across distributions. Notably, the average Hessian appears inverted in one term and non-inverted in the other. We further show, via a counterexample, that neither quantity bounds the other in general, so no algorithm…
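The claim that neither quantity bounds the other can be illustrated with a small numerical sketch. The exact expressions of the two terms are not given in this excerpt, so the code below uses stand-in proxies as an assumption: the trace of the average Hessian for curvature and the trace of the gradient covariance for alignment. The toy quadratic losses and the names `grad` and `curvature_and_alignment` are hypothetical, and this is not the paper's counterexample.

```python
from statistics import fmean


def grad(a, c, theta):
    """Gradient of the toy loss 0.5 * sum_i a[i] * (theta[i] - c[i])**2."""
    return [ai * (ti - ci) for ai, ci, ti in zip(a, c, theta)]


def curvature_and_alignment(hessian_diags, centers, theta):
    """Return (trace of average Hessian, trace of gradient covariance).

    Both are illustrative proxies (an assumption), not the paper's terms.
    Each distribution has a diagonal Hessian and its own minimum (center).
    """
    dim = len(theta)
    # Trace of the average Hessian: average each diagonal entry, then sum.
    trace_h = sum(fmean(a[i] for a in hessian_diags) for i in range(dim))
    grads = [grad(a, c, theta) for a, c in zip(hessian_diags, centers)]
    # Trace of the covariance: sum of per-coordinate population variances.
    trace_cov = 0.0
    for i in range(dim):
        column = [g[i] for g in grads]
        mean_i = fmean(column)
        trace_cov += fmean((x - mean_i) ** 2 for x in column)
    return trace_h, trace_cov


theta = [0.0, 0.0]

# Flat but misaligned: low curvature, minima far apart, gradients disagree.
flat_misaligned = curvature_and_alignment(
    hessian_diags=[[0.1, 0.1], [0.1, 0.1]],
    centers=[[10.0, 0.0], [-10.0, 0.0]],
    theta=theta,
)

# Sharp but aligned: high curvature, identical losses, gradients coincide.
sharp_aligned = curvature_and_alignment(
    hessian_diags=[[10.0, 10.0], [10.0, 10.0]],
    centers=[[1.0, 1.0], [1.0, 1.0]],
    theta=theta,
)

print("flat, misaligned (trace H, trace Cov):", flat_misaligned)
print("sharp, aligned   (trace H, trace Cov):", sharp_aligned)
```

In this toy setting the first configuration has the lower curvature proxy but the higher gradient variance, and the second reverses both, so minimizing one proxy alone says nothing about the other.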
