Flatness and Gradient Alignment Are Both Necessary: Spectral-Aware Gradient-Aligned Exploration for Multi-Distribution Learning

Aristotelis Ballas; Christos Diou

arXiv:2605.07914 · cs.LG · May 11, 2026

TL;DR

This paper introduces SAGE (Spectral-Aware Gradient-Aligned Exploration), a method that optimizes both flatness and gradient alignment in multi-distribution learning, improving generalization and achieving state-of-the-art results on DomainBed.

Contribution

The paper derives a new excess-risk decomposition highlighting the importance of both flatness and gradient alignment, and proposes SAGE to optimize both properties.

Findings

1. SAGE achieves state-of-the-art performance on DomainBed.

2. SAGE improves performance on multi-task learning benchmarks.

3. Both flatness and gradient alignment are necessary for better generalization.

Abstract

Sharpness-aware and gradient-alignment methods have been shown to improve generalization; however, each family of methods targets a single geometric property of the loss landscape while ignoring the other. In this paper, we show that this omission is structurally unavoidable and that both flatness and gradient alignment should be considered in multi-distribution learning settings. Specifically, we derive an excess-risk decomposition that yields two additive leading-order terms: (i) an alignment term, controlled by the trace of $\bar{H}^{-1} \Sigma_g$, and (ii) a curvature term, controlled by $\bar{H}$, where $\bar{H}$ is the average Hessian and $\Sigma_g$ is the covariance of the gradients across distributions. Notably, $\bar{H}$ appears inverted in the first term and non-inverted in the second. We further show, via a counterexample, that neither quantity bounds the other in general, so no algorithm…
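To make the two quantities in the abstract concrete, here is a minimal Python sketch that evaluates them on toy numbers. It is not the paper's decomposition or its counterexample. It assumes diagonal per-distribution Hessians (so the inverse is elementwise), and it uses the trace of the average Hessian as a stand-in for the curvature term, which the abstract only describes as "controlled by" the average Hessian. The function name `alignment_and_curvature` and all input values are hypothetical.

```python
from statistics import fmean, pvariance


def alignment_and_curvature(hessian_diags, grads):
    """Return (tr(H_bar^{-1} Sigma_g), tr(H_bar)) under a diagonal-Hessian assumption.

    hessian_diags[e][i]: i-th diagonal Hessian entry for distribution e.
    grads[e][i]: i-th gradient coordinate for distribution e.
    """
    dim = len(hessian_diags[0])
    # Average Hessian across distributions (diagonal entries only).
    h_bar = [fmean([h[i] for h in hessian_diags]) for i in range(dim)]
    # Diagonal of the gradient covariance across distributions.
    # With a diagonal H_bar, only these entries enter the trace.
    sigma_diag = [pvariance([g[i] for g in grads]) for i in range(dim)]
    alignment = sum(s / h for s, h in zip(sigma_diag, h_bar))
    curvature = sum(h_bar)  # assumed proxy for the curvature term
    return alignment, curvature


# Toy case 1: flat landscape, but the two distributions' gradients disagree.
flat_hessians = [[0.5, 0.5], [0.5, 0.5]]
flat_grads = [[1.0, -1.0], [-1.0, 1.0]]

# Toy case 2: sharp landscape, but the gradients agree exactly.
sharp_hessians = [[100.0, 100.0], [100.0, 100.0]]
sharp_grads = [[0.25, 0.25], [0.25, 0.25]]

if __name__ == "__main__":
    print("flat, misaligned:", alignment_and_curvature(flat_hessians, flat_grads))
    print("sharp, aligned:  ", alignment_and_curvature(sharp_hessians, sharp_grads))
```

In the first toy case the curvature proxy is small while the alignment quantity is large; in the second the reverse holds. This is consistent with the abstract's claim that neither quantity bounds the other, though it only illustrates the idea and does not reproduce the paper's argument.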
