Gradient-Weight Alignment as a Train-Time Proxy for Generalization in Classification Tasks

Florian A. Hölzl; Daniel Rueckert; Georgios Kaissis

arXiv:2510.25480 · cs.LG · October 30, 2025

TL;DR

This paper introduces Gradient-Weight Alignment (GWA), a new training-time metric that measures the coherence between sample gradients and model weights to predict generalization performance and identify influential samples.

Contribution

The paper proposes GWA as an efficient, training-time proxy for generalization, enabling validation-free model analysis and sample influence assessment in classification tasks.

Findings

01. GWA accurately predicts optimal early stopping points.

02. GWA enables comparison of different models during training.

03. GWA identifies influential training samples effectively.

Abstract

Robust validation metrics remain essential in contemporary deep learning, not only to detect overfitting and poor generalization, but also to monitor training dynamics. In the supervised classification setting, we investigate whether interactions between training data and model weights can yield such a metric that both tracks generalization during training and attributes performance to individual training samples. We introduce Gradient-Weight Alignment (GWA), quantifying the coherence between per-sample gradients and model weights. We show that effective learning corresponds to coherent alignment, while misalignment indicates deteriorating generalization. GWA is efficiently computable during training and reflects both sample-specific contributions and dataset-wide learning dynamics. Extensive experiments show that GWA accurately predicts optimal early stopping, enables principled model…
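
To make the idea of "coherence between per-sample gradients and model weights" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear softmax classifier trained with cross-entropy and takes the cosine similarity between each sample's flattened gradient and the flattened weights as the alignment score. The function names (`per_sample_gradient`, `cosine_alignment`, `alignment_scores`), the choice of cosine similarity, and the sign convention are all assumptions made for illustration; the abstract does not specify the exact definition or how per-sample scores are aggregated into a dataset-wide metric.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def per_sample_gradient(weights, x, label):
    """Cross-entropy gradient w.r.t. the weights for a single sample.

    Assumes a bias-free linear softmax classifier: logits = weights @ x.
    For this model the gradient is (probs - onehot(label)) outer x.
    """
    logits = [sum(w * v for w, v in zip(row, x)) for row in weights]
    probs = softmax(logits)
    return [
        [(p - (1.0 if c == label else 0.0)) * v for v in x]
        for c, p in enumerate(probs)
    ]


def cosine_alignment(grad, weights):
    """Cosine similarity between flattened gradient and flattened weights.

    Hypothetical stand-in for an alignment score; returns 0.0 if either
    vector has zero norm.
    """
    g = [v for row in grad for v in row]
    w = [v for row in weights for v in row]
    dot = sum(a * b for a, b in zip(g, w))
    norm = math.sqrt(sum(a * a for a in g)) * math.sqrt(sum(b * b for b in w))
    return dot / norm if norm > 0 else 0.0


def alignment_scores(weights, samples):
    """One alignment score per (features, label) pair."""
    return [
        cosine_alignment(per_sample_gradient(weights, x, y), weights)
        for x, y in samples
    ]


# Toy usage: two classes, two features, same input with two different labels.
toy_weights = [[1.0, 0.0], [0.0, 1.0]]
toy_samples = [([1.0, 0.0], 0), ([1.0, 0.0], 1)]
toy_scores = alignment_scores(toy_weights, toy_samples)
```

In this toy setup the two samples share the same input but carry different labels, so their scores have opposite signs. That shows how a per-sample score of this kind can separate samples the current weights agree with from those they do not. For deep networks the per-sample gradients would have to come from an autodiff framework rather than the closed form used here.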
