Disparity Between Batches as a Signal for Early Stopping

Mahsa Forouzesh; Patrick Thiran

arXiv:2107.06665 · cs.LG · July 15, 2021

TL;DR

This paper introduces gradient disparity, a metric defined as the distance between the gradients of two mini-batches. It signals overfitting during deep neural network training and can serve as an early-stopping criterion, especially when data is limited or labels are noisy.

Contribution

It proposes gradient disparity, a metric derived from a probabilistic upper bound, and shows empirically that it works as an early-stopping criterion and tracks both generalization error and label noise.

Findings

01 Gradient disparity correlates with generalization error.

02 It outperforms validation-based early stopping in noisy or limited data scenarios.

03 It is effective as an early-stopping criterion when data is scarce or labels are noisy.
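
The early-stopping use described in these findings can be sketched as a simple patience rule over a per-epoch disparity signal. This is a minimal illustration, not the authors' exact procedure: the function name `early_stop_epoch`, the `patience` parameter, and the rule "stop after the signal rises for `patience` consecutive epochs" are all assumptions made for this example.

```python
def early_stop_epoch(disparities, patience=5):
    """Return the epoch index at which to stop, or None if no stop is triggered.

    Hypothetical rule (an assumption, not taken from the paper): stop once the
    disparity has increased for `patience` consecutive epochs.
    """
    rises = 0
    for epoch in range(1, len(disparities)):
        if disparities[epoch] > disparities[epoch - 1]:
            rises += 1   # signal went up again
        else:
            rises = 0    # any non-increase resets the streak
        if rises >= patience:
            return epoch
    return None


# Illustrative values only: the signal falls, then rises three epochs in a row.
history = [1.0, 0.9, 0.8, 0.9, 1.0, 1.1]
stop_at = early_stop_epoch(history, patience=3)
```

Because the rule only needs the disparity values, no samples have to be held out for validation, which is the property the findings highlight for limited-data settings.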

Abstract

We propose a metric for evaluating the generalization ability of deep neural networks trained with mini-batch gradient descent. Our metric, called gradient disparity, is the $\ell_2$ norm distance between the gradient vectors of two mini-batches drawn from the training set. It is derived from a probabilistic upper bound on the difference between the classification errors over a given mini-batch, when the network is trained on this mini-batch and when the network is trained on another mini-batch of points sampled from the same dataset. We empirically show that gradient disparity is a very promising early-stopping criterion (i) when data is limited, as it uses all the samples for training and (ii) when available data has noisy labels, as it signals overfitting better than the validation data. Furthermore, we show in a wide range of experimental settings that gradient disparity is strongly…
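
To make the definition in the abstract concrete, the sketch below computes the $\ell_2$ distance between the gradients of two mini-batches for a small logistic-regression model in NumPy. The model, the helper names `batch_gradient` and `gradient_disparity`, and the synthetic data are assumptions chosen for illustration. The official repository uses PyTorch and deep networks, and its implementation may differ.

```python
import numpy as np


def batch_gradient(w, X, y):
    """Gradient of the mean logistic loss over one mini-batch (labels in {0, 1})."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))  # predicted probabilities
    return X.T @ (p - y) / len(y)


def gradient_disparity(w, batch_a, batch_b):
    """l2 distance between the gradient vectors of two mini-batches."""
    g_a = batch_gradient(w, *batch_a)
    g_b = batch_gradient(w, *batch_b)
    return float(np.linalg.norm(g_a - g_b))


# Synthetic data (assumed for the example): two mini-batches from one training set.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
y = (X[:, 0] > 0).astype(float)
w = np.zeros(5)

batch_a = (X[:32], y[:32])
batch_b = (X[32:], y[32:])
disparity = gradient_disparity(w, batch_a, batch_b)
```

Both gradients are evaluated at the same parameters `w`, so the value reflects how differently the two mini-batches would update the model at that point in training.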

Code & Models

Repositories

mahf93/disparity_early_stopping (PyTorch, official)
