Intersection of Parallels as an Early Stopping Criterion
Ali Vardasbi, Maarten de Rijke, Mostafa Dehghani

TL;DR
This paper introduces a validation-free early stopping method, the Cosine-Distance Criterion (CDC), which detects overfitting by monitoring where the weights of parallel models intersect during training. It improves generalization, especially when labels are noisy.
Contribution
The paper proposes a novel early stopping criterion based on the intersection of the weights of parallel models, and extends the criterion from linear models to neural networks using counterfactual weights.
Findings
CDC outperforms existing methods when learning with noisy labels.
CDC improves generalization across multiple datasets.
The method is effective for both linear models and neural networks.
Abstract
A common way to avoid overfitting in supervised learning is early stopping, where a held-out set is used for iterative evaluation during training to find a sweet spot in the number of training steps that gives maximum generalization. However, such a method requires a disjoint validation set, thus part of the labeled data from the training set is usually left out for this purpose, which is not ideal when training data is scarce. Furthermore, when the training labels are noisy, the performance of the model over a validation set may not be an accurate proxy for generalization. In this paper, we propose a method to spot an early stopping point in the training iterations without the need for a validation set. We first show that in the overparameterized regime the randomly initialized weights of a linear model converge to the same direction during training. Using this result, we propose to…
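The quantity the abstract refers to, the agreement in direction between parallel linear models started from different random seeds, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the synthetic data, the logistic loss, the step size, the initialization scale, and the names cosine_distance and train_parallel are hypothetical and not taken from the paper. The sketch only tracks the cosine distance between the two weight vectors; it does not implement the CDC stopping rule, which the truncated abstract does not spell out.

```python
import numpy as np


def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b)."""
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))


def train_parallel(X, y, steps=500, lr=0.1, init_scale=0.01, seeds=(1, 2)):
    """Train two linear classifiers with gradient descent on the logistic loss.

    The two models see the same data and differ only in their random
    initialization. Returns the cosine distance between their weights
    at initialization and after every step.
    """
    n, d = X.shape
    weights = [
        init_scale * np.random.default_rng(seed).standard_normal(d)
        for seed in seeds
    ]
    history = [cosine_distance(weights[0], weights[1])]
    for _ in range(steps):
        for i, w in enumerate(weights):
            margins = y * (X @ w)
            # sigmoid(-margins), computed in a numerically stable way
            coeff = np.exp(-np.logaddexp(0.0, margins))
            grad = -(X.T @ (y * coeff)) / n
            weights[i] = w - lr * grad
        history.append(cosine_distance(weights[0], weights[1]))
    return history


# Overparameterized toy problem (assumed): 20 examples, 100 features.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 100))
y = np.sign(X @ rng.standard_normal(100))

history = train_parallel(X, y)
print(f"cosine distance at init:   {history[0]:.3f}")
print(f"cosine distance at finish: {history[-1]:.3f}")
```

In this toy setting the distance starts near 1, because two random high-dimensional vectors are almost orthogonal, and shrinks as both models are pulled toward the same data-driven direction. A stopping criterion along the lines the paper describes would watch such a curve during training rather than a validation loss.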
Taxonomy
Topics: Machine Learning and Data Classification · Face and Expression Recognition · Domain Adaptation and Few-Shot Learning
Methods: Early Stopping
