Regularization in network optimization via trimmed stochastic gradient descent with noisy label
Kensuke Nakamura, Bong-Soo Sohn, Kyoung-Jae Won, Byung-Woo Hong

TL;DR
This paper introduces Label-Noised Trim-SGD, a novel optimization method that leverages label noise and example trimming to improve regularization and generalization in neural network training.
Contribution
It proposes a simple first-order optimization algorithm that combines label noise with loss-based example trimming and reports that it outperforms existing optimization methods in network training.
Findings
Outperforms state-of-the-art optimization methods on major benchmarks.
Allows a large amount of label noise to be imposed, yielding a stronger regularization effect.
Demonstrates improved generalization in neural networks.
Abstract
Regularization is essential for avoiding over-fitting to the training data in network optimization, and it leads to better generalization of the trained networks. Label noise provides strong implicit regularization by replacing the ground-truth target labels of training examples with uniform random labels. However, it can produce undesirable, misleading gradients because of the large loss associated with the incorrect labels. We propose a first-order optimization method (Label-Noised Trim-SGD) that combines label noise with example trimming, which removes outliers based on their loss. The proposed algorithm is simple, yet it allows us to impose large label noise and obtain a better regularization effect than the original methods. A quantitative analysis compares the behavior of label noise, example trimming, and the proposed algorithm. We also present empirical…
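The two ingredients named in the abstract (uniform label noise and loss-based trimming) can be illustrated with a minimal sketch. It assumes a softmax-regression model in NumPy, a hypothetical `noise_rate` with which each label is replaced by a uniformly drawn class, and a hypothetical `trim_rate` fraction of the highest-loss examples in each mini-batch that is dropped before the gradient step. The function names, parameters, and schedules are illustrative assumptions, not the authors' published algorithm.

```python
import numpy as np

def noisy_labels(y, num_classes, noise_rate, rng):
    """Hypothetical label noise: with probability noise_rate, replace a label
    with a class drawn uniformly at random (the true class may be redrawn)."""
    y = y.copy()
    flip = rng.random(y.shape[0]) < noise_rate
    y[flip] = rng.integers(0, num_classes, size=int(flip.sum()))
    return y

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def trimmed_sgd_step(W, X, y, num_classes, lr, noise_rate, trim_rate, rng):
    """One step of a hypothetical label-noised, trimmed SGD for softmax regression."""
    y_noisy = noisy_labels(y, num_classes, noise_rate, rng)
    probs = softmax(X @ W)
    # Per-example cross-entropy loss against the noised labels.
    losses = -np.log(probs[np.arange(len(y_noisy)), y_noisy] + 1e-12)
    # Trimming: keep the (1 - trim_rate) fraction with the smallest loss,
    # discarding high-loss outliers (often the incorrectly noised labels).
    n_keep = max(1, int(round((1.0 - trim_rate) * len(losses))))
    keep = np.argsort(losses)[:n_keep]
    onehot = np.eye(num_classes)[y_noisy[keep]]
    # Gradient of the mean cross-entropy over the kept examples only.
    grad = X[keep].T @ (probs[keep] - onehot) / n_keep
    return W - lr * grad, keep

# Toy usage on synthetic, linearly separable 3-class data.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 5))
true_W = rng.normal(size=(5, 3))
y = np.argmax(X @ true_W, axis=1)
W = np.zeros((5, 3))
for _ in range(200):
    idx = rng.choice(256, size=32, replace=False)
    W, _ = trimmed_sgd_step(W, X[idx], y[idx], 3, lr=0.5,
                            noise_rate=0.2, trim_rate=0.2, rng=rng)
acc = np.mean(np.argmax(X @ W, axis=1) == y)
print(f"training accuracy: {acc:.3f}")
```

The design choice to illustrate is that noise and trimming act within the same mini-batch: the noise is resampled every step, and trimming then discards the examples whose (possibly noised) labels disagree most with the current model, so the gradient is not dominated by the large losses of incorrect labels.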
Taxonomy
Topics: Machine Learning and Data Classification · Video Surveillance and Tracking Methods · Advanced Multi-Objective Optimization Algorithms
