L2B: Learning to Bootstrap Robust Models for Combating Label Noise
Yuyin Zhou, Xianhang Li, Fengze Liu, Qingyue Wei, Xuxi Chen, Lequan Yu, Cihang Xie, Matthew P. Lungren, Lei Xing

TL;DR
L2B is a novel meta-learning approach that improves the robustness of deep neural networks against label noise by dynamically adjusting sample and label importance weights, leading to better generalization and performance.
Contribution
The paper introduces L2B, a versatile meta-learning method that implicitly relabels and reweights samples to combat label noise without extra costs, outperforming existing techniques.
Findings
L2B significantly reduces overfitting to noisy labels.
It improves model robustness across natural and medical imaging tasks.
The method is compatible with existing label noise learning techniques.
Abstract
Deep neural networks have shown great success in representation learning. However, when learning with noisy labels (LNL), they can easily overfit and fail to generalize to new data. This paper introduces a simple and effective method, named Learning to Bootstrap (L2B), which enables models to bootstrap themselves using their own predictions without being adversely affected by erroneous pseudo-labels. It achieves this by dynamically adjusting the importance weight between real observed and generated labels, as well as between different samples through meta-learning. Unlike existing instance reweighting methods, the key to our method lies in a new, versatile objective that enables implicit relabeling concurrently, leading to significant improvements without incurring additional costs. L2B offers several benefits over the baseline methods. It yields more robust models that are less…
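The weighting described in the abstract can be illustrated with a minimal sketch. It assumes a per-sample loss of the form alpha_i * CE(observed label) + beta_i * CE(pseudo-label), which matches the reviewers' description of separate weights for the true-label and pseudo-label terms. The function name `bootstrap_loss`, the use of hard (argmax) pseudo-labels, and the fixed weight arrays are assumptions made for illustration. In L2B the weights are learned through meta-learning, which this sketch does not implement.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def bootstrap_loss(logits, observed_labels, alpha, beta):
    """Hypothetical helper: per-sample weighted sum of two cross-entropy terms.

    alpha[i] weights the term for the observed (possibly noisy) label,
    beta[i] weights the term for the model's own pseudo-label.
    The weights are plain inputs here (assumption); they are not learned.
    """
    probs = softmax(logits)
    log_probs = np.log(probs + 1e-12)
    idx = np.arange(logits.shape[0])
    pseudo_labels = probs.argmax(axis=1)        # hard pseudo-labels (assumption)
    ce_observed = -log_probs[idx, observed_labels]
    ce_pseudo = -log_probs[idx, pseudo_labels]
    return float(np.mean(alpha * ce_observed + beta * ce_pseudo))

# Two samples, three classes. The second observed label disagrees with the prediction.
logits = np.array([[2.0, 0.0, 0.0],
                   [0.0, 3.0, 0.0]])
observed = np.array([0, 2])

# alpha = 1, beta = 0 recovers ordinary cross-entropy on the observed labels.
plain_ce = bootstrap_loss(logits, observed, np.ones(2), np.zeros(2))

# Trusting the pseudo-label for the second sample removes its large loss term.
reweighted = bootstrap_loss(logits, observed,
                            np.array([1.0, 0.0]), np.array([0.0, 1.0]))
print(plain_ce, reweighted)
```

Because the two weights are independent inputs, nothing in this form forces alpha_i + beta_i = 1, which is the design point one reviewer asks about.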
Peer Reviews
Decision: ICLR 2024 Conference Withdrawn Submission
- The authors perform extensive experimentation on synthetic datasets, one dataset with real label noise, and compare to a large number of methods. - The convergence analysis is a great contribution to the paper. - The paper is well-written. Before getting to the experiments section, I was curious about whether or not the means of simplex projection for \alpha and \beta mattered in the method, since the authors perform an approximation. The authors seem to have pre-empted this potential questi
While the authors evaluate on one dataset with real label noise, Clothing-1M, there are a variety of other real-world settings where coping with (potentially systematic) label noise is important. It would be great to include additional real-world settings that may have systematic label noise. For example, label noise is a problem that is often inherent to the family of programmatic weak supervision techniques.
1. This paper is generally well-organized and easy to follow. 2. The idea of separately learning loss weights for the true-label term and the pseudo-label term is rational and logical for tasks involving learning with noisy labels. 3. The results shown on three datasets validate the effectiveness of the proposed method.
1. The motivation clarification of L2B could benefit from further improvement. While the paper states, "In contrast to prior works that individually reweight labels or instances, our paper introduces a novel approach to concurrently adjust both, elegantly unified under a meta-learning framework. We term our method as Learning to Bootstrap (L2B), as our goal is to enable the network to self-boost its capabilities by harnessing its own predictions in combating label noise," a clearer, high-level o
- Gives a more general formulation of the bootstrapping loss. - Learns the new loss through meta-learning and achieves better performance than the original bootstrapping loss and some meta-learning-based sample weighting methods.
- The idea is not novel. To improve the bootstrapping loss, [A] proposes dynamic hard and soft bootstrapping losses that weight each sample individually. The sample-wise weights indicate whether or not a sample has a clean label. Clean samples rely on their ground-truth labels, while noisy ones have their loss dominated by their class prediction. To determine the weights, [A] uses a two-component Beta Mixture Model (BMM), and [B] uses a two-component GMM to fit the max-normalized loss
1. The paper is well written. 2. This idea is very clear, simple and effective. 3. The proposed method has very good performance. 4. The proposed method is easy to reproduce.
1. The contributions are not very clear. It would be better to summarize them. 2. It is not clear why \alpha+\beta \neq 1 can significantly boost model performance. 3. No recent comparison methods are included; in Tables 3-4, the latest one is from 2021. It would be better to add more recent methods. 4. Some sentences are too long and difficult to understand. For example, "we propose a novel machine learning method called Learning to Bootstrap (L2B) that lever-
Taxonomy
Topics: Machine Learning and Data Classification · Infrastructure Maintenance and Monitoring · Anomaly Detection Techniques and Applications
