An Empirical Study on Noisy Label Learning for Program Understanding
Wenhan Wang, Yanzhou Li, Anran Li, Jian Zhang, Wei Ma, Yang Liu

TL;DR
This paper empirically evaluates noisy label learning (NLL) methods for deep learning models on program understanding tasks. It finds that large pre-trained models are robust to label noise, and that NLL approaches improve the accuracy of small models but struggle with real-world noise.
Contribution
It provides a comprehensive empirical analysis of NLL approaches across multiple program understanding tasks, highlighting their strengths and limitations for models of different sizes.
Findings
Large pre-trained models are robust against label noise.
NLL approaches improve small models' accuracy on noisy data.
NLL approaches effectively detect synthetic noise but struggle with real-world noise.
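A common heuristic behind noise detection in NLL is the small-loss criterion (used, for example, by Co-teaching): samples that a model fits only with high loss are treated as likely mislabeled. The sketch below flags the highest-loss fraction of samples and scores those flags against known synthetic noise with precision and recall. The function names, the assumption that the noise rate is known in advance, and the toy loss values are all illustrative. This is not the specific detection procedure evaluated in the paper.

```python
from typing import Sequence, Set, Tuple


def flag_high_loss(losses: Sequence[float], noise_rate: float) -> Set[int]:
    """Hypothetical helper: flag the `noise_rate` fraction of samples with the
    largest loss as likely mislabeled (the complement of small-loss selection)."""
    num_flag = round(noise_rate * len(losses))
    # Rank sample indices by loss, highest first, and keep the top `num_flag`.
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return set(ranked[:num_flag])


def detection_scores(flagged: Set[int], truly_noisy: Set[int]) -> Tuple[float, float]:
    """Precision and recall of the flagged set against the known noisy indices."""
    hits = len(flagged & truly_noisy)
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(truly_noisy) if truly_noisy else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy per-sample losses; samples 2 and 7 were (synthetically) mislabeled.
    losses = [0.10, 0.25, 2.30, 0.05, 0.40, 0.15, 0.30, 1.90, 0.95, 0.20]
    flagged = flag_high_loss(losses, noise_rate=0.3)
    print(sorted(flagged))                    # the three highest-loss indices
    print(detection_scores(flagged, {2, 7}))  # precision 2/3, recall 1.0
```

Assuming a known noise rate is only realistic for synthetic experiments, where the injected rate is chosen by the experimenter; on real-world datasets that rate would itself have to be estimated.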
Abstract
Recently, deep learning models have been widely applied in program understanding tasks, and these models achieve state-of-the-art results on many benchmark datasets. A major challenge of deep learning for program understanding is that the effectiveness of these approaches depends on the quality of their datasets, and these datasets often contain noisy data samples. A typical kind of noise in program understanding datasets is label noise, which means that the target outputs for some inputs are incorrect. Researchers have proposed various approaches to alleviate the negative impact of noisy labels, and formed a new research topic: noisy label learning (NLL). In this paper, we conduct an empirical study on the effectiveness of noisy label learning on deep learning for program understanding datasets. We evaluate various NLL approaches and deep learning models on three tasks: program…
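Label noise of the kind described above is often simulated in NLL experiments by flipping a fraction of the labels at random. The sketch below shows one common scheme, symmetric (uniform) label flipping for a classification task. The function name `inject_symmetric_noise` and its parameters are hypothetical, and the scheme is a generic illustration rather than the paper's own noise-injection procedure.

```python
import random
from typing import List, Set, Tuple


def inject_symmetric_noise(
    labels: List[int], num_classes: int, noise_rate: float, seed: int = 0
) -> Tuple[List[int], Set[int]]:
    """Hypothetical helper: flip a `noise_rate` fraction of labels to a
    different class chosen uniformly at random (symmetric label noise).

    Returns the noisy label list and the set of indices that were flipped.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    if num_classes < 2:
        raise ValueError("need at least two classes to flip a label")
    rng = random.Random(seed)  # local RNG so results are reproducible
    num_noisy = round(noise_rate * len(labels))
    flipped = set(rng.sample(range(len(labels)), num_noisy))
    noisy = list(labels)
    for i in flipped:
        # Pick any class except the original, so every flip is a real error.
        choices = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(choices)
    return noisy, flipped


if __name__ == "__main__":
    clean = [i % 4 for i in range(20)]  # toy labels for a 4-class task
    noisy, flipped = inject_symmetric_noise(clean, num_classes=4, noise_rate=0.2, seed=42)
    print(len(flipped), "labels flipped:", sorted(flipped))
```

Keeping the set of flipped indices makes it possible to measure afterwards how well an NLL approach detects the injected noise, which is not possible with real-world noise unless the mislabeled samples are annotated by hand.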
Taxonomy
Topics: Software Engineering Research · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning
