An Empirical Study on Noisy Label Learning for Program Understanding

Wenhan Wang; Yanzhou Li; Anran Li; Jian Zhang; Wei Ma; Yang Liu

arXiv:2307.08990 · cs.SE · January 2, 2024

TL;DR

This paper empirically evaluates noisy label learning methods in deep learning for program understanding, revealing that large pre-trained models are robust to noise and that NLL improves small models' accuracy but struggles with real-world noise.

Contribution

It provides a comprehensive empirical analysis of NLL approaches on multiple program understanding tasks, highlighting their strengths and limitations for different model sizes.

Findings

01 Large pre-trained models are robust against label noise.

02 NLL approaches improve small models' accuracy on noisy data.

03 NLL effectively detects synthetic noise but struggles with real-world noise.

Abstract

Recently, deep learning models have been widely applied in program understanding tasks, and these models achieve state-of-the-art results on many benchmark datasets. A major challenge of deep learning for program understanding is that the effectiveness of these approaches depends on the quality of their datasets, and these datasets often contain noisy data samples. A typical kind of noise in program understanding datasets is label noise, which means that the target outputs for some inputs are incorrect. Researchers have proposed various approaches to alleviate the negative impact of noisy labels, and formed a new research topic: noisy label learning (NLL). In this paper, we conduct an empirical study on the effectiveness of noisy label learning on deep learning for program understanding datasets. We evaluate various NLL approaches and deep learning models on three tasks: program…
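To make the notion of label noise in the abstract concrete, the following is a minimal sketch of how synthetic label noise is commonly injected into a classification dataset: a fixed fraction of samples is chosen at random and each chosen label is replaced by a different class. This is an illustration under stated assumptions, not the procedure used in the paper; the function name `flip_labels` and the parameters `num_classes`, `noise_rate`, and `seed` are hypothetical.

```python
import random


def flip_labels(labels, num_classes, noise_rate, seed=0):
    """Return a noisy copy of `labels` and the indices that were corrupted.

    Hypothetical helper: uniform ("symmetric") random label flipping,
    assumed here for illustration only. Requires num_classes >= 2.
    """
    rng = random.Random(seed)
    n_noisy = int(round(noise_rate * len(labels)))
    # Pick which samples receive an incorrect label.
    noisy_idx = sorted(rng.sample(range(len(labels)), n_noisy))
    noisy = list(labels)
    for i in noisy_idx:
        # Choose uniformly among the other classes so the label always changes.
        candidates = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(candidates)
    return noisy, noisy_idx


# Example: corrupt 20% of the labels in a 4-class toy dataset.
clean = [0, 1, 2, 3] * 25
noisy, corrupted = flip_labels(clean, num_classes=4, noise_rate=0.2)
```

Because the corrupted indices are returned, the same sketch can serve as ground truth when checking whether a noise-detection method flags the right samples.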

Code & Models

Repositories

jacobwwh/noise_se (PyTorch, official)

Taxonomy

Topics: Software Engineering Research · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning