On-the-fly Improving Performance of Deep Code Models via Input Denoising
Zhao Tian, Junjie Chen, Xiangyu Zhang

TL;DR
This paper introduces CodeDenoise, a novel on-the-fly input denoising technique for deep code models that localizes and cleans noisy identifiers, significantly improving accuracy without retraining.
Contribution
It presents the first input denoising method for deep code models that enhances performance on deployed models without retraining or fine-tuning.
Findings
CodeDenoise denoises 21.91% of mispredicted inputs on average.
It improves model accuracy by 2.04% across multiple datasets.
The technique operates efficiently, averaging 0.48 seconds per input.
Abstract
Deep learning has been widely adopted to tackle various code-based tasks by building deep code models based on a large number of code snippets. While these deep code models have achieved great success, even state-of-the-art models suffer from noise in their inputs, which leads to erroneous predictions. Although models can be enhanced through retraining or fine-tuning, this is not a once-and-for-all approach and incurs significant overhead. In particular, these techniques cannot improve the performance of (deployed) models on the fly. Input denoising techniques exist in other domains (such as image processing), but because code input is discrete and must strictly abide by complex syntactic and semantic constraints, they are largely inapplicable to code. In this work, we propose the first input denoising technique (i.e., CodeDenoise) for…
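The constraint the abstract mentions, that code input is discrete and must stay syntactically and semantically valid, can be illustrated with a small sketch. This is not CodeDenoise's algorithm, whose localization and cleaning steps are not described in this summary; it only shows, under stated assumptions, what a behavior-preserving identifier edit looks like for Python input. The helper name `rename_identifier` and the example snippet are hypothetical, and the renaming is token-level rather than scope-aware.

```python
import io
import keyword
import tokenize


def rename_identifier(source: str, old: str, new: str) -> str:
    """Rename NAME tokens equal to `old` to `new` (hypothetical helper).

    Assumption: a token-level rename is enough for the illustration. It is
    not scope-aware, so attribute names or keyword arguments spelled `old`
    would be renamed too.
    """
    if not new.isidentifier() or keyword.iskeyword(new):
        raise ValueError(f"{new!r} is not a valid identifier")

    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    existing = {t.string for t in tokens if t.type == tokenize.NAME}
    if new in existing:
        # Reusing an existing name could change what the code does.
        raise ValueError(f"{new!r} already appears in the snippet")

    # Split lines the same way the tokenizer read them.
    lines = io.StringIO(source).readlines()
    hits = [t for t in tokens if t.type == tokenize.NAME and t.string == old]
    # Edit right-to-left so earlier column offsets stay valid.
    for tok in sorted(hits, key=lambda t: t.start, reverse=True):
        row, col = tok.start
        line = lines[row - 1]
        lines[row - 1] = line[:col] + new + line[col + len(old):]
    return "".join(lines)


noisy = "def add(a, tmp):\n    # tmp is noise\n    return a + tmp\n"
cleaned = rename_identifier(noisy, "tmp", "b")
print(cleaned)
```

Because the edit works on tokens rather than raw text, the comment and any string literals are left alone, and the rejected cases (keywords, names already in use) are the ones that could break syntax or change behavior.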
Taxonomy
Topics: Advanced Neural Network Applications · Software Engineering Research · Anomaly Detection Techniques and Applications
