Using Distributed Representation of Code for Bug Detection
Jón Arnar Briem, Jordi Smit, Hendrig Sellik, Pavel Rapoport

TL;DR
This paper evaluates the effectiveness of neural code embeddings, specifically Code2Vec, in detecting off-by-one errors in Java code, demonstrating its potential beyond method name prediction.
Contribution
It empirically tests Code2Vec for bug detection, showing that attention-based structural code representations can identify bugs like off-by-one errors.
Findings
Model successfully detects off-by-one errors in Java code.
Structural code embeddings outperform baseline methods.
Training on mutated code improves bug detection accuracy.
Abstract
Recent advances in neural modeling for bug detection have been very promising. More specifically, using snippets of code to create continuous vectors or embeddings has been shown to be very good at method name prediction and claimed to be efficient at other tasks, such as bug detection. However, to date, the method has not been empirically tested for the latter. In this work, we use the Code2Vec model of Alon et al. to evaluate it for detecting off-by-one errors in Java source code. We define bug detection as a binary classification problem and train our model on a large Java file corpus containing likely correct code. In order to properly classify incorrect code, the model needs to be trained on false examples as well. To achieve this, we create likely incorrect code by making simple mutations to the original corpus. Our quantitative and qualitative evaluations show that…
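The mutation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a purely text-level mutation that flips the first comparison operator in a Java snippet (for example `<` to `<=`) and labels the original as likely correct (0) and the mutant as likely buggy (1). The names `SWAP`, `COMPARISON`, `mutate_first_comparison`, and `make_labelled_pair` are hypothetical, and a real implementation would more plausibly operate on the parsed syntax tree, since a regular expression can misread generics such as `List<String>`.

```python
import re

# Hypothetical mapping: each comparison operator and its off-by-one counterpart.
SWAP = {"<": "<=", "<=": "<", ">": ">=", ">=": ">"}

# Match a standalone comparison operator. The lookarounds skip shifts (<<, >>),
# lambda arrows (->), and other multi-character tokens. Assumption: generics
# are not handled, so this is only suitable for simple snippets.
COMPARISON = re.compile(r"(?<![<>=!\-])(<=|>=|<|>)(?![<>=])")


def mutate_first_comparison(java_source):
    """Return a copy with the first comparison operator flipped, or None if there is none."""
    match = COMPARISON.search(java_source)
    if match is None:
        return None
    start, end = match.span()
    return java_source[:start] + SWAP[match.group()] + java_source[end:]


def make_labelled_pair(java_source):
    """Build (code, label) examples: 0 = likely correct, 1 = likely buggy."""
    examples = [(java_source, 0)]
    mutated = mutate_first_comparison(java_source)
    if mutated is not None:
        examples.append((mutated, 1))
    return examples


snippet = "for (int i = 0; i < arr.length; i++) { sum += arr[i]; }"
pairs = make_labelled_pair(snippet)
for code, label in pairs:
    print(label, code)
```

Pairs produced this way could then be fed to any binary classifier. In the paper's setting that classifier is built on Code2Vec embeddings, which this sketch does not attempt to reproduce.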
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Advanced Malware Detection Techniques
