Bug Prediction Using Source Code Embedding Based on Doc2Vec

Tamás Aladics; Judit Jász; Rudolf Ferenc

arXiv:2110.04951 · cs.SE · October 12, 2021 · 1 citation

TL;DR

This paper introduces a source code embedding method based on Doc2Vec and ASTs, demonstrating improved bug prediction accuracy over traditional code metrics across various machine learning models.

Contribution

It presents a novel source code representation using AST-based Doc2Vec embeddings for bug prediction, outperforming metric-based features.
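A minimal sketch of the first step of such a pipeline: turning source code into a "document" of tokens that Doc2Vec can consume. The summary does not state which programming language or AST traversal the paper uses, so as an assumption this sketch parses Python with the standard `ast` module and emits node-type names in pre-order. The helper name `ast_node_sequence` is hypothetical.

```python
import ast
from typing import List


def ast_node_sequence(source: str) -> List[str]:
    """Flatten the AST of `source` into a pre-order list of node-type names.

    Hypothetical helper: each name plays the role of one "word" in the
    document that is later given to the Doc2Vec embedding step.
    """
    words: List[str] = []

    def visit(node: ast.AST) -> None:
        # Record the node's type first (pre-order), then recurse into children.
        words.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return words
```

For example, on Python 3.8 or later, `ast_node_sequence("x = 1")` returns `['Module', 'Assign', 'Name', 'Store', 'Constant']`. This sketch keeps only node types and discards identifiers and literal values; whether the paper's representation does the same is not stated here.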

Findings

1. Embedding improves bug prediction accuracy in most cases.
2. Embedding is at least as effective as code metrics alone.
3. Various machine learning models benefit from the embedding.

Abstract

Bug prediction is a resource-demanding task that is hard to automate using static source code analysis. In many fields of computer science, machine learning has proven extremely useful for tasks like this; however, for it to work, we need a way to use source code as input. We propose a simple but meaningful representation for source code based on its abstract syntax tree and the Doc2Vec embedding algorithm. This representation maps the source code to a fixed-length vector that can be used for various downstream tasks, one of which is bug prediction. We measured the validity of this approach on its own and its effectiveness compared to bug prediction based solely on code metrics. We also experimented with numerous machine learning approaches to check the connection between different embedding parameters and different machine learning models. Our results show that this representation…
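To illustrate how a token document becomes a fixed-length vector, here is a toy, pure-Python version of the PV-DBOW variant of Doc2Vec with negative sampling. It is not the authors' implementation and not any library's API; the function names, dimensionality, learning rate, and epoch count are hypothetical, and in practice a library implementation (for example gensim's `Doc2Vec`) would typically be used instead.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    # Clamp the input so math.exp cannot overflow.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _sgd_step(vec: Vector, word_out: List[Vector], target: int, vocab_size: int,
              rng: random.Random, lr: float, negatives: int, update_words: bool) -> None:
    """One PV-DBOW step: predict word `target` from `vec`, contrasted with random negatives."""
    samples = [(target, 1.0)] + [(rng.randrange(vocab_size), 0.0) for _ in range(negatives)]
    grad = [0.0] * len(vec)
    for t, label in samples:
        g = lr * (label - sigmoid(dot(vec, word_out[t])))
        for k in range(len(vec)):
            grad[k] += g * word_out[t][k]      # accumulate using the pre-update word vector
            if update_words:
                word_out[t][k] += g * vec[k]   # output word vectors are learned during training only
    for k in range(len(vec)):
        vec[k] += grad[k]                      # apply the accumulated document-vector update


def train_pv_dbow(docs: List[List[str]], dim: int = 8, epochs: int = 50, lr: float = 0.05,
                  negatives: int = 3, seed: int = 0) -> Tuple[List[Vector], List[Vector], Dict[str, int]]:
    """Learn one `dim`-length vector per document by predicting that document's tokens."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    doc_vecs = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in docs]
    word_out = [[0.0] * dim for _ in vocab]
    for _ in range(epochs):
        for vec, doc in zip(doc_vecs, docs):
            for w in doc:
                _sgd_step(vec, word_out, index[w], len(vocab), rng, lr, negatives, True)
    return doc_vecs, word_out, index


def infer_vector(doc: List[str], word_out: List[Vector], index: Dict[str, int], epochs: int = 50,
                 lr: float = 0.05, negatives: int = 3, seed: int = 1) -> Vector:
    """Embed an unseen document with the word vectors frozen."""
    dim = len(word_out[0])
    rng = random.Random(seed)
    vec = [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]
    known = [index[w] for w in doc if w in index]  # tokens outside the vocabulary are skipped
    for _ in range(epochs):
        for t in known:
            _sgd_step(vec, word_out, t, len(index), rng, lr, negatives, False)
    return vec
```

Every vector has length `dim` regardless of how much code it summarizes, so the vectors can be fed to any classifier, on their own or concatenated with code-metric features; that is the kind of comparison the abstract describes.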

Taxonomy

Topics: Software Engineering Research · Software Reliability and Analysis Research · Advanced Malware Detection Techniques