Using Paragraph Vectors to improve our existing code review assisting tool-CRUSO

Ritu Kapur, Balwinder Sodhi, Poojith U Rao, and Shipra Sharma

arXiv:2104.14265·cs.SE·April 30, 2021

TL;DR

This paper enhances a code review tool by leveraging Paragraph Vectors trained on StackOverflow data to accurately and efficiently predict code defectiveness across multiple programming languages.

Contribution

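The paper introduces SOpostsDB, a dataset containing PVA vectors and StackOverflow post information, and CRUSO-P, a code review assisting system based on PVA models trained on that dataset. Compared with the existing CRUSO tool, CRUSO-P improves speed and memory efficiency in code defectiveness prediction.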

Findings

01. CRUSO-P reduces response time by 97.82%.

02. CRUSO-P decreases storage needs by 99.15%.

03. CRUSO-P achieves 99.6% accuracy on C code.
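
To put the first two percentages in perspective, they can be converted into improvement factors. This is only a worked reading of the numbers, assuming both figures are relative reductions measured against the original CRUSO tool; the summary above does not spell out the baseline.

```latex
\[
\text{speed-up factor} = \frac{1}{1 - 0.9782} = \frac{1}{0.0218} \approx 45.9,
\qquad
\text{storage factor} = \frac{1}{1 - 0.9915} = \frac{1}{0.0085} \approx 117.6
\]
```

Under that assumption, a response would take roughly 1/46 of the earlier time, and the stored data would occupy roughly 1/118 of the earlier space.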

Abstract

Code reviews are an effective way to estimate defectiveness in source code. However, existing methods either depend on experts or are inefficient. In this paper, we improve the performance (in terms of speed and memory usage) of our existing code review assisting tool, CRUSO. The central idea of the approach is to estimate the defectiveness of an input source code using the defectiveness scores of similar code fragments present in various StackOverflow (SO) posts. The significant contributions of our paper are: i) SOpostsDB, a dataset containing the PVA vectors and the SO posts information; and ii) CRUSO-P, a code review assisting system based on PVA models trained on SOpostsDB. For a given input source code, CRUSO-P labels it as one of {Likely to be defective, Unlikely to be defective, Unpredictable}. To develop CRUSO-P, we processed >3 million SO posts and 188200+ GitHub…
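
The central idea in the abstract (label an input by the defectiveness scores of similar SO code fragments) can be illustrated with a minimal sketch. This is not the authors' implementation. The three-dimensional toy vectors stand in for PVA vectors, and the function names, the score range of -1 to 1, the similarity threshold, the `top_k` value, and the similarity-weighted average are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Label names follow the label set quoted in the abstract.
LIKELY = "Likely to be defective"
UNLIKELY = "Unlikely to be defective"
UNPREDICTABLE = "Unpredictable"


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_code(
    query_vec: Sequence[float],
    posts: List[Tuple[Sequence[float], float]],
    min_similarity: float = 0.8,  # hypothetical threshold
    top_k: int = 3,               # hypothetical neighbour count
) -> str:
    """Label a query vector using the scores of its most similar posts.

    `posts` holds (vector, defect_score) pairs. In this sketch a positive
    score means the post discusses defective code and a negative score
    means the opposite (an assumed convention).
    """
    scored = [(cosine_similarity(query_vec, vec), score) for vec, score in posts]
    # Keep only sufficiently similar posts, most similar first.
    neighbours = sorted(
        (pair for pair in scored if pair[0] >= min_similarity),
        key=lambda pair: pair[0],
        reverse=True,
    )[:top_k]
    total_similarity = sum(sim for sim, _ in neighbours)
    if not neighbours or total_similarity == 0.0:
        return UNPREDICTABLE  # nothing similar enough to base a label on
    weighted = sum(sim * score for sim, score in neighbours) / total_similarity
    if weighted > 0:
        return LIKELY
    if weighted < 0:
        return UNLIKELY
    return UNPREDICTABLE


# Toy stand-in for a vector store: two "defective" posts and one "clean" post.
POSTS = [
    ([1.0, 0.0, 0.0], 1.0),
    ([0.9, 0.1, 0.0], 1.0),
    ([0.0, 1.0, 0.0], -1.0),
]

print(label_code([1.0, 0.05, 0.0], POSTS))  # close to the two defective posts
print(label_code([0.0, 1.0, 0.1], POSTS))   # close to the clean post
print(label_code([0.0, 0.0, 1.0], POSTS))   # similar to nothing in the store
```

Returning `Unpredictable` when no post clears the similarity threshold mirrors the three-way label set: the sketch declines to guess when it has no comparable fragment to rely on.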

Taxonomy

Topics: Software Engineering Research · Advanced Malware Detection Techniques · Topic Modeling