Modeling Functional Similarity in Source Code with Graph-Based Siamese Networks

Nikita Mehrotra, Navdha Agarwal, Piyush Gupta, Saket Anand, David Lo, and Rahul Purandare

arXiv:2011.11228·cs.SE·November 26, 2020

TL;DR

This paper introduces HOLMES, a novel deep learning approach using program dependency graphs and geometric neural networks for semantic code clone detection, outperforming existing tools in accuracy and generalizability.

Contribution

The paper presents a new method leveraging program dependency graphs and geometric neural networks for semantic clone detection, with a prototype tool that outperforms state-of-the-art approaches.
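
To make the "graph-based Siamese" idea concrete, here is a minimal sketch in plain Python. It is not the HOLMES architecture: the mean-neighbor aggregation, the per-dimension weight vector, and the function names (`message_pass`, `embed_graph`, `siamese_score`) are all assumptions chosen for illustration. It only shows the general pattern of embedding two dependency-style graphs with the same shared weights and comparing the resulting vectors.

```python
import math

# Illustrative sketch only: a toy Siamese comparison of two graphs.
# The aggregation rule and all names here are hypothetical, not taken from HOLMES.

def message_pass(features, edges, weight):
    """One round of mean aggregation over incoming edges, with shared weights."""
    n = len(features)
    incoming = {i: [] for i in range(n)}
    for src, dst in edges:
        incoming[dst].append(src)  # information flows along each directed edge
    updated = []
    for i in range(n):
        dim = len(features[i])
        agg = [0.0] * dim
        if incoming[i]:
            for d in range(dim):
                agg[d] = sum(features[j][d] for j in incoming[i]) / len(incoming[i])
        # Combine the node's own features with its neighborhood summary.
        updated.append(
            [math.tanh(weight[d] * (features[i][d] + agg[d])) for d in range(dim)]
        )
    return updated

def embed_graph(features, edges, weight, rounds=2):
    """Run a few rounds of message passing, then mean-pool nodes into one vector."""
    h = features
    for _ in range(rounds):
        h = message_pass(h, edges, weight)
    return [sum(node[d] for node in h) / len(h) for d in range(len(weight))]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def siamese_score(graph_a, graph_b, weight):
    """Embed both graphs with the SAME weights (the Siamese part) and compare."""
    emb_a = embed_graph(graph_a[0], graph_a[1], weight)
    emb_b = embed_graph(graph_b[0], graph_b[1], weight)
    return cosine_similarity(emb_a, emb_b)

# Toy graphs: (node feature vectors, directed edges). Values are made up.
graph_a = ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [(0, 1), (1, 2)])
graph_b = ([[1.0, 0.0], [1.0, 1.0]], [(0, 1)])
shared_weight = [0.5, 0.8]

score_same = siamese_score(graph_a, graph_a, shared_weight)
score_diff = siamese_score(graph_a, graph_b, shared_weight)
```

In a trained system the shared weights would be learned from labeled clone and non-clone pairs, and a threshold on the score would decide whether two fragments count as clones. Here the weights are fixed by hand purely to keep the sketch runnable.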

Findings

01. HOLMES outperforms TBCCD on benchmark datasets.

02. HOLMES generalizes well to unseen projects.

03. HOLMES detects semantic clones more effectively.

Abstract

Code clones are duplicate code fragments that share (nearly) similar syntax or semantics. Code clone detection plays an important role in software maintenance, code refactoring, and reuse. A substantial amount of research has been conducted to detect clones. Most of these approaches use lexical and syntactic information, and only a few target semantic clones. Recently, motivated by the success of deep learning models in other fields, including natural language processing and computer vision, researchers have attempted to adopt deep learning techniques to detect code clones. These approaches use lexical information (tokens) and/or syntactic structures such as abstract syntax trees (ASTs). However, they do not make sufficient use of the available structural and semantic information, which limits their capabilities.…
