Modeling Functional Similarity in Source Code with Graph-Based Siamese Networks
Nikita Mehrotra, Navdha Agarwal, Piyush Gupta, Saket Anand, David Lo, and Rahul Purandare

TL;DR
This paper introduces HOLMES, a novel deep learning approach using program dependency graphs and geometric neural networks for semantic code clone detection, outperforming existing tools in accuracy and generalizability.
Contribution
The paper presents a new method leveraging program dependency graphs and geometric neural networks for semantic clone detection, with a prototype tool that outperforms state-of-the-art approaches.
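To make the idea of a graph-based Siamese network concrete, here is a minimal sketch in plain Python. It is not the HOLMES architecture: the mean-aggregation message passing, the tanh nonlinearity, the mean pooling, the random untrained weights, and every name below (`make_weights`, `encode_graph`, `siamese_similarity`) are assumptions chosen for illustration. It only shows the general pattern the title refers to: two program graphs are encoded by one shared encoder, and the resulting embeddings are compared.

```python
import math
import random


def make_weights(dim, seed=0):
    """Hypothetical shared weight matrix (dim x dim); random, not trained."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]


def encode_graph(node_feats, edges, weights, rounds=2):
    """Embed a graph with simple mean-aggregation message passing.

    node_feats: one feature vector per node (e.g. a statement in a PDG).
    edges: (src, dst) pairs standing in for dependency edges; treated as
           undirected here purely to keep the sketch short.
    """
    n = len(node_feats)
    dim = len(weights)
    # Each node aggregates over itself and its neighbours.
    neighbors = [[i] for i in range(n)]
    for src, dst in edges:
        neighbors[src].append(dst)
        neighbors[dst].append(src)

    h = [list(f) for f in node_feats]
    for _ in range(rounds):
        updated = []
        for i in range(n):
            agg = [sum(h[j][k] for j in neighbors[i]) / len(neighbors[i])
                   for k in range(dim)]
            updated.append([
                math.tanh(sum(agg[k] * weights[k][m] for k in range(dim)))
                for m in range(dim)
            ])
        h = updated

    # Mean-pool node states into one graph-level embedding.
    return [sum(row[k] for row in h) / n for k in range(dim)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def siamese_similarity(graph_a, graph_b, weights):
    """Encode both graphs with the SAME weights, then compare embeddings."""
    emb_a = encode_graph(graph_a[0], graph_a[1], weights)
    emb_b = encode_graph(graph_b[0], graph_b[1], weights)
    return cosine_similarity(emb_a, emb_b)


# Two toy "program graphs": (node features, edges).
weights = make_weights(3)
g1 = ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [(0, 1), (1, 2)])
g2 = ([[1, 0, 0], [1, 0, 0], [0, 1, 0]], [(0, 2)])
score = siamese_similarity(g1, g2, weights)
```

In a trained system the shared weights would be learned from labeled clone and non-clone pairs so that functionally similar fragments score high; with the random weights used here the score carries no meaning beyond demonstrating the data flow.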
Findings
HOLMES outperforms TBCCD on benchmark datasets.
HOLMES generalizes well to unseen projects.
HOLMES detects semantic clones more effectively.
Abstract
Code clones are duplicate code fragments that share (nearly) similar syntax or semantics. Code clone detection plays an important role in software maintenance, code refactoring, and reuse. A substantial amount of research has been conducted in the past to detect clones. A majority of these approaches use lexical and syntactic information to detect clones. However, only a few of them target semantic clones. Recently, motivated by the success of deep learning models in other fields, including natural language processing and computer vision, researchers have attempted to adopt deep learning techniques to detect code clones. These approaches use lexical information (tokens) and/or syntactic structures like abstract syntax trees (ASTs) to detect code clones. However, they do not make sufficient use of the available structural and semantic information, hence limiting their capabilities.…
