MISIM: A Neural Code Semantics Similarity System Using the Context-Aware Semantics Structure
Fangke Ye, Shengtian Zhou, Anand Venkat, Ryan Marcus, Nesime Tatbul, Jesmin Jahan Tithi, Niranjan Hasabnis, Paul Petersen, Timothy Mattson, Tim Kraska, Pradeep Dubey, Vivek Sarkar, Justin Gottschlich

TL;DR
MISIM is a neural system that measures code semantics similarity more accurately by leveraging a novel context-aware structure and an extensible scoring algorithm, outperforming existing methods on large-scale code datasets.
Contribution
Introduction of MISIM, a neural code similarity system with a novel context-aware semantics structure and an adaptable scoring algorithm, improving accuracy over state-of-the-art methods.
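The scoring algorithm is described as adaptable to different neural network architectures. The Python sketch below illustrates that idea under assumptions that are not taken from the paper. Each program is reduced to a list of feature strings, a swappable encoder turns that list into a vector, and two programs are scored by the cosine similarity of their vectors. `make_bag_of_features_encoder`, `similarity_score`, the feature names, and the weights are all hypothetical stand-ins for MISIM's learned model.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: any callable mapping a program's feature list to a
# fixed-size vector can act as the encoder; the scorer ignores how it works.
Encoder = Callable[[Sequence[str]], List[float]]


def make_bag_of_features_encoder(weights: Dict[str, List[float]], dim: int) -> Encoder:
    """Hypothetical encoder: each feature has a weight vector, and a program's
    vector is the count-weighted sum of its features' vectors.
    Features without a weight vector are ignored."""
    def encode(features: Sequence[str]) -> List[float]:
        vec = [0.0] * dim
        for feature, count in Counter(features).items():
            for j, w in enumerate(weights.get(feature, [])):
                vec[j] += count * w
        return vec
    return encode


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_score(features_a: Sequence[str], features_b: Sequence[str], encoder: Encoder) -> float:
    """Encode both programs with the same encoder and compare the resulting vectors."""
    return cosine_similarity(encoder(features_a), encoder(features_b))


# Usage with hand-written (not learned) weights for two hypothetical features.
example_encoder = make_bag_of_features_encoder(
    {"loop": [1.0, 0.0], "accumulate": [0.0, 1.0]}, dim=2
)
```

Because `similarity_score` depends only on the `Encoder` signature, a graph-based or sequence-based encoder could replace the bag-of-features one without changing the scoring code. In a trained system, the weight vectors would be learned rather than written by hand.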
Findings
MISIM achieves 8.08% higher accuracy (measured by MAP@R) than previous systems.
MISIM effectively lifts semantics from code syntax.
The system performs well on large-scale code data, evaluated over 328K programs comprising more than 18 million lines of code.
Abstract
Code semantics similarity can be used for many tasks such as code recommendation, automated software defect correction, and clone detection. Yet the accuracy of such systems has not yet reached a level of general-purpose reliability. To help address this, we present Machine Inferred Code Similarity (MISIM), a neural code semantics similarity system consisting of two core components: (i) MISIM uses a novel context-aware semantics structure, which was purpose-built to lift semantics from code syntax; (ii) MISIM uses an extensible neural code similarity scoring algorithm, which can be used for various neural network architectures with learned parameters. We compare MISIM to four state-of-the-art systems, including two additional hand-customized models, over 328K programs consisting of over 18 million lines of code. Our experiments show that MISIM has 8.08% better accuracy (using MAP@R)…
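The accuracy figure is reported with MAP@R (mean average precision at R). The sketch below shows how that metric is computed, assuming that programs are already embedded as vectors and carry a label marking semantically equivalent programs. It also assumes that neighbors are ranked by cosine similarity. For each query, R is the number of other items sharing its label, and only the top R neighbors are examined. Each relevant hit at rank i contributes precision@i, and the sum is divided by R. Skipping queries that have no same-label neighbor is an assumption of this sketch. The function names are illustrative only.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def map_at_r(embeddings: List[Sequence[float]], labels: List[str]) -> float:
    """Mean average precision at R over queries that have a same-label item."""
    n = len(embeddings)
    ap_values: List[float] = []
    for q in range(n):
        others = [i for i in range(n) if i != q]
        # R = number of other items that share the query's label.
        r = sum(1 for i in others if labels[i] == labels[q])
        if r == 0:
            continue  # assumption: queries with no relevant item are skipped
        # Rank the other items by similarity to the query and keep the top R.
        ranked = sorted(
            others,
            key=lambda i: cosine_similarity(embeddings[q], embeddings[i]),
            reverse=True,
        )[:r]
        hits = 0
        precision_sum = 0.0
        for rank, i in enumerate(ranked, start=1):
            if labels[i] == labels[q]:
                hits += 1
                precision_sum += hits / rank  # precision@rank at a relevant rank
        ap_values.append(precision_sum / r)
    # Nothing to average if no query has a relevant item.
    return sum(ap_values) / len(ap_values) if ap_values else 0.0
```

For example, with embeddings `[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]` and labels `["A", "A", "B", "B"]`, every query's nearest neighbor shares its label, so `map_at_r` returns 1.0. Because only the top R results count, the metric penalizes a relevant program ranked just beyond R as heavily as one ranked last.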
