Integrated Reasoning Engine for Pointer-related Code Clone Detection

Hongfa Xue; Yongsheng Mei; Kailash Gogineni; Guru Venkataramani; Tian Lan

arXiv:2105.11933·cs.SE·May 26, 2021

TL;DR

Twin-Finder+ is a novel approach combining machine learning and symbolic execution to accurately detect pointer-related code clones, significantly reducing false positives and uncovering previously unreported bugs.

Contribution

It introduces a formal verification mechanism into clone detection, enhancing precision and automating manual review processes.

Findings

01. Removes 91.69% of false positives on average

02. Detects 6 unreported bugs in Links version 2.14

03. Identifies a patched bug in LibreOffice

Abstract

Detecting similar code fragments, usually referred to as code clones, is an important task. In particular, code clone detection has significant uses in vulnerability discovery, refactoring, and plagiarism detection. However, false positives are inevitable and always require manual review. In this paper, we propose Twin-Finder+, a novel closed-loop approach for pointer-related code clone detection that integrates machine learning and symbolic execution techniques to achieve precision. Twin-Finder+ introduces a formal verification mechanism to automate this manual review process. Our experimental results show that Twin-Finder+ can remove 91.69% of false positives on average. We further conduct a security analysis for memory safety using real-world applications, Links version 2.14 and LibreOffice-6.0.0.1. Twin-Finder+ is able to find 6 unreported bugs in Links version 2.14…

Taxonomy

Topics: Software Engineering Research · Advanced Malware Detection Techniques · Security and Verification in Computing