PEM: Representing Binary Program Semantics for Similarity Analysis via a Probabilistic Execution Model
Xiangzhe Xu, Zhou Xuan, Shiwei Feng, Siyuan Cheng, Yapeng Ye, Qingkai Shi, Guanhong Tao, Le Yu, Zhuo Zhang, and Xiangyu Zhang

TL;DR
This paper introduces PEM, a probabilistic execution model that captures binary program semantics more effectively for similarity analysis, outperforming existing methods by 10-20% in precision.
Contribution
PEM presents a novel probabilistic execution engine that samples program semantics in a comparable way across binaries, enhancing similarity analysis accuracy.
Findings
PEM achieves 96% precision on real-world datasets.
PEM outperforms six state-of-the-art techniques by 10-20%.
Effective sampling of input and path spaces improves semantic representation.
Abstract
Binary similarity analysis determines if two binary executables are from the same source program. Existing techniques leverage static and dynamic program features and may utilize advanced Deep Learning techniques. Although they have demonstrated great potential, the community believes that a more effective representation of program semantics can further improve similarity analysis. In this paper, we propose a new method to represent binary program semantics. It is based on a novel probabilistic execution engine that can effectively sample the input space and the program path space of subject binaries. More importantly, it ensures that the collected samples are comparable across binaries, addressing the substantial variations of input specifications. Our evaluation on 9 real-world projects with 35k functions, and comparison with 6 state-of-the-art techniques show that PEM can achieve a…
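The idea of comparing binaries through samples that are "comparable across binaries" can be illustrated with a toy sketch. This is not PEM's execution engine, which the abstract does not describe in detail. It only assumes that two functions are run on the same seeded sample of inputs and that the multisets of observed values are compared. All names here (`sample_inputs`, `observe`, `multiset_jaccard`, and the three example functions) are hypothetical and chosen for illustration.

```python
import random
from collections import Counter


def sample_inputs(n, seed=0, lo=1, hi=100):
    """Draw n positive integer pairs from a fixed seed so every function sees the same inputs."""
    rng = random.Random(seed)
    return [(rng.randint(lo, hi), rng.randint(lo, hi)) for _ in range(n)]


def observe(func, inputs):
    """Run func on each sampled input and record the multiset of observed return values."""
    return Counter(func(a, b) for a, b in inputs)


def multiset_jaccard(x, y):
    """Jaccard similarity over multisets: size of intersection divided by size of union."""
    inter = sum((x & y).values())
    union = sum((x | y).values())
    return inter / union if union else 1.0


# Two hypothetical variants of the same source logic (standing in for two
# compilations of one function), plus an unrelated function.
def max_v1(a, b):
    return a if a > b else b


def max_v2(a, b):
    # Branch-free form: a + b + |a - b| equals 2 * max(a, b) for integers.
    return (a + b + abs(a - b)) // 2


def add(a, b):
    return a + b


inputs = sample_inputs(200)
same = multiset_jaccard(observe(max_v1, inputs), observe(max_v2, inputs))
diff = multiset_jaccard(observe(max_v1, inputs), observe(add, inputs))
print(f"max_v1 vs max_v2: {same:.2f}")
print(f"max_v1 vs add:    {diff:.2f}")
```

Because both variants of `max` receive identical inputs, their observed values match exactly even though their code differs, while the unrelated `add` function scores lower. A real system would also have to sample program paths and handle differing input specifications, which this sketch leaves out.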
Taxonomy
Topics: Software System Performance and Reliability · Software Engineering Research · Software Testing and Debugging Techniques
