Trex: Learning Execution Semantics from Micro-Traces for Binary Similarity
Kexin Pei, Zhou Xuan, Junfeng Yang, Suman Jana, Baishakhi Ray

TL;DR
Trex is a transfer-learning framework that learns execution semantics from micro-traces to improve binary function similarity detection across architectures, compiler optimizations, and obfuscations, outperforming state-of-the-art systems in all three settings.
Contribution
The paper introduces Trex, a novel transfer-learning approach that explicitly learns execution semantics from micro-traces for binary function matching, with a new neural architecture and no manual labeling.
Findings
Trex outperforms state-of-the-art systems by 7.8%, 7.2%, and 14.3% in cross-architecture, cross-optimization, and cross-obfuscation function matching, respectively.
Pretraining significantly improves function matching performance.
The approach effectively handles diverse architectures, compiler optimizations, and obfuscations.
Abstract
Detecting semantically similar functions -- a crucial analysis capability with broad real-world security usages including vulnerability detection, malware lineage, and forensics -- requires understanding function behaviors and intentions. This task is challenging as semantically similar functions can be implemented differently, run on different architectures, and compiled with diverse compiler optimizations or obfuscations. Most existing approaches match functions based on syntactic features without understanding the functions' execution semantics. We present Trex, a transfer-learning-based framework, to automate learning execution semantics explicitly from functions' micro-traces and transfer the learned knowledge to match semantically similar functions. Our key insight is that these traces can be used to teach an ML model the execution semantics of different sequences of…
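As an illustration of the matching step the abstract describes, the sketch below ranks candidate functions against a query function by comparing fixed-length embedding vectors. It assumes that each function has already been mapped to a vector by some learned model and that similarity is scored with cosine similarity; the summary above does not specify either detail. The function names, the embedding values, and the helpers `cosine_similarity` and `rank_candidates` are hypothetical and not taken from the Trex code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat an all-zero vector as dissimilar to everything.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query, candidates):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings (made-up values, for illustration only).
query = [1.0, 0.0, 1.0]
candidates = {
    "memcpy_arm_O3": [0.9, 0.1, 1.1],
    "strlen_x86_O0": [0.0, 1.0, 0.0],
}

ranking = rank_candidates(query, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setup the candidate whose vector points in nearly the same direction as the query ranks first, regardless of the architecture or optimization level encoded in its (hypothetical) name. A real system would obtain the vectors from a trained model rather than hard-coding them.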
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Engineering Research · Digital and Cyber Forensics
