Asteria: Deep Learning-based AST-Encoding for Cross-platform Binary Code Similarity Detection
Shouguo Yang, Long Cheng, Yicheng Zeng, Zhe Lang, Hongsong Zhu, Zhiqiang Shi

TL;DR
Asteria introduces a deep learning approach using Tree-LSTM to encode ASTs for cross-platform binary code similarity detection, significantly improving accuracy and speed for security applications like vulnerability search.
Contribution
The paper presents a novel deep learning-based AST encoding method, Asteria, which effectively captures semantic equivalence across different architectures for binary similarity detection.
Findings
Outperforms existing tools Diaphora and Gemini in accuracy.
Computes similarity several orders of magnitude faster.
Successfully identified 75 vulnerabilities in IoT firmware images.
Abstract
Binary code similarity detection is a fundamental technique for many security applications such as vulnerability search, patch analysis, and malware detection. As critical vulnerabilities in IoT devices increase, there is a growing need to detect similar code across architectures for vulnerability search. The variety of IoT hardware architectures and software platforms requires capturing the semantic equivalence of code fragments during similarity detection. However, existing approaches are insufficient at capturing semantic similarity. We notice that the abstract syntax tree (AST) of a function contains rich semantic information. Inspired by successful applications of natural language processing technologies in sentence semantic understanding, we propose a deep learning-based AST-encoding method, named ASTERIA, to measure the semantic equivalence of functions in different…
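To make the idea of encoding an AST with a Tree-LSTM concrete, here is a minimal sketch, not the authors' implementation. It assumes a Child-Sum Tree-LSTM variant, untrained random weights, omitted bias terms, a toy node vocabulary (`KINDS`), and cosine similarity as a stand-in comparison. None of these details come from the summary above, and the names `Node`, `ChildSumTreeLSTM`, and `cosine` are hypothetical.

```python
import math
import random
from dataclasses import dataclass, field

# Hypothetical toy vocabulary of AST node types (an assumption for illustration).
KINDS = ["block", "if", "asg", "call", "var", "num", "return", "add"]


@dataclass
class Node:
    kind: str                                   # AST node type, e.g. "asg"
    children: list = field(default_factory=list)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vadd(*vs):
    return [sum(xs) for xs in zip(*vs)]


class ChildSumTreeLSTM:
    """Child-Sum Tree-LSTM cell with random, untrained weights (biases omitted)."""

    def __init__(self, dim=8, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.dim = dim
        self.embed = {k: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for k in KINDS}
        # One (W, U) pair per gate: input i, forget f, output o, candidate u.
        self.W = {g: rand(dim, dim) for g in "ifou"}
        self.U = {g: rand(dim, dim) for g in "ifou"}

    def encode(self, node):
        """Return (h, c) for the subtree rooted at `node`, computed bottom-up."""
        x = self.embed[node.kind]
        states = [self.encode(child) for child in node.children]
        h_sum = vadd(*[h for h, _ in states]) if states else [0.0] * self.dim

        def gate(g, h, act):
            pre = vadd(matvec(self.W[g], x), matvec(self.U[g], h))
            return [act(p) for p in pre]

        i = gate("i", h_sum, sigmoid)
        o = gate("o", h_sum, sigmoid)
        u = gate("u", h_sum, math.tanh)
        c = [a * b for a, b in zip(i, u)]
        # Each child gets its own forget gate, computed from that child's hidden state.
        for h_k, c_k in states:
            f_k = gate("f", h_k, sigmoid)
            c = [ci + f * ck for ci, f, ck in zip(c, f_k, c_k)]
        h = [a * math.tanh(b) for a, b in zip(o, c)]
        return h, c


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def make_assign_return():
    # Toy AST for something like: v = 1; return v
    return Node("block", [
        Node("asg", [Node("var"), Node("num")]),
        Node("return", [Node("var")]),
    ])


model = ChildSumTreeLSTM(dim=8, seed=0)
tree_a = make_assign_return()
tree_b = make_assign_return()                   # structurally identical to tree_a
tree_c = Node("block", [Node("if", [Node("var"), Node("call", [Node("num")])])])

vec_a, _ = model.encode(tree_a)
vec_b, _ = model.encode(tree_b)
vec_c, _ = model.encode(tree_c)
sim_same = cosine(vec_a, vec_b)
sim_diff = cosine(vec_a, vec_c)
print(f"identical trees: {sim_same:.3f}, different trees: {sim_diff:.3f}")
```

Because the weights are random, only the identical-tree case is meaningful here: structurally identical ASTs map to the same vector. A trained model would be needed before scores for different trees say anything about semantic similarity.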
