Understand Code Style: Efficient CNN-based Compiler Optimization Recognition System
Shouguo Yang, Zhiqiang Shi, Guodong Zhang, Mingxuan Li, Yuan Ma, Limin Sun

TL;DR
This paper introduces BinEye, a CNN-based system that recognizes the compiler optimization level of binary files. It reports over 97% accuracy and a model that is easy to interpret, in support of binary analysis and security research.
Contribution
The paper presents BinEye, a novel CNN-based model that automatically recognizes the compiler optimization level of binary files, running faster than RNN-based models while remaining interpretable.
Findings
Achieves over 97% accuracy in recognition.
At least 8 times faster than RNN-based models.
Provides insights into code differences caused by optimizations.
Abstract
Compiler optimization level recognition can be applied to vulnerability discovery and binary analysis. Because many different compilation optimization options exist, the resulting differences in binary file contents are very complex. There are thousands of compiler optimization algorithms and multiple processor architectures, so it is very difficult to analyze binary files manually and recognize their compiler optimization level with rules. This paper is the first to propose a CNN-based compiler optimization level recognition model: BinEye. The system extracts semantic and structural differences and automatically recognizes the compiler optimization level. The model is designed to suit binary file processing and to be easy to understand. We built a dataset containing 80,028 binary files for model training and testing. Our proposed model achieves an accuracy of…
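To make the idea of a CNN over binary content concrete, the following is a minimal sketch of the general technique (1D convolution over a byte sequence, global max pooling, then a softmax over optimization levels). It is not BinEye's architecture, which the abstract does not specify. The label set `LEVELS`, the helper names, the kernel count and width, and the random (untrained) weights are all assumptions made for illustration.

```python
import math
import random

# Hypothetical label set; the abstract does not list the levels BinEye recognizes.
LEVELS = ["O0", "O1", "O2", "O3"]


def conv1d_relu(seq, kernels, biases):
    """Slide each kernel over seq and return one ReLU feature map per kernel."""
    width = len(kernels[0])
    if len(seq) < width:
        raise ValueError("input is shorter than the kernel width")
    maps = []
    for weights, bias in zip(kernels, biases):
        fmap = []
        for i in range(len(seq) - width + 1):
            s = bias + sum(weights[j] * seq[i + j] for j in range(width))
            fmap.append(max(0.0, s))  # ReLU
        maps.append(fmap)
    return maps


def global_max_pool(maps):
    """Keep the strongest activation of each feature map."""
    return [max(fmap) for fmap in maps]


def softmax(logits):
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_params(n_kernels=8, width=5, seed=0):
    """Untrained placeholder weights; a real model would learn these from data."""
    rng = random.Random(seed)
    return {
        "kernels": [[rng.uniform(-1, 1) for _ in range(width)] for _ in range(n_kernels)],
        "conv_bias": [0.0] * n_kernels,
        "dense": [[rng.uniform(-1, 1) for _ in range(n_kernels)] for _ in LEVELS],
        "dense_bias": [0.0] * len(LEVELS),
    }


def predict(raw, params):
    """Map raw bytes to (predicted level, class probabilities)."""
    seq = [b / 255.0 for b in raw]  # scale bytes to [0, 1]
    feats = global_max_pool(conv1d_relu(seq, params["kernels"], params["conv_bias"]))
    logits = [
        sum(w * f for w, f in zip(row, feats)) + bias
        for row, bias in zip(params["dense"], params["dense_bias"])
    ]
    probs = softmax(logits)
    return LEVELS[probs.index(max(probs))], probs
```

Calling `predict(code_bytes, random_params())` on a `bytes` object returns a level name and a probability per level. With untrained weights the prediction carries no meaning; the sketch only shows how data would flow through such a classifier.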
Taxonomy
Topics: Advanced Malware Detection Techniques · Security and Verification in Computing · Software Engineering Research
