ROMEO: Exploring Juliet through the Lens of Assembly Language
Clemens-Alexander Brust, Tim Sonnekalb, Bernd Gruner

TL;DR
This paper introduces ROMEO, a new benchmark dataset for vulnerability detection on assembly language. It shows that assembly analysis with context improves detection accuracy and performs comparably to source code-based methods.
Contribution
The paper presents ROMEO, a publicly available benchmark dataset, together with a simple assembly-language representation with context, advancing vulnerability detection directly on binary code.
Findings
Assembly-language analysis with context improves vulnerability detection.
Vulnerability detection on ROMEO's assembly representation performs comparably to source code-based methods.
Compiling the benchmark does not leak label information.
Abstract
Automatic vulnerability detection on C/C++ source code has benefitted from the introduction of machine learning to the field, with many recent publications targeting this combination. In contrast, assembly language or machine code artifacts receive less attention, although there are compelling reasons to study them. They are more representative of what is executed, more easily incorporated in dynamic analysis, and in the case of closed-source code, there is no alternative. We evaluate the representative capability of assembly language compared to C/C++ source code for vulnerability detection. Furthermore, we investigate the role of call graph context in detecting function-spanning vulnerabilities. Finally, we verify whether compiling a benchmark dataset compromises an experiment's soundness by inadvertently leaking label information. We propose ROMEO, a publicly available,…
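The abstract raises two technical points: call-graph context for vulnerabilities that span several functions, and the risk that compiling a benchmark leaks label information. The Python sketch below illustrates both ideas under assumptions of our own, and it is not the ROMEO construction procedure, which this summary does not describe. It assumes each function is already disassembled into a list of instruction strings. It treats "context" as the local callees reachable within a fixed number of call edges. It also guards against one plausible leakage channel: Juliet names its test functions with `bad`/`good` markers, so local function names are replaced with placeholders. The function `with_call_graph_context` and its parameters are hypothetical.

```python
import re
from collections import deque


def with_call_graph_context(target, asm, call_graph, depth=1, anonymize=True):
    """Hypothetical sketch: return the assembly of `target` followed by the
    assembly of every local callee reachable within `depth` call edges.

    asm: dict mapping function name to a list of instruction strings
         (local functions only; library imports are absent)
    call_graph: dict mapping function name to a list of callee names
    """
    # Breadth-first walk over the call graph, visiting each function once.
    order = [target]
    seen = {target}
    queue = deque([(target, 0)])
    while queue:
        fn, d = queue.popleft()
        if d == depth:
            continue
        for callee in call_graph.get(fn, []):
            # Skip repeats and callees without disassembly (e.g. library imports).
            if callee not in seen and callee in asm:
                seen.add(callee)
                order.append(callee)
                queue.append((callee, d + 1))

    rename, pattern = {}, None
    if anonymize:
        # Give every local function a positional placeholder (sample functions
        # first), so Juliet-style names such as "..._bad" do not appear in the
        # sample. Library calls such as memcpy keep their names.
        all_fns = order + [fn for fn in asm if fn not in seen]
        rename = {fn: f"func_{i}" for i, fn in enumerate(all_fns)}
        names = sorted(rename, key=len, reverse=True)
        pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")

    def sub(text):
        # A single regex pass, so a placeholder is never renamed a second time.
        if pattern is None:
            return text
        return pattern.sub(lambda m: rename[m.group(0)], text)

    out = []
    for fn in order:
        out.append(f"<{sub(fn)}>:")
        out.extend(sub(ins) for ins in asm[fn])
    return "\n".join(out)
```

With `depth=0`, the sample contains only the target function. Calls into other local functions still appear, but only as placeholders. Increasing `depth` appends the bodies of those callees. This is one simple way to expose a flaw whose source and sink sit in different functions.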
Taxonomy
Topics: Advanced Malware Detection Techniques · Security and Verification in Computing · Software Engineering Research
