Pluvio: Assembly Clone Search for Out-of-domain Architectures and Libraries through Transfer Learning and Conditional Variational Information Bottleneck
Zhiwei Fu, Steven H. H. Ding, Furkan Alaca, Benjamin C. M. Fung, Philippe Charland

TL;DR
Pluvio introduces a transfer learning-based assembly clone search method that generalizes to unseen architectures and libraries. It uses large-scale pre-trained models, reinforcement learning to trim input sequences, and a conditional Variational Information Bottleneck to improve generalization.
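For readers unfamiliar with the information bottleneck idea, here is a minimal sketch of the generic, unconditional VIB objective. It assumes a diagonal-Gaussian encoder and a standard-normal prior, and it trades off a task loss against a KL "compression" term weighted by `beta`. This is not Pluvio's conditional variant, whose conditioning variables and exact loss are not described on this page. The function names, the `beta` default, and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def gaussian_kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dims, one value per example."""
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - log_var - 1.0, axis=-1)

def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def vib_loss(task_nll, mu, log_var, beta=1e-3):
    """Generic VIB objective averaged over the batch: task loss + beta * KL compression term.

    task_nll: per-example negative log-likelihood of some downstream objective
    (hypothetical here; in practice it would be computed from z).
    """
    kl = gaussian_kl_to_standard_normal(mu, log_var)
    return float(np.mean(task_nll + beta * kl))

rng = np.random.default_rng(0)
mu = np.ones((2, 3))          # hypothetical encoder means: batch of 2, latent dim 3
log_var = np.zeros((2, 3))    # unit variance
z = sample_latent(mu, log_var, rng)
loss = vib_loss(np.array([2.0, 2.0]), mu, log_var, beta=0.5)
```

A larger `beta` pushes the encoder toward the prior and discards more input detail. This is the mechanism by which a bottleneck can suppress spurious, source-specific cues, such as architecture-specific indicators, that do not help the downstream task.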
Contribution
This work is the first to address assembly clone search for unseen architectures using transfer learning, reinforcement learning, and a novel Variational Information Bottleneck strategy.
Findings
Outperforms state-of-the-art methods in unseen architecture scenarios
Effective in reducing reliance on architecture-specific indicators
Demonstrates strong generalization capabilities across diverse architectures
Abstract
The practice of code reuse is crucial in software development for a faster and more efficient development lifecycle. In reality, however, code reuse practices lack proper control, resulting in issues such as vulnerability propagation and intellectual property infringements. Assembly clone search, a critical shift-right defence mechanism, has been effective in identifying vulnerable code resulting from reuse in released executables. Recent studies on assembly clone search demonstrate a trend towards using machine learning-based methods to match assembly code variants produced by different toolchains. However, these methods are limited to what they learn from a small number of toolchain variants used in training, rendering them inapplicable to unseen architectures and their corresponding compilation toolchain variants. This paper presents the first study on the problem of assembly clone…
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Open Source Software Innovations
