Bin2Vec: Interpretable and Auditable Multi-View Binary Analysis for Code Plagiarism Detection
Moussa Moussaoui, Tarik Houichime, Abdelalim Sadiq

TL;DR
Bin2Vec is a multi-view binary analysis framework that combines structural and behavioral features of software to improve interpretability and accuracy in code similarity detection, aiding cybersecurity and reverse-engineering.
Contribution
It introduces a modular, multi-view approach that integrates static and dynamic features into an explainable similarity measure for binary programs.
Findings
Distinguishes between different versions of two Windows programs, PuTTY and 7-Zip
Provides interpretable visualizations of program similarities
Enhances reliability and explainability of binary analysis
Abstract
We introduce Bin2Vec, a new framework that helps compare software programs in a clear and explainable way. Instead of focusing only on one type of information, Bin2Vec combines what a program looks like (its built-in functions, imports, and exports) with how it behaves when it runs (its instructions and memory usage). This gives a more complete picture when deciding whether two programs are similar or not. Bin2Vec represents these different types of information as views that can be inspected separately using easy-to-read charts, and then brings them together into an overall similarity score. Bin2Vec acts as a bridge between binary representations and machine learning techniques by generating feature representations that can be efficiently processed by machine-learning models. We tested Bin2Vec on multiple versions of two well-known Windows programs, PuTTY and 7-Zip. The primary results…
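The abstract describes scoring each view separately and then merging the per-view scores into one overall similarity. The following is a minimal sketch of that idea, not the authors' implementation: the cosine measure over feature counts, the weighted average, the function names (`cosine`, `view_similarities`, `overall_similarity`), and the sample feature lists are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-features count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty view carries no evidence of similarity
    return dot / (norm_a * norm_b)


def view_similarities(prog_a: dict, prog_b: dict) -> dict:
    """Score each view present in both programs, so it can be inspected alone."""
    shared_views = sorted(prog_a.keys() & prog_b.keys())
    return {v: cosine(Counter(prog_a[v]), Counter(prog_b[v])) for v in shared_views}


def overall_similarity(per_view: dict, weights: dict = None) -> float:
    """Combine per-view scores with a weighted average (equal weights by default)."""
    if not per_view:
        return 0.0
    if weights is None:
        weights = {v: 1.0 for v in per_view}
    total = sum(weights[v] for v in per_view)
    if total == 0:
        raise ValueError("view weights must not sum to zero")
    return sum(weights[v] * score for v, score in per_view.items()) / total


# Hypothetical feature lists for two builds; not taken from the paper's data.
build_a = {
    "imports": ["CreateFileW", "ReadFile", "WriteFile"],
    "instructions": ["mov", "mov", "push", "call"],
}
build_b = {
    "imports": ["CreateFileW", "ReadFile", "CloseHandle"],
    "instructions": ["mov", "push", "push", "call"],
}

per_view = view_similarities(build_a, build_b)
for view, score in per_view.items():
    print(f"{view}: {score:.3f}")
print(f"overall: {overall_similarity(per_view):.3f}")
```

Keeping the per-view scores in a dictionary before averaging is what makes the result auditable in this sketch: a reader can see which view drove the overall score, or pass `weights` to emphasize one view over another.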
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Academic integrity and plagiarism
