OCEAN: Open-World Contrastive Authorship Identification
Felix Mächtle, Jan-Niclas Serr, Nils Loose, Jonas Sander, Thomas Eisenbarth

TL;DR
OCEAN is a novel contrastive learning framework that accurately attributes code authorship in binary files within open-world scenarios, enhancing cybersecurity by detecting malicious code injections and improving attribution robustness.
Contribution
It introduces the first open-world, binary-level authorship attribution framework based on contrastive learning, together with new datasets for realistic evaluation, and demonstrates superior performance over existing methods.
Findings
Achieved an AUROC score of 0.86 on SNOOPY, a fully unseen dataset, after training on CONAN.
Improved authorship attribution accuracy by 7% over previous datasets.
Outperformed state-of-the-art methods by 10% in source code analysis.
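To make the headline metric concrete, here is a minimal sketch of how AUROC can be computed for same-author verification. AUROC is the probability that a randomly chosen same-author pair receives a higher similarity score than a randomly chosen different-author pair. The function name `auroc` and the toy scores are illustrative assumptions, not values or code from the paper.

```python
def auroc(scores, labels):
    """Rank-based AUROC: fraction of (positive, negative) pairs ordered correctly.

    scores: similarity score per code pair (higher = more likely same author)
    labels: 1 for same-author pairs, 0 for different-author pairs
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # correctly ranked
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three same-author and three different-author pairs.
example_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
example_labels = [1, 1, 1, 0, 0, 0]
example_auroc = auroc(example_scores, example_labels)
print(round(example_auroc, 3))
```

A value of 0.5 corresponds to random guessing and 1.0 to perfect separation, so no decision threshold needs to be fixed to report the metric.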
Abstract
In an era where cyberattacks increasingly target the software supply chain, the ability to accurately attribute code authorship in binary files is critical to improving cybersecurity measures. We propose OCEAN, a contrastive learning-based system for function-level authorship attribution. OCEAN is the first framework to explore code authorship attribution on compiled binaries in an open-world and extreme scenario, where two code samples from unknown authors are compared to determine if they are developed by the same author. To evaluate OCEAN, we introduce new realistic datasets: CONAN, to improve the performance of authorship attribution systems in real-world use cases, and SNOOPY, to increase the robustness of the evaluation of such systems. We use CONAN to train our model and evaluate on SNOOPY, a fully unseen dataset, resulting in an AUROC score of 0.86 even when using high compiler…
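The open-world setting described in the abstract, where two samples from unknown authors are compared, can be sketched as follows. The abstract does not state which loss or similarity measure OCEAN uses, so this example assumes a classic pairwise margin-based contrastive loss over cosine distance; the embedding vectors, the `threshold`, and the `margin` values are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_u * norm_v)


def same_author(u, v, threshold=0.5):
    """Open-world decision: no author list, only a pairwise comparison.

    The threshold is an assumed value for illustration.
    """
    return cosine_similarity(u, v) >= threshold


def pairwise_contrastive_loss(u, v, same, margin=0.5):
    """Assumed training objective (generic, not confirmed by the source).

    Same-author pairs are pulled together; different-author pairs are
    pushed apart until their cosine distance exceeds the margin.
    """
    distance = 1.0 - cosine_similarity(u, v)
    if same:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


# Toy function embeddings (hypothetical); a real system would obtain
# these from a trained encoder applied to compiled functions.
func_a = [0.9, 0.1, 0.0]
func_b = [0.8, 0.2, 0.1]
func_c = [0.0, 0.1, 0.9]

print(same_author(func_a, func_b))
print(same_author(func_a, func_c))
```

Because the decision depends only on the two embeddings being compared, neither author has to appear in the training data, which is what distinguishes this setup from closed-world classification over a fixed author set.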
Taxonomy
Topics: Authorship Attribution and Profiling · Names, Identity, and Discrimination Research
