Misleading Authorship Attribution of Source Code using Adversarial Learning
Erwin Quiring, Alwin Maier, Konrad Rieck

TL;DR
This paper introduces an adversarial attack on source code authorship attribution methods that significantly reduces their accuracy by using semantics-preserving code transformations guided by Monte-Carlo tree search.
Contribution
We propose a novel adversarial attack that uses Monte-Carlo tree search to find semantics-preserving code transformations that deceive machine-learning-based authorship attribution of source code, exposing vulnerabilities in current methods.
Findings
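The attack reduces the accuracy of two recent attribution methods from over 88% to 1%.
The attack can imitate the coding style of specific developers with high accuracy, thereby inducing false attributions.
Current attribution methods are vulnerable to such attacks and unsuitable for practical use.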
Abstract
In this paper, we present a novel attack against authorship attribution of source code. We exploit the fact that recent attribution methods rest on machine learning and can thus be deceived by adversarial examples of source code. Our attack performs a series of semantics-preserving code transformations that mislead learning-based attribution but appear plausible to a developer. The attack is guided by Monte-Carlo tree search, which enables us to operate in the discrete domain of source code. In an empirical evaluation with source code from 204 programmers, we demonstrate that our attack has a substantial effect on two recent attribution methods, whose accuracy drops from over 88% to 1% under attack. Furthermore, we show that our attack can imitate the coding style of developers with high accuracy and thereby induce false attributions. We conclude that current approaches for authorship attribution…
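The abstract describes the search only at a high level, so the following is a toy sketch under stated assumptions, not the authors' implementation. A hypothetical `true_author_score` function stands in for a learned attribution model, four hypothetical string rewrites of a small C++ snippet stand in for semantics-preserving transformations, and a plain UCT-style Monte-Carlo tree search (selection, expansion, random rollout, backpropagation) looks for a short transformation sequence that lowers the true author's score, i.e. an untargeted attack. The budget `max_depth` and all other names and parameters are illustrative choices.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Toy C++ snippet written in the (hypothetical) style of author A.
SNIPPET = """int cnt = 0;
for (int i = 0; i < n; i++) {
    cnt += a[i];
}
std::cout << cnt << std::endl;"""

# Hypothetical semantics-preserving rewrites (naive string replacements).
TRANSFORMS: Dict[str, Callable[[str], str]] = {
    "postinc_to_preinc": lambda s: s.replace("i++", "++i"),
    "rename_counter": lambda s: s.replace("cnt", "total"),
    "endl_to_newline_flush": lambda s: s.replace("std::endl", "'\\n' << std::flush"),
    "loop_index_auto": lambda s: s.replace("int i = 0", "auto i = 0"),  # no effect on the toy score
}

# Stylistic markers the toy "classifier" associates with author A.
AUTHOR_MARKERS = ["i++", "cnt", "std::endl"]


def true_author_score(src: str) -> float:
    """Stand-in for a learned model: fraction of author-A markers still present."""
    return sum(m in src for m in AUTHOR_MARKERS) / len(AUTHOR_MARKERS)


@dataclass
class Node:
    source: str
    remaining: List[str]  # transformations not yet applied on this path
    untried: List[str]  # remaining transformations without a child node yet
    parent: Optional["Node"] = None
    children: Dict[str, "Node"] = field(default_factory=dict)
    visits: int = 0
    value: float = 0.0

    def uct_child(self, c: float) -> "Node":
        # UCT: mean reward plus an exploration bonus for rarely visited children.
        return max(
            self.children.values(),
            key=lambda ch: ch.value / ch.visits
            + c * math.sqrt(math.log(self.visits) / ch.visits),
        )


def mcts_attack(source: str, score: Callable[[str], float], iterations: int = 200,
                max_depth: int = 3, c: float = 1.4, seed: int = 0) -> Tuple[str, float]:
    """Search for a transformation sequence that minimizes the true author's score."""
    rng = random.Random(seed)
    names = list(TRANSFORMS)
    root = Node(source, remaining=names, untried=names.copy())
    best_src, best_reward = source, 1.0 - score(source)
    for _ in range(iterations):
        node, depth = root, 0
        # 1. Selection: descend through fully expanded nodes.
        while not node.untried and node.children:
            node = node.uct_child(c)
            depth += 1
        # 2. Expansion: apply one untried transformation.
        if node.untried and depth < max_depth:
            name = node.untried.pop(rng.randrange(len(node.untried)))
            rest = [n for n in node.remaining if n != name]
            child = Node(TRANSFORMS[name](node.source), remaining=rest,
                         untried=rest.copy(), parent=node)
            node.children[name] = child
            node, depth = child, depth + 1
        # 3. Simulation: random rollout up to the transformation budget.
        pool = node.remaining.copy()
        rng.shuffle(pool)
        sim = node.source
        for name in pool[: max(0, max_depth - depth)]:
            sim = TRANSFORMS[name](sim)
        reward = 1.0 - score(sim)  # untargeted: lower true-author score is better
        if reward > best_reward:
            best_src, best_reward = sim, reward
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return best_src, best_reward


if __name__ == "__main__":
    adversarial, reward = mcts_attack(SNIPPET, true_author_score)
    print(f"reward = {reward:.2f}")
    print(adversarial)
```

Replacing the reward with the score of a chosen target author would turn this sketch into the imitation setting the abstract mentions. The naive string replacements are only for illustration; a practical tool would likely need to operate on parsed code rather than raw text to guarantee that each transformation actually preserves semantics.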
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Software Engineering Research
Methods: Monte-Carlo Tree Search
