SHIELD: Thwarting Code Authorship Attribution
Mohammed Abuhamad, Changhun Jung, David Mohaisen, DaeHun Nyang

TL;DR
This paper introduces SHIELD, a framework for testing the robustness of code authorship attribution methods against adversarial attacks, revealing significant vulnerabilities in current techniques.
Contribution
The paper defines four attack strategies, both targeted and non-targeted, and demonstrates their effectiveness against six state-of-the-art authorship attribution methods on code from 200 Google Code Jam programmers.
Findings
Non-targeted attacks achieve a success rate above 98.5%.
Targeted attacks impersonate a chosen author with a success rate of 66% to 88%.
Current attribution methods are highly vulnerable to adversarial perturbations.
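As a rough illustration of how the two success rates above are usually computed, the sketch below counts a non-targeted attack as successful when the perturbed sample is no longer attributed to its true author, and a targeted attack as successful when it is attributed to the chosen target author. The function names and the toy labels are hypothetical and not taken from the paper; the paper's exact evaluation protocol may differ.

```python
from typing import Sequence


def non_targeted_success_rate(
    true_authors: Sequence[str], predicted_after_attack: Sequence[str]
) -> float:
    """Fraction of perturbed samples NOT attributed to their true author."""
    if not true_authors or len(true_authors) != len(predicted_after_attack):
        raise ValueError("inputs must be non-empty and of equal length")
    misattributed = sum(t != p for t, p in zip(true_authors, predicted_after_attack))
    return misattributed / len(true_authors)


def targeted_success_rate(
    target_authors: Sequence[str], predicted_after_attack: Sequence[str]
) -> float:
    """Fraction of perturbed samples attributed to the intended target author."""
    if not target_authors or len(target_authors) != len(predicted_after_attack):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(target_authors, predicted_after_attack))
    return hits / len(target_authors)


# Toy labels (hypothetical): what an attribution model predicts after perturbation.
true_authors = ["A", "B", "C", "D"]
target_authors = ["B", "A", "A", "A"]
predicted = ["B", "B", "A", "C"]

print(non_targeted_success_rate(true_authors, predicted))  # 3 of 4 misattributed
print(targeted_success_rate(target_authors, predicted))    # 2 of 4 hit the target
```

The targeted criterion is stricter than the non-targeted one, since the prediction must land on one specific author rather than on anyone other than the true author, which is consistent with the lower targeted success rates reported above.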
Abstract
Authorship attribution has become increasingly accurate, posing a serious privacy risk for programmers who wish to remain anonymous. In this paper, we introduce SHIELD to examine the robustness of different code authorship attribution approaches against adversarial code examples. We define four attacks on attribution techniques, which include targeted and non-targeted attacks, and realize them using adversarial code perturbation. We experiment with a dataset of 200 programmers from the Google Code Jam competition to validate our methods targeting six state-of-the-art authorship attribution methods that adopt a variety of techniques for extracting authorship traits from source-code, including RNN, CNN, and code stylometry. Our experiments demonstrate the vulnerability of current authorship attribution methods against adversarial attacks. For the non-targeted attack, our experiments…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Authorship Attribution and Profiling
