Robust and Accurate Authorship Attribution via Program Normalization

Yizhen Wang; Mohannad Alhanahnah; Ke Wang; Mihai Christodorescu; Somesh Jha

arXiv:2007.00772·cs.LG·March 1, 2022

TL;DR

This paper introduces a normalization-based framework that significantly enhances the robustness and accuracy of source code authorship attribution against adversarial attacks, addressing security vulnerabilities in deep learning approaches.

Contribution

The paper proposes the normalize-and-predict framework, providing theoretical robustness guarantees and demonstrating substantial improvements over existing methods in defending against adversarial attacks.

Findings

01. Improves accuracy on adversarial inputs by up to 70%.

02. Increases robust accuracy by 45% over adversarial training.

03. Runs over 40 times faster than existing robust training methods.

Abstract

Source code attribution approaches have achieved remarkable accuracy thanks to rapid advances in deep learning. However, recent studies have shed light on their vulnerability to adversarial attacks. In particular, they can be easily deceived by adversaries who attempt either to create a forgery of another author or to mask the original author. To address these emerging issues, we formulate this security challenge as a general threat model, the relational adversary, which allows an arbitrary number of semantics-preserving transformations to be applied to an input in any problem space. Our theoretical investigation shows in depth the conditions for robustness and the trade-off between robustness and accuracy. Motivated by these insights, we present a novel learning framework, normalize-and-predict (N&P), that in theory guarantees the robustness of any…
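
The normalize-and-predict idea from the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the choice of normalizer (renaming identifiers in order of first appearance using Python's standard `ast` module), the names `normalize` and `normalize_and_predict`, and the placeholder classifier are all assumptions made here for illustration. The sketch only shows the general principle that if every program is mapped to a canonical form before classification, then any two programs related by the covered semantics-preserving transformations receive the same prediction.

```python
import ast
import builtins


class _CanonicalNames(ast.NodeTransformer):
    """Hypothetical normalizer: rename identifiers to v0, v1, ...
    in order of first appearance. Built-in names are left untouched."""

    def __init__(self):
        self.mapping = {}

    def _canon(self, name):
        if hasattr(builtins, name):  # keep print, len, range, ...
            return name
        return self.mapping.setdefault(name, f"v{len(self.mapping)}")

    def visit_FunctionDef(self, node):
        node.name = self._canon(node.name)
        self.generic_visit(node)  # then rename arguments and body
        return node

    def visit_arg(self, node):
        node.arg = self._canon(node.arg)
        self.generic_visit(node)
        return node

    def visit_Name(self, node):
        node.id = self._canon(node.id)
        return node


def normalize(source: str) -> str:
    """Map a program to a canonical form. Parsing and unparsing also
    discards comments and layout, so those cannot affect the result.
    Toy scope only: attributes, imports, and keyword arguments are
    not handled."""
    tree = _CanonicalNames().visit(ast.parse(source))
    return ast.unparse(tree)  # requires Python 3.9+


def normalize_and_predict(classifier, source: str):
    """Apply any classifier to the normalized program only."""
    return classifier(normalize(source))


# Two programs that differ only by renaming, layout, and a comment.
original = "def add(x, y):\n    total = x + y\n    return total\n"
disguised = "def plus(a, b):\n  s = a+b  # sum\n  return s\n"

# Placeholder classifier (an assumption): any function of the text works.
toy_classifier = lambda text: len(text) % 3

assert normalize(original) == normalize(disguised)
assert (normalize_and_predict(toy_classifier, original)
        == normalize_and_predict(toy_classifier, disguised))
```

Because the classifier only ever sees the normalized text, its output cannot change under the transformations that the normalizer undoes, regardless of how the classifier itself was trained. Transformations the normalizer does not cover (for example, restructuring control flow in this toy version) remain outside that guarantee.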

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Misinformation and Its Impacts · Topic Modeling