RoPGen: Towards Robust Code Authorship Attribution via Automatic Coding Style Transformation

Zhen Li; Guenevere (Qian) Chen; Chen Chen; Yayi Zou; Shouhuai Xu

arXiv:2202.06043·cs.CR·February 15, 2022

TL;DR

This paper introduces RoPGen, a novel framework that enhances the robustness of deep learning-based code authorship attribution by learning unique coding style patterns resistant to manipulation and adversarial attacks.

Contribution

The paper proposes RoPGen, a new method combining data and gradient augmentation during adversarial training to improve robustness against style manipulation attacks.
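As a rough illustration of what a coding style transformation used for data augmentation might look like, the sketch below rewrites camelCase identifiers in a C-like snippet to snake_case. The names `camel_to_snake`, `transform_style`, and `CAMEL_IDENTIFIER` are hypothetical and do not come from RoPGen. The transformations the framework actually applies are not described on this page.

```python
import re

# Hypothetical helpers for illustration; none of these names come from RoPGen.

# Matches identifiers that start lowercase and contain at least one
# capitalized hump, e.g. totalCount or maxItems.
CAMEL_IDENTIFIER = re.compile(r"\b[a-z]+(?:[A-Z][a-z0-9]*)+\b")


def camel_to_snake(name: str) -> str:
    """Put an underscore before each capital that follows a lowercase letter
    or digit and lowercase that capital (totalCount -> total_count).
    Acronym runs such as HTTP are not handled."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", lambda m: "_" + m.group(1).lower(), name)


def transform_style(source: str) -> str:
    """Rewrite every camelCase identifier in `source` to snake_case.
    Naive: it works on raw text, so string literals and comments are
    rewritten too. A real tool would operate on a parse tree."""
    return CAMEL_IDENTIFIER.sub(lambda m: camel_to_snake(m.group(0)), source)


original = "int totalCount = 0; for (int i = 0; i < maxItems; i++) totalCount += i;"
augmented = transform_style(original)
print(augmented)
```

The transformed snippet behaves the same as the original but shows a different naming style, so it could be added to the training set under the same author label.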

Findings

01 RoPGen reduces the success rate of untargeted attacks by 41.0% and of targeted attacks by 22.8% on average.

02 It improves robustness across C, C++, and Java datasets.

03 The approach effectively learns diversified coding style representations.

Abstract

Source code authorship attribution is an important problem often encountered in applications such as software forensics, bug fixing, and software quality analysis. Recent studies show that current source code authorship attribution methods can be compromised by attackers exploiting adversarial examples and coding style manipulation. This calls for robust solutions to the problem of code authorship attribution. In this paper, we initiate the study on making Deep Learning (DL)-based code authorship attribution robust. We propose an innovative framework called Robust coding style Patterns Generation (RoPGen), which essentially learns authors' unique coding style patterns that are hard for attackers to manipulate or imitate. The key idea is to combine data augmentation and gradient augmentation at the adversarial training phase. This effectively increases the diversity of training examples,…
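The key idea of combining data augmentation and gradient augmentation in one training step can be sketched on a toy problem. Everything here is an assumption made for illustration: the logistic classifier, the feature vectors, the `style_transform` jitter standing in for a coding style transformation, and the reading of gradient augmentation as adding gradients from width-reduced sub-models. It is not RoPGen's implementation.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, x):
    """Probability that feature vector x belongs to author 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def gradient(weights, x, y, width):
    """Logistic-loss gradient for a sub-model that uses only the first
    `width` weights; the remaining gradient entries are zero."""
    z = sum(w * xi for w, xi in zip(weights[:width], x[:width]))
    err = sigmoid(z) - y
    return [err * x[i] if i < width else 0.0 for i in range(len(weights))]


def style_transform(x, rng):
    """Stand-in for a coding style transformation (assumption): jitter the
    features slightly while the author label stays the same."""
    return [xi + rng.uniform(-0.1, 0.1) for xi in x]


def train_step(weights, x, y, rng, sub_widths=(2, 3), lr=0.5):
    x_aug = style_transform(x, rng)  # data augmentation
    full = len(weights)
    # Full-model gradients on the original and the transformed example, plus
    # sub-model gradients on the transformed example (the assumed form of
    # gradient augmentation).
    jobs = [(x, full), (x_aug, full)] + [(x_aug, w) for w in sub_widths]
    total = [0.0] * full
    for example, width in jobs:
        g = gradient(weights, example, y, width)
        total = [t + gi for t, gi in zip(total, g)]
    return [w - lr * t for w, t in zip(weights, total)]


rng = random.Random(0)
# Toy data: two "authors" with clearly separated style features.
dataset = [([1.0, 0.8, 0.9, 1.0], 1), ([-1.0, -0.9, -0.8, -1.0], 0)]
weights = [0.0] * 4
for _ in range(20):
    for features, label in dataset:
        weights = train_step(weights, features, label, rng)

print(predict(weights, dataset[0][0]), predict(weights, dataset[1][0]))
```

Summing the gradients before a single update lets the transformed example and the sub-models shape the same weights as the original example. A real setup would use a deep network and actual style transformations in place of the jitter.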

Code & Models

Repositories

ropgen/ropgen (Official)