Natural Attack for Pre-trained Models of Code

Zhou Yang; Jieke Shi; Junda He; David Lo

arXiv:2201.08698·cs.SE·March 1, 2022

TL;DR

This paper introduces ALERT, a black-box adversarial attack on pre-trained code models that generates natural, semantically consistent adversarial examples. It improves attack success rates, and its examples can be used for adversarial fine-tuning to make models more robust.

Contribution

ALERT is the first attack to consider naturalness when generating adversarial code examples, which makes the examples more realistic and the attack more effective than prior methods.

Findings

1. ALERT achieves high attack success rates on CodeBERT and GraphCodeBERT.
2. A human study shows that ALERT's adversarial examples are more natural than those produced by previous methods.
3. Adversarial fine-tuning with ALERT examples significantly improves model robustness.

Abstract

Pre-trained models of code have achieved success in many important software engineering tasks. However, these powerful models are vulnerable to adversarial attacks that slightly perturb model inputs to make a victim model produce wrong outputs. Current works mainly attack models of code with examples that preserve operational program semantics but ignore a fundamental requirement for adversarial example generation: perturbations should be natural to human judges, which we refer to as naturalness requirement. In this paper, we propose ALERT (nAturaLnEss AwaRe ATtack), a black-box attack that adversarially transforms inputs to make victim models produce wrong outputs. Different from prior works, this paper considers the natural semantic of generated examples at the same time as preserving the operational semantic of original inputs. Our user study demonstrates that human developers…
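The idea of a black-box attack that preserves operational semantics can be illustrated with a small sketch. This is not the authors' implementation: it assumes the perturbation is an identifier rename, uses a hand-written candidate list where ALERT would select natural substitutes, and attacks a hypothetical toy classifier (`toy_predict`) rather than CodeBERT or GraphCodeBERT. The helper names `rename_identifier` and `greedy_rename_attack` are illustrative only, and the renamer ignores scoping rules for brevity.

```python
import ast
from typing import Callable, Iterable, Optional


class _Renamer(ast.NodeTransformer):
    """Rename every occurrence of one identifier (variables and parameters)."""

    def __init__(self, old: str, new: str) -> None:
        self.old, self.new = old, new

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        self.generic_visit(node)  # also visit any type annotation
        if node.arg == self.old:
            node.arg = self.new
        return node


def rename_identifier(source: str, old: str, new: str) -> str:
    """Return `source` with `old` renamed to `new` (requires Python 3.9+).

    Simplification: scoping is ignored, so this is only safe for small
    snippets. A name that already exists is rejected, because reusing it
    could change what the program computes.
    """
    tree = ast.parse(source)
    existing = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    existing |= {n.arg for n in ast.walk(tree) if isinstance(n, ast.arg)}
    if new in existing:
        raise ValueError(f"{new!r} is already used in the snippet")
    return ast.unparse(_Renamer(old, new).visit(tree))


def greedy_rename_attack(
    source: str,
    identifier: str,
    candidates: Iterable[str],
    predict: Callable[[str], str],
) -> Optional[str]:
    """Try candidates in order; return the first variant that flips the label.

    Black-box: only the victim's predicted label is observed.
    """
    original_label = predict(source)
    for candidate in candidates:
        try:
            variant = rename_identifier(source, identifier, candidate)
        except ValueError:
            continue  # skip candidates that would collide with existing names
        if predict(variant) != original_label:
            return variant
    return None


def toy_predict(source: str) -> str:
    """Hypothetical victim: a brittle classifier keyed on one token."""
    return "vulnerable" if "buf" in source else "safe"


SOURCE = "def copy(buf, n):\n    return buf[:n]\n"
adversarial = greedy_rename_attack(SOURCE, "buf", ["data", "payload"], toy_predict)
print(adversarial)
```

The renamed function computes the same result as the original, yet the toy classifier changes its prediction. A naturalness-aware attack would additionally restrict the candidate list to names a human reader would find plausible in context.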

Code & Models

Repositories

soarsmu/attack-pretrain-models-of-code (PyTorch, official)
