TL;DR
This paper demonstrates that neural models of code are vulnerable to adversarial attacks using small, semantics-preserving perturbations, and introduces DAMP, a novel technique for generating such adversarial examples across multiple architectures.
Contribution
The paper presents DAMP, a new method for creating targeted adversarial examples for code models, and evaluates defenses to mitigate these attacks.
Findings
DAMP achieves up to 89% success in targeted attacks.
DAMP achieves up to 94% success in non-targeted attacks.
Some of the evaluated defenses substantially reduce the attack success rate at only a small cost in model accuracy.
Abstract
Neural models of code have shown impressive results when performing tasks such as predicting method names and identifying certain kinds of bugs. We show that these models are vulnerable to adversarial examples, and introduce a novel approach for attacking trained models of code using adversarial examples. The main idea of our approach is to force a given trained model to make an incorrect prediction, as specified by the adversary, by introducing small perturbations that do not change the program's semantics, thereby creating an adversarial example. To find such perturbations, we present a new technique for Discrete Adversarial Manipulation of Programs (DAMP). DAMP works by deriving the desired prediction with respect to the model's inputs, while holding the model weights constant, and following the gradients to slightly modify the input code. We show that our DAMP attack is effective…
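The gradient step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy linear classifier over a single renamable variable, and every name in it (`E`, `W`, `context`, `damp_step`, `attack`) is hypothetical. The weights stay fixed, the loss of the adversary's target label is differentiated with respect to the one-hot encoding of the variable name, and the name is swapped for the vocabulary entry that the first-order estimate says lowers that loss the most.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def forward(onehot, E, W, context):
    """Toy model (an assumption): class probabilities for a program whose
    single renamable variable is encoded by `onehot`."""
    hidden = context + onehot @ E   # fixed code context + variable-name embedding
    return softmax(hidden @ W)

def target_loss(onehot, E, W, context, target):
    """Cross-entropy of the adversary's target label."""
    return -np.log(forward(onehot, E, W, context)[target])

def grad_wrt_name(onehot, E, W, context, target):
    """Gradient of the target loss w.r.t. the one-hot name; weights are held constant."""
    d_logits = forward(onehot, E, W, context)
    d_logits[target] -= 1.0         # dL/dlogits = softmax - one_hot(target)
    return E @ (W @ d_logits)       # chain rule back to the one-hot input

def damp_step(current, E, W, context, target):
    """Pick the name whose first-order loss estimate is lowest (excluding the current one)."""
    onehot = np.zeros(E.shape[0])
    onehot[current] = 1.0
    g = grad_wrt_name(onehot, E, W, context, target)
    g[current] = np.inf             # force an actual rename
    return int(np.argmin(g))

def attack(start, E, W, context, target, max_steps=5):
    """Repeat the rename step until the model predicts `target` or the budget runs out."""
    name = start
    for _ in range(max_steps):
        onehot = np.eye(E.shape[0])[name]
        if int(np.argmax(forward(onehot, E, W, context))) == target:
            return name, True
        name = damp_step(name, E, W, context, target)
    onehot = np.eye(E.shape[0])[name]
    return name, int(np.argmax(forward(onehot, E, W, context))) == target

# Random stand-in parameters: 20 candidate names, 8-dim embeddings, 5 labels.
rng = np.random.default_rng(0)
V, d, C = 20, 8, 5
E = rng.normal(size=(V, d))
W = rng.normal(size=(d, C))
context = rng.normal(size=d)

new_name, success = attack(start=3, E=E, W=W, context=context, target=2)
print(new_name, success)
```

Renaming a variable leaves program semantics unchanged, which is why the perturbation here is restricted to the name slot. A non-targeted variant would instead follow the gradient of the loss of the correct label upward, using `argmax` in place of `argmin`.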
Taxonomy
Methods: Gated Graph Sequence Neural Networks
