TL;DR
This paper explores the vulnerability of neural models for source code to backdoor attacks, demonstrating how they can be inserted, detected, and eliminated across various architectures and programming languages.
Contribution
It defines backdoor classes for source code models, adapts spectral detection algorithms, and provides a comprehensive evaluation of backdoor injection and removal methods.
Findings
Backdoors can be easily injected into source code models.
Spectral signatures enable detection of poisoned data.
Backdoors can be effectively eliminated across architectures and languages.
Abstract
Deep neural networks are vulnerable to a range of adversaries. A particularly pernicious class of vulnerabilities are backdoors, where model predictions diverge in the presence of subtle triggers in inputs. An attacker can implant a backdoor by poisoning the training data to yield a desired target prediction on triggered inputs. We study backdoors in the context of deep-learning for source code. (1) We define a range of backdoor classes for source-code tasks and show how to poison a dataset to install such backdoors. (2) We adapt and improve recent algorithms from robust statistics for our setting, showing that backdoors leave a spectral signature in the learned representation of source code, thus enabling detection of poisoned data. (3) We conduct a thorough evaluation on different architectures and languages, showing the ease of injecting backdoors and our ability to eliminate them.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
