You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion

Roei Schuster; Congzheng Song; Eran Tromer; Vitaly Shmatikov

arXiv:2007.02220 · cs.CR · October 12, 2020 · 23 citations

TL;DR

Neural code autocompleters can be poisoned, either through their training corpus or by direct fine-tuning, so that they suggest insecure code in attacker-chosen contexts. Existing defenses are largely ineffective against these attacks.

Contribution

This paper demonstrates the vulnerability of neural code autocompleters to poisoning attacks and evaluates the effectiveness of existing defenses.

Findings

01. Poisoning can cause autocompleters to suggest insecure code snippets.

02. Targeted attacks can influence suggestions for specific repositories or developers.

03. Existing defenses are largely ineffective against poisoning attacks.

Abstract

Code autocompletion is an integral feature of modern code editors and IDEs. The latest generation of autocompleters uses neural language models, trained on public open-source code repositories, to suggest likely (not just statically feasible) completions given the current context. We demonstrate that neural code autocompleters are vulnerable to poisoning attacks. By adding a few specially-crafted files to the autocompleter's training corpus (data poisoning), or else by directly fine-tuning the autocompleter on these files (model poisoning), the attacker can influence its suggestions for attacker-chosen contexts. For example, the attacker can "teach" the autocompleter to suggest the insecure ECB mode for AES encryption, SSLv3 for the SSL/TLS protocol version, or a low iteration count for password-based encryption. Moreover, we show that these attacks can be targeted: an autocompleter…
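
The data-poisoning idea in the abstract can be illustrated with a deliberately simplified sketch. This is not the paper's method or its neural models: it assumes a toy completer that counts which token most often follows a given context, and the names `train`, `suggest`, `clean_corpus`, and `poison_files`, along with the corpus sizes, are hypothetical. It only shows how a handful of added files can change the top suggestion for one attacker-chosen context.

```python
from collections import Counter, defaultdict


def train(corpus):
    """Count which final token follows each context in the training lines."""
    counts = defaultdict(Counter)
    for line in corpus:
        # Split each line into "everything before the last space" and the last token.
        context, _, completion = line.rpartition(" ")
        counts[context][completion] += 1
    return counts


def suggest(model, context):
    """Return the most frequent completion seen after `context`, or None."""
    if context not in model:
        return None
    return model[context].most_common(1)[0][0]


# Attacker-chosen context (hypothetical example line).
CONTEXT = "cipher = AES.new(key,"

# Toy "clean" corpus: the context is mostly completed with non-ECB modes.
clean_corpus = [CONTEXT + " AES.MODE_CBC"] * 5 + [CONTEXT + " AES.MODE_GCM"] * 3

# Toy poison: a few crafted files that pair the same context with ECB.
poison_files = [CONTEXT + " AES.MODE_ECB"] * 6

clean_model = train(clean_corpus)
poisoned_model = train(clean_corpus + poison_files)

print(suggest(clean_model, CONTEXT))     # AES.MODE_CBC
print(suggest(poisoned_model, CONTEXT))  # AES.MODE_ECB
```

A frequency counter is far easier to skew than a neural language model, so the number of poison lines here says nothing about the effort a real attack requires. The sketch only makes concrete the claim that added training files can shift suggestions for a chosen context.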

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Software Engineering Research

Methods: Linear Layer · Cosine Annealing · Discriminative Fine-Tuning · Dropout · Byte Pair Encoding · Multi-Head Attention · Residual Connection · Attention Is All You Need · Linear Warmup With Cosine Annealing · Attention Dropout