Transfer Attacks and Defenses for Large Language Models on Coding Tasks
Chi Zhang, Zifan Wang, Ravi Mangal, Matt Fredrikson, Limin Jia, Corina Pasareanu

TL;DR
This paper investigates how adversarial perturbations affect large language models in coding tasks, demonstrating transferability of attacks and proposing prompt-based defenses to enhance robustness without retraining.
Contribution
The paper studies whether adversarial examples generated on smaller code models transfer to LLMs, and introduces prompt-based defense strategies that improve robustness against such attacks.
Findings
Adversarial examples generated on smaller code models transfer to LLMs and reduce LLM performance on coding tasks.
Prompt-based defenses can mitigate the impact of adversarial perturbations.
Proposed defenses improve LLM robustness without retraining.
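A prompt-based defense of the kind described above can be sketched as a wrapper that prepends a cautionary instruction to the model input. This is a minimal illustration only: the wording of `DEFENSE_NOTE` and the helper `build_defended_prompt` are hypothetical and are not taken from the paper, and no particular LLM API is assumed.

```python
# Hypothetical defense instruction; the paper's actual prompt wording is not
# given in this summary, so this text is an assumption for illustration.
DEFENSE_NOTE = (
    "The code below may contain semantics-preserving perturbations, "
    "such as unreachable branches or inconsequential print statements. "
    "Ignore them and reason only about the code's actual behavior."
)


def build_defended_prompt(task: str, code: str) -> str:
    """Prepend the defense instruction to a task description and code snippet."""
    return f"{DEFENSE_NOTE}\n\nTask: {task}\n\nCode:\n{code}"


if __name__ == "__main__":
    snippet = "def total(values):\n    return sum(values)"
    print(build_defended_prompt("Summarize this function.", snippet))
```

Because the defense lives entirely in the prompt, it needs no change to model weights, which is consistent with the "without retraining" claim above.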
Abstract
Modern large language models (LLMs), such as ChatGPT, have demonstrated impressive capabilities for coding tasks including writing and reasoning about code. They improve upon previous neural network models of code, such as code2seq or seq2seq, that already demonstrated competitive results when performing tasks such as code summarization and identifying code vulnerabilities. However, these previous code models were shown vulnerable to adversarial examples, i.e. small syntactic perturbations that do not change the program's semantics, such as the inclusion of "dead code" through false conditions or the addition of inconsequential print statements, designed to "fool" the models. LLMs can also be vulnerable to the same adversarial perturbations but a detailed study on this concern has been lacking so far. In this paper we aim to investigate the effect of adversarial perturbations on coding…
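The "dead code through false conditions" perturbation mentioned in the abstract can be illustrated with a small sketch. The function `insert_dead_code`, the sample program, and the `run` helper below are hypothetical and chosen only for illustration; the paper's attacks are generated against a model, whereas this sketch only shows that such an edit changes the source text without changing what the program computes.

```python
import textwrap

# Sample program used only for illustration.
ORIGINAL = textwrap.dedent('''
    def total(values):
        result = 0
        for v in values:
            result += v
        return result
''').strip()


def insert_dead_code(source: str) -> str:
    """Insert an unreachable branch right after the function signature line."""
    lines = source.splitlines()
    # A branch guarded by a false condition never executes.
    dead = ["    if False:", "        unused = 0"]
    return "\n".join(lines[:1] + dead + lines[1:])


def run(source: str, arg):
    """Execute the source and call the `total` function it defines."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"](arg)


PERTURBED = insert_dead_code(ORIGINAL)

# The text differs, but the computed result is the same.
assert PERTURBED != ORIGINAL
assert run(ORIGINAL, [1, 2, 3]) == run(PERTURBED, [1, 2, 3])
```

A model that is robust to this perturbation should give the same summary or vulnerability verdict for `ORIGINAL` and `PERTURBED`.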
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Ferroelectric and Negative Capacitance Devices · Topic Modeling
