Adversarial Attack Classification and Robustness Testing for Large Language Models for Code
Yang Liu, Armstrong Foundjem, Foutse Khomh, and Heng Li

TL;DR
This paper investigates how adversarial perturbations in natural language inputs affect large language models for code, revealing vulnerabilities at different linguistic levels and proposing a framework for robustness testing.
Contribution
It introduces a structured taxonomy and mixed-methods framework for evaluating LLM4Code robustness against natural language adversarial attacks.
Findings
Word-level perturbations significantly challenge model robustness.
Sentence-level attacks are less effective, indicating some resilience.
Model sensitivity varies across character, word, and sentence perturbations.
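To make the three perturbation levels named in the findings concrete, here is a minimal Python sketch. It is not the paper's implementation: the function names, the tiny synonym table, and the distractor sentence are all hypothetical, chosen only to show what a character-, word-, and sentence-level change to a task description might look like.

```python
import random

# Hypothetical synonym table, for illustration only.
SYNONYMS = {"function": "routine", "sum": "total", "list": "sequence"}


def char_perturb(text, rng):
    """Character level: swap two adjacent inner letters of one word (a typo)."""
    words = text.split()
    # Only purely alphabetic words with at least four letters are eligible,
    # so the first and last letters can stay in place.
    candidates = [i for i, w in enumerate(words) if len(w) >= 4 and w.isalpha()]
    if not candidates:
        return text
    i = rng.choice(candidates)
    w = words[i]
    j = rng.randrange(1, len(w) - 2)
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)


def word_perturb(text, synonyms=SYNONYMS):
    """Word level: replace each word found in the table with its synonym."""
    return " ".join(synonyms.get(w.lower(), w) for w in text.split())


def sentence_perturb(text, distractor="Please keep the solution simple."):
    """Sentence level: append a sentence that does not change the task."""
    return f"{text.rstrip()} {distractor}"


if __name__ == "__main__":
    prompt = "Write a function that returns the sum of a list"
    rng = random.Random(0)  # fixed seed so the typo is reproducible
    print(char_perturb(prompt, rng))
    print(word_perturb(prompt))
    print(sentence_perturb(prompt))
```

In a robustness test, each perturbed description would be sent to the model alongside the original, and the generated code compared (for example, by running the same unit tests on both outputs).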
Abstract
Large Language Models (LLMs) have become vital tools in software development tasks such as code generation, completion, and analysis. As their integration into workflows deepens, ensuring robustness against vulnerabilities, especially those triggered by diverse or adversarial inputs, becomes increasingly important. Such vulnerabilities may lead to incorrect or insecure code generation when models encounter perturbed task descriptions, code, or comments. Prior research often overlooks the role of natural language in guiding code tasks. This study investigates how adversarial perturbations in natural language inputs, including prompts, comments, and descriptions, affect LLMs for Code (LLM4Code). It examines the effects of perturbations at the character, word, and sentence levels to identify the most impactful vulnerabilities. We analyzed multiple projects (e.g., ReCode, OpenAttack) and…
