On the Adversarial Robustness of Instruction-Tuned Large Language Models for Code

Md Imran Hossen; Xiali Hei

arXiv:2411.19508·cs.SE·December 2, 2024

Open Access

TL;DR

This paper evaluates the robustness of instruction-tuned large language models for coding, revealing significant vulnerabilities to input perturbations and proposing a mitigation strategy to improve their resilience.

Contribution

Introduces DegradePrompter, a method for systematically evaluating the robustness of instruction-tuned Code LLMs against input challenges, and uses it to compare five open-source models with three closed-source commercial models.

Findings

01. Open-source models show a 12% to 34% decline in functional correctness under input perturbations.

02. Commercial models are more resilient, with performance degradation of 3% to 24%.

03. A simple mitigation strategy can enhance model robustness.
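
The degradation figures above can be read as a comparison between a model's pass rate on clean prompts and its pass rate on perturbed prompts. The following is a minimal sketch of that arithmetic, not the paper's evaluation code. The function names `pass_rate` and `relative_decline` and the sample outcomes are hypothetical, and the sketch assumes the decline is reported relative to the clean baseline; this summary does not say whether the paper uses relative or absolute (percentage-point) drops.

```python
def pass_rate(results):
    """Fraction of problems whose generated code passed all of its tests."""
    if not results:
        raise ValueError("results must be non-empty")
    return sum(1 for passed in results if passed) / len(results)


def relative_decline(baseline, perturbed):
    """Drop in pass rate, expressed as a percentage of the baseline.

    Assumption: decline is measured relative to the clean baseline.
    """
    if baseline <= 0:
        raise ValueError("baseline pass rate must be positive")
    return 100.0 * (baseline - perturbed) / baseline


# Hypothetical per-problem outcomes (True = generated code passed its tests).
clean_outcomes = [True, True, True, True, False]        # 4 of 5 pass
perturbed_outcomes = [True, True, True, False, False]   # 3 of 5 pass

base = pass_rate(clean_outcomes)
pert = pass_rate(perturbed_outcomes)
decline = relative_decline(base, pert)

print(f"clean pass rate:     {base:.2f}")
print(f"perturbed pass rate: {pert:.2f}")
print(f"relative decline:    {decline:.1f}%")
```

With these made-up outcomes the absolute drop is 20 percentage points while the relative drop is 25%, which is why it matters which convention a reported range such as 12% to 34% follows.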

Abstract

The advent of instruction-tuned Large Language Models designed for coding tasks (Code LLMs) has transformed software engineering practices. However, their robustness against various input challenges remains a critical concern. This study introduces DegradePrompter, a novel method designed to systematically evaluate the robustness of instruction-tuned Code LLMs. We assess the impact of diverse input challenges on the functionality and correctness of generated code using rigorous metrics and established benchmarks. Our comprehensive evaluation includes five state-of-the-art open-source models and three production-grade closed-source models, revealing varying degrees of robustness. Open-source models demonstrate an increased susceptibility to input perturbations, resulting in declines in functional correctness ranging from 12% to 34%. In contrast, commercial models demonstrate relatively…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Software Testing and Debugging Techniques · Explainable Artificial Intelligence (XAI)