On the Evaluation of Large Language Models in Multilingual Vulnerability Repair

Dong Wang; Junji Yu; Honglin Shu; Michael Fu; Chakkrit Tantithamthavorn; Yasutaka Kamei; Junjie Chen

arXiv:2508.03470 · cs.SE · August 6, 2025

Open Access

TL;DR

This paper presents a large-scale empirical study evaluating the effectiveness of large language models, especially GPT-4o, in repairing software vulnerabilities across seven programming languages, highlighting their potential and limitations.

Contribution

It is the first comprehensive study comparing LLMs and existing approaches for multilingual vulnerability repair, demonstrating GPT-4o's competitive performance and generalization capabilities.

Findings

01

GPT-4o performs competitively with VulMaster.

02

LLMs are more effective in repairing unique and dangerous vulnerabilities.

03

Go language shows the highest repair effectiveness.

Abstract

Various deep learning-based approaches built on pre-trained language models have been proposed for automatically repairing software vulnerabilities. However, these approaches are limited to a specific programming language (C/C++). Recent advances in large language models (LLMs) offer language-agnostic capabilities and strong semantic understanding, showing potential to overcome this limitation and repair vulnerabilities across multiple languages. Although some work has begun to explore LLMs' repair performance, their effectiveness is unsatisfactory. To address these limitations, we conducted a large-scale empirical study to investigate the performance of automated vulnerability repair approaches and state-of-the-art LLMs across seven programming languages. Results show GPT-4o, instruction-tuned with few-shot prompting, performs competitively against the leading approach, VulMaster. Additionally, the LLM-based…

Taxonomy

Topics: Software Testing and Debugging Techniques · Software Engineering Research · Web Application Security Vulnerabilities