Evaluating LLMs for One-Shot Patching of Real and Artificial Vulnerabilities

Aayush Garg; Zanis Ali Khan; Renzo Degiovanni; Qiang Tang

arXiv:2511.23408·cs.CR·December 1, 2025

TL;DR

This paper empirically evaluates how well several Large Language Models (LLMs) perform one-shot patching of both real and artificial software vulnerabilities. It finds that real vulnerabilities are patched more effectively than artificial ones and that patching capability varies significantly across models.

Contribution

The paper provides an empirical analysis of how well multiple LLMs patch real and artificial vulnerabilities, checking each generated patch by executing Proof-of-Vulnerability (PoV) tests.

Findings

01. LLMs patch real vulnerabilities more effectively than artificial ones

02. Significant variability exists among LLMs in patching capabilities

03. Overlap and complementarity among LLMs influence patching success

Abstract

Automated vulnerability patching is crucial for software security, and recent advances in Large Language Models (LLMs) show promise for automating this task. However, existing research has primarily assessed LLMs using publicly disclosed vulnerabilities, leaving their effectiveness on related artificial vulnerabilities largely unexplored. In this study, we empirically evaluate the patching effectiveness and complementarity of several prominent LLMs, such as OpenAI's GPT variants, LLaMA, DeepSeek, and Mistral models, using both real and artificial vulnerabilities. Our evaluation employs Proof-of-Vulnerability (PoV) test execution to concretely assess whether LLM-generated source code successfully patches vulnerabilities. Our results reveal that LLMs patch real vulnerabilities more effectively than artificial ones. Additionally, our analysis reveals…
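To make the notions of overlap and complementarity concrete, here is a minimal sketch of one set-based way to compare models once PoV tests have been run. It is not the paper's actual metric or tooling: the model names, vulnerability IDs, and helper functions (`overlap`, `unique_fixes`, `combined_coverage`) are all hypothetical, and the data is a toy example rather than a reported result.

```python
# Hypothetical toy data (NOT results from the paper): for each model, the set
# of vulnerability IDs whose PoV test no longer triggers after applying that
# model's one-shot patch.
patched = {
    "model_a": {"v1", "v2", "v3"},
    "model_b": {"v2", "v3", "v4"},
    "model_c": {"v5"},
}


def overlap(a, b):
    """Jaccard overlap between two sets of patched vulnerabilities."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def unique_fixes(name, results):
    """Vulnerabilities patched by `name` and by no other model."""
    others = set().union(*(s for n, s in results.items() if n != name))
    return results[name] - others


def combined_coverage(results):
    """Vulnerabilities patched by at least one model."""
    return set().union(*results.values())


if __name__ == "__main__":
    print(overlap(patched["model_a"], patched["model_b"]))
    print(unique_fixes("model_a", patched))
    print(len(combined_coverage(patched)))
```

Under this reading, a high overlap means two models mostly fix the same vulnerabilities, while a non-empty `unique_fixes` set means a model contributes patches that the others miss, so running several models together could raise combined coverage.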

Taxonomy

Topics: Web Application Security Vulnerabilities · Software Engineering Research · Information and Cyber Security