Evaluating LLMs for One-Shot Patching of Real and Artificial Vulnerabilities
Aayush Garg, Zanis Ali Khan, Renzo Degiovanni, Qiang Tang

TL;DR
This paper empirically evaluates the effectiveness of various Large Language Models in one-shot patching of both real and artificial software vulnerabilities, highlighting their strengths, limitations, and variability.
Contribution
It provides a comprehensive empirical analysis of how well multiple LLMs patch vulnerabilities, both real and artificial, judging each generated patch by executing Proof-of-Vulnerability (PoV) tests.
Findings
LLMs patch real vulnerabilities more effectively than artificial ones
Significant variability exists among LLMs in patching capabilities
Overlap and complementarity among LLMs influence patching success
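One way to make the overlap and complementarity finding concrete is to treat each model's results as a set of vulnerability IDs whose patches passed PoV testing. The sketch below is only an illustration under that assumption: the function name, the model names, the vulnerability IDs, and the choice of Jaccard similarity as the overlap measure are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def overlap_and_complementarity(patched):
    """Summarize how models' patching results overlap.

    `patched` maps a model name to the set of vulnerability IDs that the
    model patched successfully (hypothetical input format).
    """
    all_sets = list(patched.values())
    # Vulnerabilities patched by at least one model (combined coverage).
    union = set().union(*all_sets)
    # Vulnerabilities patched by every model (full overlap).
    common = set.intersection(*all_sets) if all_sets else set()
    # Vulnerabilities that only one model patched (its unique contribution).
    unique = {
        model: ids - set().union(*(v for k, v in patched.items() if k != model))
        for model, ids in patched.items()
    }
    # Pairwise Jaccard similarity: |A ∩ B| / |A ∪ B| (assumed overlap measure).
    jaccard = {}
    for a, b in combinations(sorted(patched), 2):
        either = patched[a] | patched[b]
        jaccard[(a, b)] = len(patched[a] & patched[b]) / len(either) if either else 0.0
    return {"union": union, "common": common, "unique": unique, "jaccard": jaccard}


# Made-up data for illustration only; not results from the paper.
example = {
    "model_a": {"V1", "V2", "V3"},
    "model_b": {"V2", "V3", "V4"},
    "model_c": {"V3", "V5"},
}
summary = overlap_and_complementarity(example)
```

In this toy input, a large `union` relative to any single model's set would indicate complementarity (combining models patches more than any one model alone), while high pairwise Jaccard values would indicate that the models mostly patch the same vulnerabilities.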
Abstract
Automated vulnerability patching is crucial for software security, and recent advancements in Large Language Models (LLMs) present promising capabilities for automating this task. However, existing research has primarily assessed LLMs using publicly disclosed vulnerabilities, leaving their effectiveness on related artificial vulnerabilities largely unexplored. In this study, we empirically evaluate the patching effectiveness and complementarity of several prominent LLMs, such as OpenAI's GPT variants, LLaMA, DeepSeek, and Mistral models, using both real and artificial vulnerabilities. Our evaluation employs Proof-of-Vulnerability (PoV) test execution to concretely assess whether LLM-generated source code successfully patches vulnerabilities. Our results reveal that LLMs patch real vulnerabilities more effectively than artificial ones. Additionally, our analysis reveals…
Taxonomy
Topics: Web Application Security Vulnerabilities · Software Engineering Research · Information and Cyber Security
