A Multi-Dataset Evaluation of Models for Automated Vulnerability Repair

Zanis Ali Khan, Aayush Garg, and Qiang Tang

arXiv:2506.04987·cs.SE·June 6, 2025

TL;DR

This paper evaluates the pre-trained language models CodeBERT and CodeT5 for automated vulnerability patching across six datasets and four languages, highlighting their strengths and limitations in generalization and scalability.

Contribution

It provides a comprehensive benchmark of these models for vulnerability repair, revealing their performance differences and challenges in generalizing to unseen vulnerabilities.

Findings

1. CodeT5 captures complex vulnerability patterns better.
2. CodeBERT performs better with fragmented or sparse context.
3. Models struggle to generalize to unseen vulnerabilities.

Abstract

Software vulnerabilities pose significant security threats, requiring effective mitigation. While Automated Program Repair (APR) has advanced in fixing general bugs, vulnerability patching, a security-critical aspect of APR, remains underexplored. This study investigates pre-trained language models, CodeBERT and CodeT5, for automated vulnerability patching across six datasets and four languages. We evaluate their accuracy and generalization to unknown vulnerabilities. Results show that while both models face challenges with fragmented or sparse context, CodeBERT performs comparatively better in such scenarios, whereas CodeT5 excels in capturing complex vulnerability patterns. CodeT5 also demonstrates superior scalability. Furthermore, we test fine-tuned models on both in-distribution (trained) and out-of-distribution (unseen) datasets. While fine-tuning improves in-distribution…

Taxonomy

Topics: Software Testing and Debugging Techniques · Software Engineering Research · Security and Verification in Computing

Methods: Byte Pair Encoding · Linear Layer · SentencePiece · Attention Dropout · Softmax · Multi-Head Attention · Attention Is All You Need · Inverse Square Root Schedule · Adafactor