Rethinking the Capability of Fine-Tuned Language Models for Automated Vulnerability Repair

Woorim Han; Yeongjun Kwak; Miseon Yu; Kyeongmin Kim; Younghan Lee; Hyungon Moon; Yunheung Paek

arXiv:2512.22633·cs.SE·December 30, 2025

TL;DR

This paper critically evaluates fine-tuned language models for automated vulnerability repair (AVR). It highlights overfitting to the training set and the limitations of match-based evaluation metrics, and it proposes a new benchmark to better assess model robustness and generalization.

Contribution

The paper introduces an evaluation framework for AVR models that combines semantic-preserving transformations of test sets, re-split datasets, and a new benchmark, L-AVRBench.
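
One way to see why re-splitting matters is to measure how many test samples also occur in the training set. The sketch below is only an illustration of that check, not the authors' procedure: the function names (`normalize`, `split_overlap`, `leakage_rate`), the whitespace-only normalization, and the toy samples are all assumptions made for this example.

```python
import re


def normalize(code: str) -> str:
    """Collapse runs of whitespace so reformatted duplicates compare equal.

    Assumption: whitespace-insensitive comparison is enough for this sketch;
    a real study might normalize comments or identifiers as well.
    """
    return re.sub(r"\s+", " ", code).strip()


def split_overlap(train: list[str], test: list[str]) -> list[str]:
    """Return the test samples whose normalized form also appears in train."""
    seen = {normalize(sample) for sample in train}
    return [sample for sample in test if normalize(sample) in seen]


def leakage_rate(train: list[str], test: list[str]) -> float:
    """Fraction of test samples that are duplicated in the training set."""
    if not test:
        return 0.0
    return len(split_overlap(train, test)) / len(test)


# Hypothetical toy data: the first test sample is a reformatted copy
# of the first training sample.
train = [
    "int f(char *s) { return strlen(s); }",
    "void g(void) { free(p); }",
]
test = [
    "int  f(char *s) {\n  return strlen(s);\n}",
    "void h(int n) { a[n] = 0; }",
]

print(leakage_rate(train, test))
```

A non-zero rate on a real dataset would suggest that test accuracy partly reflects memorized samples rather than generalization.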

Findings

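01. Models often overfit to the training data.

02. Match-based metrics, which compare generated patches to reference fixes at the token level, can be misleading because a vulnerability can often be patched in more than one valid way.

03. The proposed benchmark better captures true repair capabilities.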
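
The second finding can be illustrated with a minimal token-level exact-match check. This is a sketch under stated assumptions: the regex tokenizer, the function names, and the C snippets are hypothetical and are not taken from the paper or its benchmark.

```python
import re


def tokenize(code: str) -> list[str]:
    """Split code into identifiers, integer literals, and single symbols.

    Assumption: a crude regex tokenizer stands in for a real C lexer.
    """
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def exact_match(generated: str, reference: str) -> bool:
    """Token-level match: true only if both token sequences are identical."""
    return tokenize(generated) == tokenize(reference)


# Hypothetical reference fix: copy only when the length fits the buffer.
reference = "if (len < sizeof(buf)) memcpy(buf, src, len);"

# Same tokens, different spacing: counted as correct.
candidate_same = "if (len < sizeof(buf))  memcpy(buf, src, len);"

# A different guard that also prevents the overflow: counted as wrong.
candidate_alt = "if (len >= sizeof(buf)) return; memcpy(buf, src, len);"

print(exact_match(candidate_same, reference))
print(exact_match(candidate_alt, reference))
```

The alternative patch also blocks the out-of-bounds copy, yet it scores zero under exact match, which is the kind of mismatch between metric and repair quality that the finding describes.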

Abstract

Learning-based automated vulnerability repair (AVR) techniques that utilize fine-tuned language models have shown promise in generating vulnerability patches. However, questions remain about their ability to repair unseen vulnerabilities. Our empirical study reveals that state-of-the-art models often overfit to the training set and are evaluated using training, validation, and test sets that are not mutually exclusive. Furthermore, match-based metrics, which compare generated patches to reference fixes at the token level, fail to account for the fact that a vulnerability can often be patched in several valid ways. In this paper, we examine the capabilities of state-of-the-art fine-tuned AVR models and the adequacy of match-based evaluation metrics in three ways. First, we apply semantic-preserving transformations to test sets in order to determine whether models…
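
As an illustration of a semantic-preserving transformation, the sketch below renames a variable in a test sample without changing what the code does. The choice of identifier renaming, the helper name `rename_identifier`, and the C snippet are assumptions for this example; the abstract does not say which transformations the authors apply.

```python
import re


def rename_identifier(code: str, old: str, new: str) -> str:
    """Rename whole-word occurrences of `old` to `new`.

    Assumption: a regex is enough for this toy input. It would also rewrite
    matches inside string literals and comments, so a real tool would work
    on a parse tree instead.
    """
    return re.sub(rf"\b{re.escape(old)}\b", new, code)


# Hypothetical vulnerable sample with an unchecked copy into `buf`.
original = "char buf[16]; strcpy(buf, input); buffer_len = 16;"

# Renaming `buf` leaves behavior unchanged; `buffer_len` is not touched
# because the match is restricted to whole words.
transformed = rename_identifier(original, "buf", "dst")

print(transformed)
```

A model that repairs `original` but fails on `transformed` is likely relying on surface patterns seen during training rather than on the vulnerability itself.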

Taxonomy

Topics: Web Application Security Vulnerabilities · Information and Cyber Security · Adversarial Robustness in Machine Learning