Rethinking the Capability of Fine-Tuned Language Models for Automated Vulnerability Repair
Woorim Han, Yeongjun Kwak, Miseon Yu, Kyeongmin Kim, Younghan Lee, Hyungon Moon, Yunheung Paek

TL;DR
This paper critically evaluates fine-tuned language models for automated vulnerability repair (AVR). It highlights overfitting and the limitations of match-based evaluation metrics, and proposes a new benchmark to better assess model robustness and generalization.
Contribution
It introduces an evaluation framework that combines semantic-preserving transformations of test sets, re-split datasets, and a new benchmark, L-AVRBench, to improve the assessment of AVR models.
Findings
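Models often overfit to the training set, and the training, validation, and test sets used to evaluate them are not mutually exclusive.
Match-based metrics can be misleading because a vulnerability can often be patched in several valid ways that differ from the reference fix.
The proposed benchmark better captures true repair capabilities.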
Abstract
Learning-based automated vulnerability repair (AVR) techniques that utilize fine-tuned language models have shown promise in generating vulnerability patches. However, questions remain about their ability to repair unseen vulnerabilities. Our empirical study reveals that state-of-the-art models often overfit to the training set and are evaluated using training, validation, and test sets that are not mutually exclusive. Furthermore, relying on match-based metrics that compare generated patches to reference fixes at the token level has some limitations, failing to account for the possibility of various valid ways to patch the vulnerability. In this paper, we examine the capabilities of state-of-the-art fine-tuned AVR models and the adequacy of match-based evaluation metrics in three ways. First, we apply semantic-preserving transformations to test sets in order to determine whether models…
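To make the two concerns in the abstract concrete, here is a minimal Python sketch. It is not the paper's code: the helper names (`tokenize`, `exact_match`, `rename_identifier`, `split_overlap`), the crude regex lexer, and the C snippets are all assumptions made for illustration, and the paper's actual metrics and transformations may differ. The sketch shows a token-level exact-match check rejecting a patch that is logically equivalent to the reference, an identifier rename (one simple semantic-preserving transformation) changing the token sequence, and a check for samples shared between two splits.

```python
import re


def tokenize(code: str) -> list[str]:
    # Crude lexer (assumption): identifiers, integer literals, or any
    # single non-whitespace symbol. Real evaluations use model tokenizers.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def exact_match(generated: str, reference: str) -> bool:
    # Token-level exact match: True only if both token sequences are identical.
    return tokenize(generated) == tokenize(reference)


def rename_identifier(code: str, old: str, new: str) -> str:
    # Semantic-preserving transformation (illustrative): rename one identifier
    # wherever it appears as a whole word.
    return re.sub(rf"\b{re.escape(old)}\b", new, code)


def normalize(code: str) -> str:
    # Whitespace-insensitive canonical form used to compare samples.
    return " ".join(tokenize(code))


def split_overlap(train: list[str], test: list[str]) -> set[str]:
    # Samples that appear in both splits after normalization.
    return {normalize(s) for s in train} & {normalize(s) for s in test}


# Hypothetical reference fix and candidate patches (not taken from any dataset).
reference = "if (len < sizeof(buf)) memcpy(buf, src, len);"
reformatted = "if (len<sizeof(buf))  memcpy(buf,src,len);"
alternative = "if (sizeof(buf) > len) memcpy(buf, src, len);"

print(exact_match(reformatted, reference))   # whitespace-only difference
print(exact_match(alternative, reference))   # same logic, different tokens
print(exact_match(rename_identifier(reference, "len", "n"), reference))
print(split_overlap([reference, "x = 1;"], [reformatted, "y = 2;"]))
```

In this sketch the reformatted patch matches, while the reordered comparison and the renamed variant do not, even though all three guard the copy with the same bound. The overlap check flags the reformatted sample as shared between the two toy splits.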
Taxonomy
Topics: Web Application Security Vulnerabilities · Information and Cyber Security · Adversarial Robustness in Machine Learning
