Code-Centric Detection of Vulnerability-Fixing Commits: A Unified Benchmark and Empirical Study

Nils Loose; Joseph Bienhüls; Kristoffer Hempel; Felix Mächtle; Thomas Eisenbarth

arXiv:2605.13138·cs.SE·May 14, 2026

TL;DR

This study evaluates fine-tuned code language models for detecting vulnerability-fixing commits (VFCs) on a unified benchmark. It finds no evidence that the models acquire transferable security understanding from code changes alone, and it shows that commit messages dominate model attention when they are available.

Contribution

The paper provides a unified framework and evaluation suite for code-centric vulnerability-fixing commit (VFC) detection, consolidating over 20 previously fragmented datasets, and reports over 180 experiments that expose key limitations of current models.

Findings

01

The experiments show no evidence that models acquire transferable security-relevant code understanding from code changes alone.

02

Commit messages dominate model attention when available.

03

Models miss over 93% of vulnerabilities at a 0.5% false positive rate.
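
A figure such as "miss rate at a fixed false positive rate" can be computed from classifier scores as sketched below. This is a minimal illustration under stated assumptions, not the paper's evaluation code: the function name `miss_rate_at_fpr`, the strict-threshold convention, and the toy data are all hypothetical.

```python
def miss_rate_at_fpr(labels, scores, max_fpr):
    """Fraction of positives missed at the most permissive threshold
    whose false positive rate does not exceed max_fpr.

    labels: 1 for a vulnerability-fixing commit, 0 otherwise.
    scores: classifier scores, higher means more likely a fix.
    A commit is flagged when its score is strictly above the threshold.
    """
    negatives = sorted(
        (s for y, s in zip(labels, scores) if y == 0), reverse=True
    )
    positives = [s for y, s in zip(labels, scores) if y == 1]
    if not negatives or not positives:
        raise ValueError("need at least one positive and one negative")

    # Number of false positives the budget allows (rounded down).
    allowed_fp = int(max_fpr * len(negatives))

    # Using the (allowed_fp + 1)-th highest negative score as a strict
    # threshold guarantees that at most allowed_fp negatives are flagged.
    if allowed_fp < len(negatives):
        threshold = negatives[allowed_fp]
    else:
        threshold = float("-inf")

    detected = sum(1 for s in positives if s > threshold)
    return 1.0 - detected / len(positives)


# Toy data (hypothetical): four non-fix commits and four fix commits.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_scores = [0.1, 0.2, 0.3, 0.9, 0.95, 0.5, 0.25, 0.05]
toy_miss_rate = miss_rate_at_fpr(toy_labels, toy_scores, max_fpr=0.25)
```

On a realistic test set the same call would use `max_fpr=0.005`. The strict threshold is a conservative choice: tied scores at the boundary are left unflagged, so the false positive budget is never exceeded.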

Abstract

Automated detection of vulnerability-fixing commits (VFCs) is critical for timely security patch deployment, as advisory databases lag patch releases by a median of 25 days and many fixes never receive advisories. We present a comprehensive evaluation of code-language-model-based VFC detection through a unified framework that consolidates over 20 fragmented datasets spanning more than 180,000 commits. Across over 180 experiments with fine-tuned models from 125M to 14B parameters, we find no evidence that models acquire transferable security-relevant code understanding from code changes alone. When commit messages are available, they dominate model attention; when they are removed, an attribution analysis shows that enriching diffs with additional intra-procedural semantic context does not shift model attention toward the code changes. Group-stratified evaluation exposes approximately 17%…
