Evaluating Large Language Models for Line-Level Vulnerability Localization

Jian Zhang; Chong Wang; Anran Li; Weisong Sun; Cen Zhang; Wei Ma; Yang Liu

arXiv:2404.00287·cs.SE·December 29, 2025·3 cites

TL;DR

This paper provides the first comprehensive empirical evaluation of large language models for automated vulnerability localization at the line level, exploring various models, paradigms, and datasets to understand their effectiveness and limitations.

Contribution

It systematically assesses 19 LLMs across multiple paradigms and datasets, introduces strategies to improve performance, and evaluates model generalizability in vulnerability localization.

Findings

01. Discriminative fine-tuning outperforms other methods with sufficient data.

02. ChatGPT and similar models excel in low-data scenarios.

03. Input length and context issues affect fine-tuning effectiveness.

Abstract

Recently, Automated Vulnerability Localization (AVL) has attracted growing attention, aiming to facilitate diagnosis by pinpointing the specific lines of code responsible for vulnerabilities. Large Language Models (LLMs) have shown potential in various domains, yet their effectiveness in line-level vulnerability localization remains underexplored. In this work, we present the first comprehensive empirical evaluation of LLMs for AVL. Our study examines 19 leading LLMs suitable for code analysis, including ChatGPT and multiple open-source models, spanning encoder-only, encoder-decoder, and decoder-only architectures, with model sizes from 60M to 70B parameters. We evaluate three paradigms including few-shot prompting, discriminative fine-tuning, and generative fine-tuning with and without Low-Rank Adaptation (LoRA), on both a BigVul-derived dataset for C/C++ and a smart contract…
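To make the line-level task concrete, here is a minimal sketch of how a predicted set of vulnerable line numbers could be scored against ground truth. The choice of precision, recall, and F1, the function name `line_level_scores`, and the example line numbers are all assumptions for illustration; the visible portion of the abstract does not state which metrics the paper uses.

```python
from typing import Dict, Iterable


def line_level_scores(predicted: Iterable[int], actual: Iterable[int]) -> Dict[str, float]:
    """Compare predicted vulnerable line numbers with ground-truth ones.

    Hypothetical helper: the metric choice is an assumption, not taken
    from the paper.
    """
    pred, gold = set(predicted), set(actual)
    true_positives = len(pred & gold)

    # Guard against empty sets so that we never divide by zero.
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up example: a model flags lines 4, 7, and 9 of a function,
# while the fixing commit touched lines 7, 9, and 12.
scores = line_level_scores(predicted=[4, 7, 9], actual=[7, 9, 12])
print(scores)
```

Treating predictions as a set of line numbers keeps the scoring independent of the paradigm: a few-shot prompt, a discriminative per-line classifier, or a generative model can each be reduced to the lines it flags.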

Taxonomy

Topics: Web Application Security Vulnerabilities · Technology and Data Analysis