Evaluating Large Language Models for Line-Level Vulnerability Localization
Jian Zhang, Chong Wang, Anran Li, Weisong Sun, Cen Zhang, Wei Ma, Yang Liu

TL;DR
This paper provides the first comprehensive empirical evaluation of large language models for automated vulnerability localization at the line level, exploring various models, paradigms, and datasets to understand their effectiveness and limitations.
Contribution
It systematically assesses 19 LLMs across multiple paradigms and datasets, introduces strategies to improve performance, and evaluates model generalizability in vulnerability localization.
Findings
Discriminative fine-tuning outperforms the other paradigms when sufficient training data is available.
Prompting ChatGPT and similar models is the stronger option in low-data scenarios.
Input-length limits and restricted context reduce the effectiveness of fine-tuning.
Abstract
Recently, Automated Vulnerability Localization (AVL) has attracted growing attention, aiming to facilitate diagnosis by pinpointing the specific lines of code responsible for vulnerabilities. Large Language Models (LLMs) have shown potential in various domains, yet their effectiveness in line-level vulnerability localization remains underexplored. In this work, we present the first comprehensive empirical evaluation of LLMs for AVL. Our study examines 19 leading LLMs suitable for code analysis, including ChatGPT and multiple open-source models, spanning encoder-only, encoder-decoder, and decoder-only architectures, with model sizes from 60M to 70B parameters. We evaluate three paradigms including few-shot prompting, discriminative fine-tuning, and generative fine-tuning with and without Low-Rank Adaptation (LoRA), on both a BigVul-derived dataset for C/C++ and a smart contract…
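To make the task concrete, line-level localization can be pictured as assigning a score to each line of a function and flagging the lines that score highest. The sketch below is illustrative only: the function names, the 0.5 threshold, the example scores, and the precision/recall/F1 scoring are assumptions chosen for this example, not the paper's models or its evaluation protocol.

```python
from typing import Dict, List, Set


def localize(line_scores: List[float], threshold: float = 0.5) -> Set[int]:
    """Return the 1-based numbers of lines whose score meets the threshold.

    The threshold is a hypothetical choice for illustration.
    """
    return {i for i, score in enumerate(line_scores, start=1) if score >= threshold}


def line_level_metrics(predicted: Set[int], actual: Set[int]) -> Dict[str, float]:
    """Compare flagged lines against ground-truth vulnerable lines."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up per-line scores for a five-line function (not real model output).
scores = [0.05, 0.10, 0.92, 0.30, 0.75]
predicted = localize(scores)   # lines 3 and 5 are flagged
actual = {3, 4}                # hypothetical ground-truth vulnerable lines
metrics = line_level_metrics(predicted, actual)
print(sorted(predicted), metrics)
```

In this toy case one of the two flagged lines is truly vulnerable and one of the two vulnerable lines is found, so precision, recall, and F1 all come out to 0.5.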
Taxonomy
Topics: Web Application Security Vulnerabilities · Technology and Data Analysis
