Learning to Focus: Context Extraction for Efficient Code Vulnerability Detection with Language Models

Xinran Zheng; Xingzhi Qian; Huichi Zhou; Shuo Yang; Yiling He; Suman Jana; Lorenzo Cavallaro

arXiv:2505.17460·cs.SE·July 16, 2025

TL;DR

FocusVul enhances language models for code vulnerability detection by learning to select and extract semantically rich context, significantly improving accuracy and efficiency on real-world benchmarks.

Contribution

Proposes FocusVul, a model-agnostic framework that learns to identify vulnerability-relevant regions and extract concise context around them, improving LM-based detection without requiring commit annotations at inference time.

Findings

01. 164.04% improvement in classification performance

02. 19.12% reduction in FLOPs

03. Outperforms heuristic and full-function fine-tuning methods

Abstract

Language models (LMs) show promise for vulnerability detection but struggle with long, real-world code due to sparse and uncertain vulnerability locations. These issues, exacerbated by token limits, often cause models to miss vulnerability-related signals, thereby impairing effective learning. A key intuition is to enhance LMs with concise, information-rich context. Commit-based annotations offer precise, CWE-agnostic supervision, but are unavailable during inference, as they depend on historical code changes. Moreover, their extreme sparsity, often covering only a few lines, makes it difficult for LMs to process directly. In this paper, we propose FocusVul, a model-agnostic framework that improves LM-based vulnerability detection by learning to select sensitive context. FocusVul learns commit-based annotation patterns through hierarchical semantic modeling and generalizes them to…
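To make the intuition of "concise, information-rich context" under a token limit concrete, here is a minimal sketch. It is not FocusVul's method: the per-line relevance scores are assumed to be given (in the paper they would come from a learned model), the greedy selection rule is a stand-in, and the names `select_context`, `token_budget`, and `window` are hypothetical.

```python
from typing import List, Tuple


def select_context(
    lines: List[str],
    scores: List[float],
    token_budget: int,
    window: int = 1,
) -> List[Tuple[int, str]]:
    """Greedily keep high-scoring lines (plus neighbors) within a token budget.

    ASSUMPTION: `scores` are hypothetical per-line relevance scores; this
    sketch does not learn them and is not the paper's selection procedure.
    """
    if len(lines) != len(scores):
        raise ValueError("lines and scores must have the same length")

    def cost(text: str) -> int:
        # Crude whitespace token count, standing in for a real tokenizer.
        return max(1, len(text.split()))

    chosen = set()
    used = 0
    # Visit lines from most to least relevant.
    for idx in sorted(range(len(lines)), key=lambda i: scores[i], reverse=True):
        lo, hi = max(0, idx - window), min(len(lines), idx + window + 1)
        new = [i for i in range(lo, hi) if i not in chosen]
        extra = sum(cost(lines[i]) for i in new)
        if used + extra > token_budget:
            continue  # this region does not fit; try lower-ranked lines
        chosen.update(new)
        used += extra

    # Return the kept lines in their original source order.
    return [(i, lines[i]) for i in sorted(chosen)]


# Illustrative input only; the scores below are made up.
example_lines = [
    "int n = read();",
    "char buf[8];",
    "log(n);",
    "memcpy(buf, src, n);",
    "return 0;",
]
example_scores = [0.2, 0.6, 0.1, 0.9, 0.1]
for line_no, text in select_context(example_lines, example_scores, token_budget=5, window=0):
    print(line_no, text)
```

The point of the sketch is the trade-off the abstract describes: with a fixed budget, keeping a few relevant lines and their surroundings leaves the model fewer tokens to process than the full function, at the cost of depending on how good the relevance scores are.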
