When Labels Are Scarce: A Systematic Mapping of Label-Efficient Code Vulnerability Detection
Noor Khalal, Chakib Fettal, Lazhar Labiod, Mohamed Nadif

TL;DR
This paper systematically maps label-efficient methods for code vulnerability detection, motivated by vulnerability labels that are scarce, noisy, and uneven across projects and languages.
Contribution
It synthesizes five paradigm families of label-efficient code vulnerability detection (CVD), connects their mechanisms to token, graph, hybrid, and knowledge-based representations, and provides a practical decision guide.
Findings
Maps five paradigm families of label-efficient CVD approaches.
Connects mechanisms to token, graph, hybrid, and knowledge-based representations.
Provides a decision guide for method selection based on trade-offs and failure modes.
Abstract
Machine-learning-based code vulnerability detection (CVD) has progressed rapidly, from deep program representations to pretrained code models and LLM-centered pipelines. Yet dependable vulnerability labeling remains expensive, noisy, and uneven across projects, languages, and CWE types, motivating approaches that reduce reliance on human labeling. This survey maps these approaches, synthesizing five paradigm families and the mechanisms they use. It connects mechanisms to token, graph, hybrid, and knowledge-based representations, and consolidates evaluation and reporting axes that limit comparison (label-budget specification, compute/cost assumptions, leakage, and granularity mismatches). A Design Map and constraint-first Decision Guide distill trade-offs and failure modes for practical method selection.
