Lightweight Vulnerability Detection from Code Metrics and Token Features

Chun Yin Chiu

arXiv:2605.04260·cs.CR·May 7, 2026

TL;DR

This paper presents a lightweight, interpretable vulnerability detection method for C/C++ code using token n-grams and basic code metrics, avoiding complex deep learning models.

Contribution

It introduces a simple, fast vulnerability triage pipeline combining TF-IDF token features with code metrics, evaluated on Devign labels across various settings.
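
As a rough illustration of the token side of such a pipeline, the sketch below builds TF-IDF weights over token unigrams and bigrams in plain Python. The tokenizer regex, the n-gram range, the smoothed IDF formula, and the names `token_ngrams` and `tfidf` are all assumptions made for illustration; the page does not state the paper's exact tokenization or weighting.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: identifiers, digit runs, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def token_ngrams(code, n_max=2):
    """Return all token n-grams of length 1..n_max from raw function text."""
    tokens = TOKEN_RE.findall(code)
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(functions, n_max=2):
    """Map each function to a sparse {n-gram: weight} dict, L2-normalized."""
    counts = [Counter(token_ngrams(f, n_max)) for f in functions]
    # Document frequency: in how many functions each n-gram occurs.
    df = Counter()
    for c in counts:
        df.update(c.keys())
    n_docs = len(functions)
    vectors = []
    for c in counts:
        # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1.
        vec = {g: tf * (math.log((1 + n_docs) / (1 + df[g])) + 1.0)
               for g, tf in c.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({g: v / norm for g, v in vec.items()})
    return vectors


vectors = tfidf(["int a = strcpy(b, c);", "int a = 0;"])
```

In this toy input, `strcpy` appears in only one function and therefore receives a higher weight than `int`, which appears in both. A sparse representation like this can then be fed to a linear classifier.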

Findings

01 The best variant achieves a PR-AUC of 0.642 and a Recall@10% of 0.161 on the random split.

02 Cross-project generalization is more challenging, with a PR-AUC of around 0.436.

03 Simple token and metric features serve as a transparent baseline but are sensitive to superficial cues.
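
To make the two ranking metrics in these findings concrete, here is a minimal sketch of how they could be computed from labels and classifier scores. It uses average precision as the PR-AUC estimate, which is one common choice; the page does not say which estimator the paper uses, and the function names and the rounding of the top-10% cutoff are assumptions.

```python
def recall_at_fraction(labels, scores, fraction=0.10):
    """Share of all vulnerable functions found in the top `fraction` of the ranking."""
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    # Assumed cutoff rule: floor of fraction * N, but at least one item.
    k = max(1, int(len(labels) * fraction))
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / total_pos


def average_precision(labels, scores):
    """Mean of precision values taken at the rank of each positive example."""
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            ap += hits / rank
    return ap / total_pos


# Toy example: 10 functions, 2 of them vulnerable (label 1).
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
r10 = recall_at_fraction(labels, scores)   # top 1 of 10 holds 1 of 2 positives
ap = average_precision(labels, scores)     # (1/1 + 2/3) / 2
```

Recall@10% is bounded by the label prevalence: if a fraction p of functions is labeled vulnerable, the top 10% of a ranking can contain at most about 0.10 / p of them, so reported values are best read against that ceiling.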

Abstract

Vulnerability detection for C/C++ code increasingly relies on heavy representations such as code graphs and deep models, while many practical workflows still benefit from fast and reproducible ranking baselines for human triage. This preprint studies a lightweight function-level vulnerability triage pipeline that combines sparse token n-grams from raw function text with a small set of inexpensive code metrics, including NLOC, approximate cyclomatic complexity, token count, maximum brace depth, and parameter count. We use TF-IDF token features and a class-weighted logistic regression classifier, avoiding deep learning, transformers, and program graphs. Using the Devign function-level labels, we evaluate random and cross-project settings, including an FFmpeg-to-QEMU transfer experiment. We emphasize precision-recall AUC and Recall@10% as ranking-oriented metrics for skewed or…
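
The five code metrics named in the abstract can be approximated directly from raw function text. The sketch below is one possible reading, not the paper's implementation: the keyword list used for cyclomatic complexity, the tokenizer regex, the treatment of NLOC as non-blank lines, and the function name `code_metrics` are all assumptions. It does not strip comments or string literals, and it miscounts parameter lists that contain nested parentheses such as function pointers.

```python
import re

# Assumed decision points for approximate cyclomatic complexity.
DECISION_RE = re.compile(r"\b(?:if|for|while|case|catch)\b|&&|\|\||\?")
# Hypothetical tokenizer: identifiers, digit runs, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def code_metrics(func_src):
    """Compute cheap, text-based metrics for one C/C++ function."""
    # NLOC approximated as the number of non-blank lines.
    nloc = sum(1 for line in func_src.splitlines() if line.strip())

    # Maximum brace nesting depth, scanning character by character.
    depth = max_depth = 0
    for ch in func_src:
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth = max(0, depth - 1)

    # Parameter count from the first parenthesis pair without nested parentheses.
    match = re.search(r"\(([^()]*)\)", func_src)
    params = match.group(1).strip() if match else ""
    n_params = 0 if params in ("", "void") else params.count(",") + 1

    return {
        "nloc": nloc,
        "cyclomatic": 1 + len(DECISION_RE.findall(func_src)),
        "tokens": len(TOKEN_RE.findall(func_src)),
        "max_brace_depth": max_depth,
        "params": n_params,
    }


example = (
    "int f(int a, char *b) {\n"
    "    if (a > 0 && b) {\n"
    "        return 1;\n"
    "    }\n"
    "\n"
    "    return 0;\n"
    "}\n"
)
metrics = code_metrics(example)
```

For the example function this gives 6 non-blank lines, an approximate cyclomatic complexity of 3 (one `if` and one `&&`), a maximum brace depth of 2, and 2 parameters. In a pipeline like the one described, such a dense metric vector would typically be scaled and concatenated with the sparse TF-IDF features before fitting a class-weighted logistic regression.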
