VulStyle: A Multi-Modal Pre-Training for Code Stylometry-Augmented Vulnerability Detection

Chidera Biringa; Ajmal Abbas; Vishnu Selvaraj; Gokhan Kul

arXiv:2604.26313 · cs.CR · April 30, 2026

TL;DR

VulStyle is a multi-modal pre-trained model for vulnerability detection that jointly encodes function-level source code, non-terminal AST structure, and code stylometry (CStyle) features, achieving state-of-the-art results on BigVul and VulDeePecker.

Contribution

The paper introduces a multi-modal approach that combines non-terminal AST nodes with code stylometry features, pre-trained with masked language modeling on 4.9M functions across seven programming languages.

Findings

01  VulStyle outperforms existing models on the BigVul and VulDeePecker datasets.

02  Incorporating code stylometry (CStyle) features significantly improves detection accuracy.

03  Encoding only non-terminal AST nodes reduces input complexity while preserving the semantic hierarchy.

Abstract

We present VulStyle, a multi-modal software vulnerability detection model that jointly encodes function-level source code, non-terminal Abstract Syntax Tree (AST) structure, and code stylometry (CStyle) features. Prior work in code representation primarily leverages token-level models or full AST trees, often missing stylistic cues indicative of risky programming practices, or incurring high structural overhead. Our approach selects only non-terminal AST nodes, reducing input complexity while preserving semantic hierarchy, and integrates syntactic and lexical CStyle features as auxiliary vulnerability signals. VulStyle is pre-trained using masked language modeling on 4.9M functions across seven programming languages, and fine-tuned across five benchmark datasets: Devign, BigVul, DiverseVul, REVEAL, and VulDeePecker. VulStyle achieves state-of-the-art performance on BigVul and…
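
The two auxiliary inputs described in the abstract, non-terminal AST nodes and CStyle features, can be illustrated with a small sketch. This is not the authors' pipeline: the abstract does not name a parser or list the exact features, so the sketch assumes Python's standard `ast` module as a stand-in parser, treats "non-terminal" as "has at least one structural child", and uses a handful of illustrative features. The function names `non_terminal_node_types` and `stylometry_features` are hypothetical.

```python
import ast

# Toy input; the paper works on function-level code in seven languages.
SAMPLE = '''def add(a, b):
    # sum two values
    return a + b
'''


def _structural_children(node: ast.AST) -> list[ast.AST]:
    # Assumption: Load/Store/Del context markers are bookkeeping, not structure,
    # so they are skipped and a bare identifier counts as a leaf.
    return [c for c in ast.iter_child_nodes(node)
            if not isinstance(c, ast.expr_context)]


def non_terminal_node_types(source: str) -> list[str]:
    """Type names of AST nodes that have structural children (breadth-first)."""
    tree = ast.parse(source)
    return [type(n).__name__ for n in ast.walk(tree) if _structural_children(n)]


def _depth(node: ast.AST) -> int:
    # Depth of the tree rooted at `node`, counting the node itself as 1.
    return 1 + max((_depth(c) for c in _structural_children(node)), default=0)


def stylometry_features(source: str) -> dict[str, float]:
    """A few illustrative lexical and syntactic style features (hypothetical set)."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    tree = ast.parse(source)
    identifiers = [n.id for n in ast.walk(tree) if isinstance(n, ast.Name)]
    n_lines = max(len(lines), 1)
    return {
        # Lexical features
        "mean_line_length": sum(len(ln) for ln in lines) / n_lines,
        "comment_line_ratio": sum(ln.lstrip().startswith("#") for ln in lines) / n_lines,
        "mean_identifier_length": sum(map(len, identifiers)) / max(len(identifiers), 1),
        # Syntactic feature
        "max_ast_depth": float(_depth(tree)),
    }


if __name__ == "__main__":
    print(non_terminal_node_types(SAMPLE))
    print(stylometry_features(SAMPLE))
```

Dropping leaf nodes shortens the structural sequence because identifiers and literals already appear in the token stream, which is one plausible reading of the abstract's "reducing input complexity while preserving semantic hierarchy". How the three inputs are combined inside the model is not described in this excerpt and is left out of the sketch.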
