White-Basilisk: A Hybrid Model for Code Vulnerability Detection
Ioannis Lamprou, Alexander Shevtsov, Ioannis Arapakis, Sotiris Ioannidis

TL;DR
White-Basilisk is a compact 200M-parameter model that outperforms larger models at detecting code vulnerabilities and can analyze extensive codebases in a single pass.
Contribution
The paper introduces White-Basilisk, a novel hybrid architecture that integrates Mamba layers, linear self-attention, and a Mixture of Experts framework. It achieves state-of-the-art vulnerability detection with only 200M parameters while processing long code sequences.
Findings
White-Basilisk surpasses existing models in vulnerability detection accuracy.
It handles longer code sequences than current LLMs.
The model maintains high performance on real-world, imbalanced datasets.
Abstract
The proliferation of software vulnerabilities presents a significant challenge to cybersecurity, necessitating more effective detection methodologies. We introduce White-Basilisk, a novel approach to vulnerability detection that demonstrates superior performance while challenging prevailing assumptions in AI model scaling. Utilizing an innovative architecture that integrates Mamba layers, linear self-attention, and a Mixture of Experts framework, White-Basilisk achieves state-of-the-art results in vulnerability detection tasks with a parameter count of only 200M. The model's capacity to process sequences of unprecedented length enables comprehensive analysis of extensive codebases in a single pass, surpassing the context limitations of current Large Language Models (LLMs). White-Basilisk exhibits robust performance on imbalanced, real-world datasets, while maintaining computational…
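The abstract names linear self-attention as one of the components behind the model's long-sequence capacity. This summary does not give the paper's exact formulation, so the following is only a minimal sketch of one common kernel-based variant, assuming an `elu(x) + 1` feature map; the function names `feature_map`, `linear_attention`, and `quadratic_attention` are illustrative and not taken from the paper. It shows why the cost can grow linearly with sequence length: the keys and values are folded into a fixed-size summary once, and each query reads only that summary.

```python
import math


def feature_map(x):
    # elu(x) + 1: equals x + 1 for positive inputs and exp(x) otherwise,
    # so every feature stays strictly positive (assumed kernel choice).
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def linear_attention(queries, keys, values):
    """Non-causal linear attention: one pass over keys/values, one over queries."""
    d_k, d_v = len(keys[0]), len(values[0])
    summary = [[0.0] * d_v for _ in range(d_k)]  # sum of phi(k) v^T
    key_sum = [0.0] * d_k                        # sum of phi(k)
    for k, v in zip(keys, values):
        fk = feature_map(k)
        for a in range(d_k):
            key_sum[a] += fk[a]
            for b in range(d_v):
                summary[a][b] += fk[a] * v[b]
    outputs = []
    for q in queries:
        fq = feature_map(q)
        norm = sum(fq[a] * key_sum[a] for a in range(d_k))
        outputs.append(
            [sum(fq[a] * summary[a][b] for a in range(d_k)) / norm
             for b in range(d_v)]
        )
    return outputs


def quadratic_attention(queries, keys, values):
    """Same kernel attention computed pairwise, for comparison only."""
    d_v = len(values[0])
    outputs = []
    for q in queries:
        fq = feature_map(q)
        weights = [sum(x * y for x, y in zip(fq, feature_map(k))) for k in keys]
        total = sum(weights)
        outputs.append(
            [sum(w * v[b] for w, v in zip(weights, values)) / total
             for b in range(d_v)]
        )
    return outputs
```

Both functions return the same values up to floating-point error. The pairwise version does work proportional to the square of the sequence length, while the summary-based version does work proportional to the length itself, which is the property that makes very long inputs tractable in this family of methods.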
