Can An Old Fashioned Feature Extraction and A Light-weight Model Improve Vulnerability Type Identification Performance?
Hieu Dinh Vo, Son Nguyen

TL;DR
This paper demonstrates that a simple feature extraction method combined with a lightweight model can outperform complex pre-trained neural networks in vulnerability type identification, offering a more efficient solution.
Contribution
The study introduces a lightweight component that refines classical bag-of-words features, significantly improving vulnerability type identification performance over deep pre-trained models.
Findings
Baseline approach with feature refinement outperforms deep models
Lightweight method achieves high efficiency and accuracy
Component improves neural network results by up to 92.8% in F1
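As a rough illustration of the metric behind the last finding, here is a minimal sketch of F1 for multi-label predictions. The summary does not say which F1 averaging the paper uses, nor whether "92.8%" is a relative gain or percentage points; micro-averaging and the relative reading are both assumptions here, and the toy labels and scores are hypothetical.

```python
def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 over per-sample label sets (assumed averaging)."""
    tp = fp = fn = 0
    for truth, pred in zip(true_sets, pred_sets):
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # labels missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, in percent (assumed reading)."""
    return 100.0 * (improved - baseline) / baseline


# Hypothetical toy data: three vulnerabilities, two possible types.
truth = [{"buffer overflow"},
         {"buffer overflow", "memory corruption"},
         {"memory corruption"}]
pred = [{"buffer overflow"},
        {"buffer overflow"},
        {"buffer overflow"}]

print(micro_f1(truth, pred))  # tp=2, fp=1, fn=2, so F1 = 4/7
```

Under the relative reading, a hypothetical F1 moving from 0.4 to 0.6 would count as a 50% improvement, which is why the distinction matters when interpreting a headline figure like this one.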
Abstract
Recent advances in automated vulnerability detection have shown promising results in helping developers identify vulnerable components. However, once a vulnerability is detected, investigating and fixing the vulnerable code remains a non-trivial task. Knowing the type of a vulnerability, such as buffer overflow or memory corruption, can help developers quickly understand the nature of the weakness and localize it for security analysis. In this work, we investigate the problem of vulnerability type identification (VTI). The problem is modeled as a multi-label classification task, which could be effectively addressed by the "pre-training, then fine-tuning" framework with deep pre-trained embedding models. We evaluate the performance of well-known and advanced pre-trained models for VTI on a large set of vulnerabilities. Surprisingly, their performance is not much better than that…
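To make the multi-label framing concrete, the following is a minimal sketch of bag-of-words features feeding one independent binary classifier per vulnerability type. It is not the paper's method: the perceptron is a stand-in lightweight model, the paper's feature-refinement component is not reproduced because the summary does not describe it, and the snippets, labels, and names (`bag_of_words`, `BinaryRelevance`) are hypothetical.

```python
import re
from collections import Counter


def bag_of_words(code):
    """Count identifier-like tokens in a code snippet."""
    return Counter(re.findall(r"[A-Za-z_]\w*", code))


class BinaryRelevance:
    """One perceptron per label, so a sample can receive several labels."""

    def __init__(self, labels, epochs=10):
        self.labels = list(labels)
        self.epochs = epochs
        self.weights = {label: Counter() for label in self.labels}
        self.bias = {label: 0.0 for label in self.labels}

    def _score(self, label, feats):
        w = self.weights[label]
        return self.bias[label] + sum(w[t] * c for t, c in feats.items())

    def fit(self, snippets, label_sets):
        feats = [bag_of_words(s) for s in snippets]
        for _ in range(self.epochs):
            for x, truth in zip(feats, label_sets):
                for label in self.labels:
                    target = 1 if label in truth else -1
                    # Classic perceptron rule: update only on a mistake.
                    if target * self._score(label, x) <= 0:
                        for t, c in x.items():
                            self.weights[label][t] += target * c
                        self.bias[label] += target
        return self

    def predict(self, snippet):
        x = bag_of_words(snippet)
        return {l for l in self.labels if self._score(l, x) > 0}


# Hypothetical toy training set; real VTI data would be far larger.
snippets = [
    "strcpy(buf, input);",
    "free(ptr); use(ptr);",
    "strcpy(buf, src); free(buf); use(buf);",
    "return a + b;",
]
label_sets = [
    {"buffer overflow"},
    {"memory corruption"},
    {"buffer overflow", "memory corruption"},
    set(),
]

model = BinaryRelevance(["buffer overflow", "memory corruption"])
model.fit(snippets, label_sets)
print(model.predict("strcpy(dst, input);"))
```

Treating each label independently is the simplest way to handle samples that carry zero, one, or several types at once; it ignores correlations between types, which a stronger model might exploit.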
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Web Application Security Vulnerabilities
