Can An Old Fashioned Feature Extraction and A Light-weight Model Improve Vulnerability Type Identification Performance?

Hieu Dinh Vo; Son Nguyen

arXiv:2306.14726·cs.SE·June 27, 2023

TL;DR

This paper demonstrates that a simple feature extraction method combined with a lightweight model can outperform complex pre-trained neural networks in vulnerability type identification, offering a more efficient solution.

Contribution

The study introduces a lightweight component that refines classical bag-of-words features, significantly improving vulnerability type identification performance over deep pre-trained models.

Findings

1. Baseline approach with feature refinement outperforms deep models
2. Lightweight method achieves high efficiency and accuracy
3. Component improves neural network results by up to 92.8% in F1

Abstract

Recent advances in automated vulnerability detection have achieved promising results in helping developers identify vulnerable components. However, once a vulnerability has been detected, investigating and fixing the vulnerable code remains a non-trivial task. Knowing the type of vulnerability, such as buffer overflow or memory corruption, could help developers quickly understand the nature of the weakness and localize it for security analysis. In this work, we investigate the problem of vulnerability type identification (VTI). The problem is modeled as a multi-label classification task, which could be effectively addressed by the "pre-training, then fine-tuning" framework with deep pre-trained embedding models. We evaluate the performance of well-known and advanced pre-trained models for VTI on a large set of vulnerabilities. Surprisingly, their performance is not much better than that…
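To make the multi-label framing concrete, the following is a minimal sketch of VTI as one binary decision per vulnerability type over bag-of-words token counts. It is not the authors' implementation: the toy snippets, the type names, and the nearest-centroid rule used as the "lightweight model" are all assumptions chosen for illustration, and the helper names (`tokenize`, `bag_of_words`, `fit`, `predict`) are hypothetical.

```python
import math
import re
from collections import Counter

# Toy training data (hypothetical): each snippet may carry several types.
TRAIN = [
    ("char buf[8]; strcpy(buf, input);", {"buffer_overflow"}),
    ("memcpy(dst, src, len); free(dst); use(dst);",
     {"buffer_overflow", "use_after_free"}),
    ("free(ptr); ptr->next = 0;", {"use_after_free"}),
    ("query = sql + user; exec(query);", {"sql_injection"}),
]

def tokenize(code):
    """Split a snippet into lowercase identifier and number tokens."""
    return re.findall(r"[A-Za-z_]\w*|\d+", code.lower())

def bag_of_words(code, vocab):
    """Count how often each vocabulary token occurs in the snippet."""
    counts = Counter(tokenize(code))
    return [counts[token] for token in vocab]

def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors, dim):
    """Component-wise mean of the vectors (zero vector if there are none)."""
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def fit(samples):
    """Build the vocabulary and, per type, a positive and a negative centroid."""
    vocab = sorted({tok for code, _ in samples for tok in tokenize(code)})
    features = [bag_of_words(code, vocab) for code, _ in samples]
    labels = sorted({t for _, types in samples for t in types})
    model = {}
    for label in labels:
        pos = [x for x, (_, types) in zip(features, samples) if label in types]
        neg = [x for x, (_, types) in zip(features, samples) if label not in types]
        model[label] = (centroid(pos, len(vocab)), centroid(neg, len(vocab)))
    return vocab, model

def predict(code, vocab, model):
    """Assign every type whose positive centroid is closer than its negative one."""
    x = bag_of_words(code, vocab)
    return {label for label, (pos, neg) in model.items()
            if cosine(x, pos) > cosine(x, neg)}

VOCAB, MODEL = fit(TRAIN)
print(sorted(predict("free(node); node->next = 0; memcpy(dst, src, len);",
                     VOCAB, MODEL)))
```

Because each type is decided independently, a snippet can receive zero, one, or several types, which is what distinguishes the multi-label setting from ordinary multi-class classification. In practice one would likely replace the raw counts and the centroid rule with a weighted bag-of-words representation and a trained linear classifier.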

Code & Models

Repositories

sonnguyenvnu/vit-project · tf · Official

Taxonomy

Topics: Software Engineering Research · Advanced Malware Detection Techniques · Web Application Security Vulnerabilities