Security Vulnerability Detection with Multitask Self-Instructed Fine-Tuning of Large Language Models
Aidan Z.H. Yang, Haoye Tian, He Ye, Ruben Martins, Claire Le Goues

TL;DR
This paper introduces MSIVD, a multitask self-instructed fine-tuning approach that combines LLMs and GNNs for software vulnerability detection. It outperforms the LineVul baseline and reaches an F1 score of 0.92 on BigVul.
Contribution
The paper presents a novel multitask fine-tuning method integrating LLMs with graph neural networks for vulnerability detection, enhancing accuracy and generalization.
Findings
MSIVD achieves an F1 score of 0.92 on BigVul.
MSIVD outperforms the baseline LineVul.
Training LLMs with GNNs improves vulnerability detection performance.
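For readers unfamiliar with the metric reported above, F1 is the harmonic mean of precision and recall. The following is a minimal sketch of that computation from raw confusion counts. The function name `f1_score` and the example counts are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall.

    tp, fp, fn are true-positive, false-positive, and false-negative counts.
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not taken from the paper).
example = f1_score(tp=90, fp=10, fn=10)
print(round(example, 3))
```

Because F1 penalizes an imbalance between precision and recall, it is a common choice for vulnerability datasets, where vulnerable samples are typically far rarer than benign ones.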
Abstract
Software security vulnerabilities allow attackers to perform malicious activities to disrupt software operations. Recent Transformer-based language models have significantly advanced vulnerability detection, surpassing the capabilities of static-analysis-based deep learning models. However, language models trained solely on code tokens capture neither the explanation of the vulnerability type nor the data flow structure of the code, both of which are crucial for vulnerability detection. We propose a novel technique that integrates a multitask sequence-to-sequence LLM with program control flow graphs encoded as a graph neural network to achieve sequence-to-classification vulnerability detection. We introduce MSIVD, multitask self-instructed fine-tuning for vulnerability detection, inspired by chain-of-thought prompting and LLM self-instruction. Our experiments demonstrate that…
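The general idea of fusing a language-model embedding with a graph encoding of the control flow graph can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's architecture: the mean-aggregation message passing, the concatenation-based fusion, the logistic output layer, and every name and number below (`message_pass`, `classify`, the feature vectors, the weights) are hypothetical. In a real system the code embedding would come from a fine-tuned LLM and the weights would be learned.

```python
import math
from typing import Dict, List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def message_pass(features: Dict[int, Vector],
                 edges: Dict[int, List[int]]) -> Dict[int, Vector]:
    """One round of mean-aggregation message passing over a control flow graph.

    Each node's new feature is the mean of its own feature and the features
    of its successor nodes.
    """
    updated = {}
    for node, feature in features.items():
        neighborhood = [feature] + [features[n] for n in edges.get(node, [])]
        updated[node] = mean(neighborhood)
    return updated


def classify(code_embedding: Vector, features: Dict[int, Vector],
             edges: Dict[int, List[int]], weights: Vector, bias: float) -> float:
    """Fuse a code embedding with a pooled graph embedding; return P(vulnerable)."""
    graph_embedding = mean(list(message_pass(features, edges).values()))
    fused = code_embedding + graph_embedding  # list concatenation
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical three-block control flow graph: 0 -> 1, 0 -> 2, 1 -> 2.
node_features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
cfg_edges = {0: [1, 2], 1: [2], 2: []}
llm_embedding = [0.2, -0.1]            # stand-in for an LLM's pooled output
toy_weights = [0.5, 0.5, 1.0, -1.0]    # stand-in for learned parameters

probability = classify(llm_embedding, node_features, cfg_edges, toy_weights, 0.0)
print(probability)
```

Concatenation is only one possible fusion strategy; it is used here because it keeps the two information sources (token-level semantics and graph structure) separate until the final classification layer.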
Taxonomy
Topics: Web Application Security Vulnerabilities · Data Quality and Management · Topic Modeling
Methods: Graph Neural Network
