Automated Software Vulnerability Static Code Analysis Using Generative Pre-Trained Transformer Models
Elijah Pelofske, Vincent Urias, Lorie M. Liebrock

TL;DR
This study assesses the potential of open-source GPT models for automatically detecting vulnerabilities in C and C++ code, finding they are not yet suitable for full automation but can identify some vulnerabilities with high accuracy in specific cases.
Contribution
It provides a comprehensive evaluation of GPT models for vulnerability detection in source code, highlighting their current limitations and potential in targeted scenarios.
Findings
GPT models are not suitable for fully automated vulnerability scanning due to high error rates.
Some GPT models can identify vulnerable code lines with high precision and recall in specific cases.
Llama-2-70b-chat-hf achieved perfect precision and recall on a buffer overflow vulnerability example.
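The line-level precision and recall in these findings can be computed as in the minimal sketch below. It assumes two things. First, each model response has already been parsed into a list of flagged line numbers. Second, the SARD-annotated vulnerable lines serve as ground truth. The function name `line_level_scores` is hypothetical, and the paper's exact scoring procedure may differ. Returning 0.0 when nothing is flagged (or nothing is vulnerable) is also an assumed convention rather than something the paper states.

```python
def line_level_scores(flagged, vulnerable):
    """Hypothetical helper: precision and recall of model-flagged lines
    against ground-truth vulnerable line numbers (e.g. SARD annotations)."""
    flagged, vulnerable = set(flagged), set(vulnerable)
    tp = len(flagged & vulnerable)   # flagged lines that really are vulnerable
    fp = len(flagged - vulnerable)   # flagged lines that are not vulnerable
    fn = len(vulnerable - flagged)   # vulnerable lines the model missed
    # Assumed convention: report 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if flagged else 0.0
    recall = tp / (tp + fn) if vulnerable else 0.0
    return precision, recall


# Hypothetical data (not from the paper): two labelled lines, three flagged.
print(line_level_scores([42, 57, 80], [42, 57]))
```

In this sketch, "perfect precision and recall" means the flagged set equals the labelled set exactly. For example, `line_level_scores([42], [42])` returns `(1.0, 1.0)`.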
Abstract
Generative Pre-Trained Transformer models have been shown to be surprisingly effective at a variety of natural language processing tasks, including generating computer code. We evaluate the effectiveness of open-source GPT models for the task of automatically identifying the presence of vulnerable code syntax (specifically targeting C and C++ source code). This task is evaluated on a selection of 36 source code examples from the NIST SARD dataset, which are specifically curated to not contain natural English that indicates the presence, or lack thereof, of a particular vulnerability. The NIST SARD source code dataset contains identified vulnerable lines of source code, each an example of one of 839 distinct Common Weakness Enumerations (CWE), allowing for exact quantification of the GPT output classification error rate. A total of 5 GPT models are evaluated, using 10…
Taxonomy
Topics: Software Reliability and Analysis Research · Software Engineering Research · Software Testing and Debugging Techniques
Methods: Cosine Annealing · Linear Layer · Attention Dropout · Residual Connection · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · Multi-Head Attention · Attention Is All You Need · Weight Decay
