Automated Software Vulnerability Static Code Analysis Using Generative Pre-Trained Transformer Models

Elijah Pelofske; Vincent Urias; Lorie M. Liebrock

arXiv:2408.00197 · cs.CR · August 2, 2024

TL;DR

This study assesses the potential of open-source GPT models for automatically detecting vulnerabilities in C and C++ code, finding they are not yet suitable for full automation but can identify some vulnerabilities with high accuracy in specific cases.

Contribution

It evaluates five open-source GPT models on vulnerability detection in C and C++ source code drawn from the NIST SARD dataset, highlighting their current limitations and their potential in targeted scenarios.

Findings

01

GPT models are not suitable for fully automated vulnerability scanning due to high error rates.

02

Some GPT models can identify vulnerable code lines with high precision and recall in specific cases.

03

Llama-2-70b-chat-hf achieved perfect precision and recall on a buffer overflow vulnerability example.
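
The precision and recall figures above refer to how well the lines a model flags match the lines labeled as vulnerable. A minimal sketch of that line-level comparison follows, assuming predictions and labels are both given as sets of line numbers; the function name and the zero-division convention are illustrative assumptions, not the paper's scoring code.

```python
def line_precision_recall(predicted_lines, labeled_lines):
    """Compare flagged line numbers against labeled vulnerable line numbers.

    Hypothetical helper for illustration; returns (precision, recall).
    Assumption: an empty denominator yields 0.0 rather than raising.
    """
    predicted = set(predicted_lines)
    labeled = set(labeled_lines)
    true_positives = len(predicted & labeled)  # lines flagged and truly vulnerable
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(labeled) if labeled else 0.0
    return precision, recall


# Example: the model flags lines 12 and 15, but only line 12 is labeled vulnerable.
precision, recall = line_precision_recall([12, 15], [12])
print(precision, recall)
```

Under this definition, "perfect precision and recall" means the flagged set equals the labeled set exactly: no extra lines flagged and no labeled lines missed.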

Abstract

Generative Pre-Trained Transformer models have been shown to be surprisingly effective at a variety of natural language processing tasks -- including generating computer code. We evaluate the effectiveness of open source GPT models for the task of automatic identification of the presence of vulnerable code syntax (specifically targeting C and C++ source code). This task is evaluated on a selection of 36 source code examples from the NIST SARD dataset, which are specifically curated to not contain natural English that indicates the presence, or lack thereof, of a particular vulnerability. The NIST SARD source code dataset contains identified vulnerable lines of source code that are examples of one out of the 839 distinct Common Weakness Enumerations (CWE), allowing for exact quantification of the GPT output classification error rate. A total of 5 GPT models are evaluated, using 10…

Taxonomy

Topics: Software Reliability and Analysis Research · Software Engineering Research · Software Testing and Debugging Techniques

Methods: Cosine Annealing · Linear Layer · Attention Dropout · Residual Connection · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · Multi-Head Attention · Attention Is All You Need · Weight Decay