Evaluation of ChatGPT Model for Vulnerability Detection

Anton Cheshkov; Pavel Zadorozhny; Rodion Levichev

arXiv:2304.07232 · cs.CR · April 17, 2023 · 24 cites

TL;DR

This paper assesses ChatGPT's effectiveness in detecting code vulnerabilities and finds it performs no better than a dummy classifier, highlighting limitations in applying large language models to security tasks.

Contribution

The report provides an empirical evaluation of ChatGPT and GPT-3 for vulnerability detection in code, showing their limitations on binary and multi-label classification of CWE vulnerabilities.

Findings

01 ChatGPT performs no better than a dummy classifier in vulnerability detection

02 Large language models may have limited utility for security-specific tasks

03 Evaluation on real-world datasets highlights current model shortcomings

Abstract

In this technical report, we evaluated the performance of the ChatGPT and GPT-3 models for the task of vulnerability detection in code. Our evaluation was conducted on our real-world dataset, using binary and multi-label classification tasks on CWE vulnerabilities. We decided to evaluate the model because it has shown good performance on other code-based tasks, such as solving programming challenges and understanding code at a high level. However, we found that the ChatGPT model performed no better than a dummy classifier for both binary and multi-label classification tasks for code vulnerability detection.
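
To make the "no better than a dummy classifier" comparison concrete, here is a minimal sketch of how such a baseline check can be set up for the binary task. It is not the authors' evaluation code: the labels and model answers are made-up illustrative values, the constant "always vulnerable" baseline is only one possible dummy strategy (the abstract does not say which one was used), and the helper names `f1_score` and `constant_baseline` are hypothetical.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class, computed from raw counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision/recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def constant_baseline(n, label=1):
    """Dummy classifier: ignores the code and always predicts `label`."""
    return [label] * n


# Hypothetical data: 1 = vulnerable, 0 = not vulnerable.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
# Hypothetical model answers, e.g. parsed from chat responses.
y_model = [1, 1, 0, 1, 1, 0, 1, 1]
y_dummy = constant_baseline(len(y_true), label=1)

model_f1 = f1_score(y_true, y_model)
dummy_f1 = f1_score(y_true, y_dummy)

print(f"model F1: {model_f1:.3f}")
print(f"dummy F1: {dummy_f1:.3f}")
print("model beats dummy:", model_f1 > dummy_f1)
```

On this toy data the model flags most samples as vulnerable and ends up slightly below the constant baseline. That is the kind of outcome a dummy comparison is meant to expose: a score that looks reasonable in isolation but carries no signal beyond the class distribution.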

Taxonomy

Topics

Software Engineering Research · Advanced Malware Detection Techniques · Software Reliability and Analysis Research

Methods

Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Linear Layer · Weight Decay · Linear Warmup With Cosine Annealing · Adam · Dense Connections · Attention Dropout · Dropout