ChatGPT for Vulnerability Detection, Classification, and Repair: How Far Are We?
Michael Fu, Chakkrit Tantithamthavorn, Van Nguyen, Trung Le

TL;DR
This study evaluates ChatGPT's capabilities in software vulnerability tasks, revealing its limited performance compared to specialized models despite its large scale, and emphasizes the need for domain-specific fine-tuning.
Contribution
The paper provides a comprehensive empirical assessment of ChatGPT on vulnerability detection, classification, severity estimation, and repair, highlighting the model's current limitations and the importance of fine-tuning.
Findings
ChatGPT underperforms compared to specialized models in vulnerability tasks.
Fine-tuning is essential for ChatGPT to effectively handle vulnerability prediction.
Large-scale models alone are insufficient without domain-specific adaptation.
Abstract
Large language models (LLMs) like ChatGPT (i.e., gpt-3.5-turbo and gpt-4) exhibited remarkable advancement in a range of software engineering tasks associated with source code such as code review and code generation. In this paper, we undertake a comprehensive study by instructing ChatGPT for four prevalent vulnerability tasks: function and line-level vulnerability prediction, vulnerability classification, severity estimation, and vulnerability repair. We compare ChatGPT with state-of-the-art language models designed for software vulnerability purposes. Through an empirical assessment employing extensive real-world datasets featuring over 190,000 C/C++ functions, we found that ChatGPT achieves limited performance, trailing behind other language models in vulnerability contexts by a significant margin. The experimental outcomes highlight the challenging nature of vulnerability prediction…
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Web Application Security Vulnerabilities
