Leveraging GPT-4 for Vulnerability-Witnessing Unit Test Generation

Gábor Antal, Dénes Bán, Martin Isztin, Rudolf Ferenc, and Péter Hegedűs

arXiv:2506.11559 · cs.SE · June 16, 2025

Open Access

TL;DR

This paper investigates whether GPT-4 can automatically generate unit tests that witness software vulnerabilities. The generated tests are syntactically correct 66.5% of the time, but semantic correctness could be validated in only 7.5% of cases, so the output is most useful as a starting point that developers refine by hand.

Contribution

The paper presents an empirical study of how well GPT-4 generates vulnerability-witnessing unit tests for a subset of the VUL4J dataset, which contains real vulnerabilities and their corresponding fixes.

Findings

01. GPT-4 generates syntactically correct tests 66.5% of the time.

02. Semantic correctness could be validated in only 7.5% of cases.

03. Generated tests can be manually refined into effective vulnerability witnesses.

Abstract

In the life-cycle of software development, testing plays a crucial role in quality assurance. Proper testing not only increases code coverage and prevents regressions but it can also ensure that any potential vulnerabilities in the software are identified and effectively fixed. However, creating such tests is a complex, resource-consuming manual process. To help developers and security experts, this paper explores the automatic unit test generation capability of one of the most widely used large language models, GPT-4, from the perspective of vulnerabilities. We examine a subset of the VUL4J dataset containing real vulnerabilities and their corresponding fixes to determine whether GPT-4 can generate syntactically and/or semantically correct unit tests based on the code before and after the fixes as evidence of vulnerability mitigation. We focus on the impact of code contexts, the…
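To make the idea of a vulnerability-witnessing test concrete, here is a minimal sketch. It rests on one common reading of the term: the test fails on the code before the fix and passes on the code after it. The paper works with the VUL4J dataset; this sketch uses Python only for brevity. The path-traversal example and every name in it (`resolve_vulnerable`, `resolve_fixed`, `witness_test`, `is_vulnerability_witness`) are hypothetical illustrations, not taken from the paper or the dataset.

```python
import posixpath

BASE_DIR = "/srv/files"  # hypothetical directory that requests must stay inside


def resolve_vulnerable(user_path: str) -> str:
    # Pre-fix behaviour (illustrative): join and normalise, never check the result.
    return posixpath.normpath(posixpath.join(BASE_DIR, user_path))


def resolve_fixed(user_path: str) -> str:
    # Post-fix behaviour (illustrative): reject anything that escapes BASE_DIR.
    resolved = posixpath.normpath(posixpath.join(BASE_DIR, user_path))
    if not resolved.startswith(BASE_DIR + "/"):
        raise ValueError("path escapes base directory")
    return resolved


def witness_test(resolve) -> None:
    # The unit test body: a traversal input must be rejected with ValueError.
    try:
        resolve("../../etc/passwd")
    except ValueError:
        return
    raise AssertionError("traversal input was accepted")


def passes(test, resolve) -> bool:
    # Run the test against one version of the code and report pass/fail.
    try:
        test(resolve)
    except AssertionError:
        return False
    return True


def is_vulnerability_witness(test, before, after) -> bool:
    # Assumed criterion: the test fails before the fix and passes after it.
    return (not passes(test, before)) and passes(test, after)
```

Calling `is_vulnerability_witness(witness_test, resolve_vulnerable, resolve_fixed)` checks both versions in one step. A test that passes on both versions, or fails on both, says nothing about whether the fix mitigated the vulnerability.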

Taxonomy

Topics: Software Testing and Debugging Techniques · Security and Verification in Computing · Adversarial Robustness in Machine Learning

Methods: Dropout · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Layer Normalization · Dense Connections · Softmax · Transformer · GPT-4 · Focus