Identifying Helpful Context for LLM-based Vulnerability Repair: A Preliminary Study

Gábor Antal; Bence Bogenfürst; Rudolf Ferenc; Péter Hegedűs

arXiv:2506.11561 · cs.SE · June 16, 2025

Open Access

TL;DR

This study evaluates GPT-4o's ability to repair Java vulnerabilities. It shows that adding CVE information and code context to the prompt improves repair success, and that ensemble prompts can outperform baseline methods in zero-shot vulnerability repair.

Contribution

The paper presents a systematic comparison of GPT-4o and GPT-4 for vulnerability repair, highlighting how contextual information and ensemble prompts affect repair effectiveness.

Findings

01

With the same prompt, GPT-4o performed 11.9% worse on average than GPT-4, yet fixed 10.5% more distinct vulnerabilities across the three runs combined.

02

CVE information significantly improves repair success rates.

03

Ensemble prompt strategies outperform baseline approaches in zero-shot settings.

Abstract

Recent advancements in large language models (LLMs) have shown promise for automated vulnerability detection and repair in software systems. This paper investigates the performance of GPT-4o in repairing Java vulnerabilities from a widely used dataset (Vul4J), exploring how different contextual information affects automated vulnerability repair (AVR) capabilities. We compare the latest GPT-4o's performance against previous results with GPT-4 using identical prompts. We evaluated nine additional prompts crafted by us that contain various contextual information such as CWE or CVE information, and manually extracted code contexts. Each prompt was executed three times on 42 vulnerabilities, and the resulting fix candidates were validated using Vul4J's automated testing framework. Our results show that GPT-4o performed 11.9% worse on average than GPT-4 with the same prompt, but was able…
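
A model can score lower on average per run and still repair more distinct vulnerabilities once several runs are pooled, because the two figures count different things. The sketch below is a minimal illustration in Python; the function names, vulnerability IDs, and run results are hypothetical and are not taken from the study or from Vul4J.

```python
from typing import List, Set


def average_per_run(runs: List[Set[str]], total: int) -> float:
    """Mean fraction of vulnerabilities repaired in a single run."""
    return sum(len(run) for run in runs) / (len(runs) * total)


def fixed_at_least_once(runs: List[Set[str]]) -> Set[str]:
    """Vulnerabilities repaired in at least one of the given runs."""
    return set().union(*runs)


# Hypothetical results: each set holds the IDs repaired in one run.
TOTAL = 6  # assumed benchmark size for this toy example
model_a = [{"V1", "V2", "V3"}, {"V1", "V2", "V3"}, {"V1", "V2", "V3"}]
model_b = [{"V1", "V2"}, {"V3", "V4"}, {"V2", "V5"}]

# model_a is stable: higher per-run average, fewer distinct fixes.
# model_b varies between runs: lower average, more distinct fixes.
avg_a = average_per_run(model_a, TOTAL)
avg_b = average_per_run(model_b, TOTAL)
distinct_a = fixed_at_least_once(model_a)
distinct_b = fixed_at_least_once(model_b)

# Pooling runs from several prompts works the same way: concatenate
# the run lists and take the union.
pooled = fixed_at_least_once(model_a + model_b)
```

The same union-based count applies when results from several prompts are combined: a vulnerability counts as repaired if any run of any prompt produced a fix that passes validation.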

Taxonomy

Topics: Web Application Security Vulnerabilities · Access Control and Trust · Network Security and Intrusion Detection

Methods: Dropout · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Layer Normalization · Dense Connections · Softmax · Transformer · GPT-4