Detect--Repair--Verify for LLM-Generated Code: A Multi-Language, Multi-Granularity Empirical Study
Cheng Cheng

TL;DR
This paper presents an empirical study of a Detect--Repair--Verify (DRV) workflow for LLM-generated code. Using a new multi-language benchmark, it evaluates the workflow's pipeline-level effectiveness, the reliability of detection reports as repair guidance, and the trustworthiness of repairs under verification.
Contribution
It introduces EduCollab, a multi-language, multi-granularity benchmark of runnable LLM-generated web applications in PHP, JavaScript, and Python, and provides an empirical analysis of the DRV workflow's effectiveness and reliability across those languages and granularities.
Findings
Bounded iterative DRV improves secure-and-correct yield over single-pass repair.
Detection reports vary in usefulness and are often unreliable as guidance for downstream repair.
Repair trustworthiness depends heavily on repair scope and context.
Abstract
Large language models can generate runnable software artifacts, but their security remains difficult to evaluate end to end. This study examines that problem through a Detect--Repair--Verify (DRV) workflow, in which vulnerabilities are detected, repaired, and then rechecked with security and functional tests. It addresses four gaps in current evidence: the lack of test-grounded benchmarks for LLM-generated artifacts, limited evidence on pipeline-level effectiveness, unclear reliability of detection reports as repair guidance, and uncertain repair trustworthiness under verification. To support this study, EduCollab is constructed as a multi-language, multi-granularity benchmark of runnable LLM-generated web applications in PHP, JavaScript, and Python. Each artifact is paired with executable functional and exploit test suites, and the benchmark spans project-, requirement-, and…
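The workflow described in the abstract, run with a fixed round budget, can be sketched as the loop below. This is a minimal illustration under stated assumptions, not the paper's implementation: `run_drv`, `Verdict`, the `max_rounds` default, and the three toy callables are hypothetical names introduced here, and the toy "artifact" is just a string in which the token `UNSAFE` stands in for a vulnerability.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    """Outcome of the verify step (hypothetical structure, not from the paper)."""
    functional_ok: bool  # functional test suite passed
    security_ok: bool    # exploit test suite found nothing exploitable

    @property
    def secure_and_correct(self) -> bool:
        return self.functional_ok and self.security_ok


def run_drv(
    artifact: str,
    detect: Callable[[str], List[str]],
    repair: Callable[[str, List[str]], str],
    verify: Callable[[str], Verdict],
    max_rounds: int = 3,  # assumed budget; the paper's bound is not given here
) -> Tuple[str, Verdict, int]:
    """Repeat detect -> repair -> verify until tests pass or the budget runs out."""
    verdict = verify(artifact)
    rounds = 0
    while not verdict.secure_and_correct and rounds < max_rounds:
        report = detect(artifact)
        if not report:
            # Assumption: with no findings there is nothing to guide a repair.
            break
        artifact = repair(artifact, report)
        verdict = verify(artifact)
        rounds += 1
    return artifact, verdict, rounds


# Toy stand-ins so the sketch runs end to end (all hypothetical).
def toy_detect(artifact: str) -> List[str]:
    return ["UNSAFE"] if "UNSAFE" in artifact else []


def toy_repair(artifact: str, report: List[str]) -> str:
    # Fixes only one occurrence per round, so several rounds may be needed.
    return artifact.replace("UNSAFE", "safe", 1)


def toy_verify(artifact: str) -> Verdict:
    return Verdict(functional_ok="ok" in artifact,
                   security_ok="UNSAFE" not in artifact)


if __name__ == "__main__":
    print(run_drv("UNSAFE UNSAFE ok", toy_detect, toy_repair, toy_verify))
```

Setting `max_rounds=1` in this sketch corresponds to single-pass repair. In the toy input the single pass leaves one `UNSAFE` token behind, while a larger budget clears both, which mirrors in miniature the finding that bounded iteration can raise secure-and-correct yield.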
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Web Application Security Vulnerabilities
