From Theory to Practice: Code Generation Using LLMs for CAPEC and CWE Frameworks

Murtuza Shahzad; Joseph Wilson; Ibrahim Al Azher; Hamed Alhoori; Mona Rahimi

arXiv:2604.02548·cs.CR·April 6, 2026

TL;DR

This paper introduces a dataset of vulnerable code snippets linked to CAPEC and CWE descriptions, generated with GPT-4o, Llama, and Claude models, to support security vulnerability research and detection.

Contribution

The authors develop a methodology employing GPT-4o, Llama, and Claude to generate a large, diverse dataset of vulnerable code snippets aligned with security frameworks.

Findings

01. High accuracy in generated code snippets with 0.98 cosine similarity

02. Dataset includes 615 code snippets across Java, Python, and JavaScript

03. Models show consistent results in vulnerability code generation
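
The 0.98 figure in the first finding is a cosine similarity. This summary does not say which vectors are compared, so the following is only a minimal sketch of the metric itself, assuming two snippets have already been mapped to numeric embedding vectors. The function name `cosine_similarity` and the vectors `emb_a` and `emb_b` are hypothetical and do not come from the paper.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings of two code snippets (illustrative values only,
# not data from the paper).
emb_a = [0.2, 0.7, 0.1]
emb_b = [0.25, 0.65, 0.1]
score = cosine_similarity(emb_a, emb_b)
```

A score near 1 means the two vectors point in almost the same direction; a score near 0 means they are close to orthogonal.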

Abstract

The increasing complexity and volume of software systems have heightened the importance of identifying and mitigating security vulnerabilities. Existing software vulnerability datasets frequently fall short in providing comprehensive, detailed code snippets explicitly linked to specific vulnerability descriptions, reducing their utility for advanced research and hindering efforts to develop a deeper understanding of security vulnerabilities. To address this challenge, we present a novel dataset that provides examples of vulnerable code snippets corresponding to Common Attack Pattern Enumeration and Classification (CAPEC) and Common Weakness Enumeration (CWE) descriptions. By employing the capabilities of Generative Pre-trained Transformer (GPT) models, we have developed a robust methodology for generating these examples. Our approach utilizes GPT-4o, Llama and Claude models to…
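
To make the idea of a snippet "explicitly linked" to a framework entry concrete, here is a hedged sketch of what one record might look like. The schema (`VulnerableSnippet` and its field names) and the helper `by_language` are assumptions for illustration, not the paper's actual format. The example pairs CWE-89 (SQL Injection) with CAPEC-66 (SQL Injection), and the snippet text is a textbook illustration rather than an entry from the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VulnerableSnippet:
    """Hypothetical record shape; the paper's real schema may differ."""
    capec_id: str   # attack pattern identifier
    cwe_id: str     # weakness identifier
    language: str   # one of the languages the dataset covers
    code: str       # the vulnerable snippet, stored as text


record = VulnerableSnippet(
    capec_id="CAPEC-66",
    cwe_id="CWE-89",
    language="Python",
    code=(
        "def find_user(conn, name):\n"
        "    # Vulnerable: user input is concatenated into the SQL text\n"
        "    query = \"SELECT * FROM users WHERE name = '\" + name + \"'\"\n"
        "    return conn.execute(query).fetchall()\n"
    ),
)


def by_language(records, language):
    """Return only the records written in the given language."""
    return [r for r in records if r.language == language]
```

Storing the identifiers next to the code is what lets a record be filtered or grouped by weakness, attack pattern, or language, for example with `by_language([record], "Python")`.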
