SecureCode: A Production-Grade Multi-Turn Dataset for Training Security-Aware Code Generation Models
Scott Thornton

TL;DR
SecureCode is a comprehensive, multi-turn dataset designed for training security-aware code generation models, covering web and AI/ML security with high-quality, instruction-tuning-ready examples grounded in real-world security incidents.
Contribution
It introduces the first public dataset combining OWASP Top 10 2021 web security and OWASP LLM Top 10 2025 AI/ML security examples in a conversational format for instruction tuning.
Findings
High-quality dataset with 2,185 multi-turn examples
Includes domain-specific security strategies and testing approaches
Released with fine-tuned models and evaluation framework
Abstract
AI coding assistants produce vulnerable code in 45% of security-relevant scenarios \cite{veracode2025}, yet no public training dataset teaches both traditional web security and AI/ML-specific defenses in a format suitable for instruction tuning. We present SecureCode, a production-grade dataset of 2,185 multi-turn security training examples spanning two domains: web application security (1,435 examples covering the OWASP Top 10 2021 across 11 languages and 9 frameworks, 100% grounded in documented CVEs and security incidents) and AI/ML security (750 examples covering all 10 OWASP LLM Top 10 2025 categories across more than 40 frameworks, including LangChain, OpenAI, and Hugging Face). Every example follows a 4-turn conversational structure: feature request; vulnerable and secure implementations with attack demonstrations; advanced probing; and defense-in-depth operational guidance…
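The 4-turn structure described in the abstract can be pictured as a small record type plus a structural check. The following is a minimal sketch, not the dataset's actual schema: the names `Turn`, `TURN_PURPOSES`, `validate_example`, and `to_chat_messages` are hypothetical, the alternating user/assistant roles are an assumption, and the turn contents are placeholders rather than real dataset text.

```python
from dataclasses import dataclass

# Turn purposes, in the order the abstract lists them.
# These label strings are hypothetical, not the dataset's field values.
TURN_PURPOSES = (
    "feature_request",
    "vulnerable_and_secure_implementations",
    "advanced_probing",
    "defense_in_depth_guidance",
)

# Assumption: the user and the assistant alternate, starting with the user.
EXPECTED_ROLES = ("user", "assistant", "user", "assistant")


@dataclass(frozen=True)
class Turn:
    role: str      # "user" or "assistant"
    purpose: str   # one of TURN_PURPOSES
    content: str   # the message text


def validate_example(turns):
    """Return a list of structural problems; an empty list means the
    example matches the assumed 4-turn layout."""
    problems = []
    if len(turns) != len(TURN_PURPOSES):
        problems.append(
            f"expected {len(TURN_PURPOSES)} turns, got {len(turns)}"
        )
    # zip stops at the shortest input, so short or long examples are safe.
    for i, (turn, role, purpose) in enumerate(
        zip(turns, EXPECTED_ROLES, TURN_PURPOSES), start=1
    ):
        if turn.role != role:
            problems.append(f"turn {i}: role {turn.role!r}, expected {role!r}")
        if turn.purpose != purpose:
            problems.append(
                f"turn {i}: purpose {turn.purpose!r}, expected {purpose!r}"
            )
        if not turn.content.strip():
            problems.append(f"turn {i}: empty content")
    return problems


def to_chat_messages(turns):
    """Convert turns to the common role/content message-list shape used
    by many instruction-tuning pipelines."""
    return [{"role": t.role, "content": t.content} for t in turns]


# Placeholder example; the contents are illustrative only.
example = [
    Turn("user", "feature_request", "Add a login endpoint to my web app."),
    Turn("assistant", "vulnerable_and_secure_implementations",
         "A vulnerable version, an attack against it, and a secure version."),
    Turn("user", "advanced_probing", "What if the attacker bypasses that check?"),
    Turn("assistant", "defense_in_depth_guidance",
         "Layered controls, monitoring, and operational guidance."),
]

problems = validate_example(example)
messages = to_chat_messages(example)
```

A check of this kind is useful when filtering or merging examples before fine-tuning, since a missing or reordered turn would otherwise only show up as degraded training signal.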
Code & Models
- 🤗 scthornton/llama-3.2-3b-securecodemodel · 21 dl
- 🤗 scthornton/deepseek-coder-6.7b-securecodemodel · 20 dl · ♡ 1
- 🤗 scthornton/codellama-13b-securecodemodel · 2 dl
- 🤗 scthornton/starcoder2-15b-securecodemodel · 12 dl
- 🤗 scthornton/qwen-coder-7b-securecodemodel · ♡ 1
- 🤗 scthornton/qwen2.5-coder-7b-securecodemodel · 3 dl
- 🤗 scthornton/qwen2.5-coder-14b-securecodemodel · 2 dl
- 🤗 scthornton/codegemma-7b-securecodemodel · 16 dl
- 🤗 scthornton/granite-20b-code-securecodemodel · 15 dl · ♡ 1
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Engineering Research · Web Application Security Vulnerabilities
