An Exploratory Study on Fine-Tuning Large Language Models for Secure Code Generation
Junjie Li, Fazle Rabbi, Cheng Cheng, Aseem Sangalay, Yuan Tian, Jinqiu Yang

TL;DR
This study investigates whether fine-tuning large language models on vulnerability-fixing code can improve the security of generated code, showing that parameter-efficient fine-tuning enhances security over prompt-based methods.
Contribution
It demonstrates that fine-tuning LLMs with vulnerability-fixing data, especially using PEFT techniques like LoRA, improves secure code generation in C/C++.
Findings
Fine-tuning improves security by up to 6.4% in C and 5.0% in C++.
LoRA fine-tuning outperforms prompt-based approaches.
Function-level and block-level datasets yield the best security improvements.
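For readers unfamiliar with LoRA, the idea behind the parameter-efficient fine-tuning named in the findings can be sketched in a few lines. This is a minimal NumPy illustration of the general LoRA technique, not the authors' training code: the layer sizes, rank `r`, scaling `alpha`, and the helper name `lora_forward` are all hypothetical choices made for the example. The pre-trained weight `W` stays frozen, and only the two small matrices `A` and `B` would be trained.

```python
import numpy as np


def lora_forward(x, W, A, B, alpha, r):
    """Frozen base projection plus a scaled low-rank update.

    Computes x @ W.T + (alpha / r) * (x @ A.T) @ B.T, which is the same as
    applying the merged weight W + (alpha / r) * B @ A.
    """
    base = x @ W.T                 # output of the frozen pre-trained layer
    update = (x @ A.T) @ B.T       # low-rank path through A (r x d_in) and B (d_out x r)
    return base + (alpha / r) * update


# Hypothetical sizes chosen only for illustration.
d_in, d_out, r, alpha = 64, 32, 4, 8
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))          # frozen pre-trained weight
A = rng.normal(scale=0.01, size=(r, d_in))  # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init
x = rng.normal(size=(2, d_in))              # a batch of two input vectors

# With B at zero, the adapted layer reproduces the base layer exactly.
y_initial = lora_forward(x, W, A, B, alpha, r)

# After (simulated) training, B is non-zero and the update can be merged into W.
B_trained = rng.normal(scale=0.01, size=(d_out, r))
y_adapted = lora_forward(x, W, A, B_trained, alpha, r)
W_merged = W + (alpha / r) * (B_trained @ A)
y_merged = x @ W_merged.T

full_params = W.size            # parameters updated by full fine-tuning of this layer
lora_params = A.size + B.size   # parameters updated by LoRA for this layer
print(f"full: {full_params}, LoRA: {lora_params}")
```

In this sketch the LoRA path trains far fewer parameters than the full layer, which is the property that makes the approach parameter-efficient. Initializing `B` to zero means fine-tuning starts from the unmodified pre-trained model.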
Abstract
AI-powered coding assistants such as GitHub's Copilot and OpenAI's ChatGPT have achieved notable success in automating code generation. However, these tools rely on pre-trained Large Language Models (LLMs) that are typically trained on human-written code sourced from open-source project hosting sites like GitHub, which often contains inherent security vulnerabilities. These vulnerabilities may then be mirrored in the code generated by these LLMs, a critical risk revealed and highlighted by recent empirical studies. In this work, we present an exploratory study on whether fine-tuning pre-trained LLMs on datasets of vulnerability-fixing commits can promote secure code generation. We explored full fine-tuning and two parameter-efficient fine-tuning techniques (LoRA and IA3) on four pre-trained LLMs for code generation. We crawled a fine-tuning dataset (14,622 C/C++ files) for secure code…
Taxonomy
Topics: Software Testing and Debugging Techniques
