Instruction Tuning for Secure Code Generation
Jingxuan He, Mark Vero, Gabriela Krasnopolska, Martin Vechev

TL;DR
This paper introduces SafeCoder, a method for instruction tuning language models to generate more secure code without sacrificing utility, addressing a critical security gap in existing models.
Contribution
SafeCoder is a security-centric fine-tuning approach that combines security and utility optimization, significantly improving code safety in language models.
Findings
Security improved by about 30%
Effective across various LMs and datasets
Maintains utility while enhancing security
Abstract
Modern language models (LMs) have gained widespread acceptance in everyday and professional contexts, particularly in programming. An essential procedure enabling this adoption is instruction tuning, which substantially enhances LMs' practical utility by training them to follow user instructions and human preferences. However, existing instruction tuning schemes overlook a crucial aspect: the security of generated code. As a result, even the state-of-the-art instruction-tuned LMs frequently produce unsafe code, posing significant security risks. In this work, we introduce SafeCoder to address this gap. SafeCoder performs security-centric fine-tuning using a diverse and high-quality dataset that we collected using an automated pipeline. We integrate the security fine-tuning with standard instruction tuning, to facilitate a joint optimization of both security and utility. Despite its…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Malware Detection Techniques · Teaching and Learning Programming · Security and Verification in Computing
