Enhancing Large Language Models for Secure Code Generation: A Dataset-driven Study on Vulnerability Mitigation

Jiexin Wang; Liuwen Cao; Xitong Luo; Zhiping Zhou; Jiayuan Xie; Adam Jatowt; Yi Cai

arXiv:2310.16263 · cs.SE · October 26, 2023 · 2 cites

Open Access

TL;DR

This study evaluates and improves large language models for secure code generation by introducing a vulnerability dataset, revealing current limitations, and proposing mitigation strategies to enhance security and robustness.

Contribution

The paper introduces SecuCoGen, a new vulnerability dataset, and provides comprehensive analysis and mitigation approaches for enhancing LLM security in code generation tasks.

Findings

01 Existing models often generate vulnerable code.

02 Models struggle to repair vulnerable code effectively.

03 Certain vulnerability types are particularly challenging for models.

Abstract

Large language models (LLMs) have brought significant advancements to code generation, benefiting both novice and experienced developers. However, training them on unsanitized data from open-source repositories such as GitHub risks inadvertently propagating security vulnerabilities. To mitigate this concern, this paper presents a comprehensive study focused on evaluating and enhancing code LLMs from a software security perspective. We introduce SecuCoGen, a meticulously curated dataset targeting 21 critical vulnerability types. (SecuCoGen has been uploaded as supplemental material and will be made publicly available after publication.) SecuCoGen comprises 180 samples and serves as the foundation for experiments on three crucial code-related tasks: code generation, code repair, and vulnerability classification, with a strong emphasis on…

Taxonomy

Topics: Software Engineering Research · Software Reliability and Analysis Research · Web Application Security Vulnerabilities