Constrained Decoding for Secure Code Generation

Yanjun Fu; Ethan Baker; Yu Ding; Yizheng Chen

arXiv:2405.00218 · cs.CR · July 23, 2024

TL;DR

This paper introduces constrained decoding techniques and new evaluation metrics to improve the security and correctness of code generated by large language models, addressing a critical gap in secure code generation.

Contribution

The paper proposes constrained decoding methods for secure code generation and introduces the CodeGuard+ benchmark, with two new metrics that evaluate security and correctness together.

Findings

1. Constrained decoding outperforms prefix tuning in security without sacrificing correctness.

2. Different decoding methods significantly impact the security of Code LLMs.

3. Constrained decoding surpasses GPT-4 in security performance.
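
To make the idea of constrained decoding concrete, the following is a minimal sketch rather than the paper's actual method. It assumes a hypothetical toy "model" (`toy_next_scores`) that scores whole code fragments as tokens, and a single kind of constraint: a set of banned fragments that are masked out before the greedy choice at each step. All names here are illustrative assumptions.

```python
from typing import Callable, Dict, List, Sequence, Set

EOS = "<eos>"

def toy_next_scores(prefix: Sequence[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a Code LLM: returns next-token scores.

    Tokens are whole code fragments to keep the example short.
    """
    if not prefix:
        return {"char buf[8];": 1.0}
    if prefix[-1] == "char buf[8];":
        # The toy model prefers the unsafe call, as a vulnerable model might.
        return {
            "strcpy(buf, src);": 0.7,
            "strncpy(buf, src, sizeof(buf) - 1);": 0.3,
        }
    return {EOS: 1.0}

def constrained_greedy_decode(
    next_scores: Callable[[Sequence[str]], Dict[str, float]],
    banned: Set[str],
    max_steps: int = 16,
) -> List[str]:
    """Greedy decoding that never emits a token from `banned`."""
    output: List[str] = []
    for _ in range(max_steps):
        scores = next_scores(output)
        # Apply the constraint: drop banned tokens before picking the best one.
        allowed = {tok: s for tok, s in scores.items() if tok not in banned}
        if not allowed:
            break  # every candidate violates the constraint
        best = max(allowed, key=allowed.get)
        if best == EOS:
            break
        output.append(best)
    return output

unconstrained = constrained_greedy_decode(toy_next_scores, banned=set())
constrained = constrained_greedy_decode(
    toy_next_scores, banned={"strcpy(buf, src);"}
)
print(unconstrained)
print(constrained)
```

Without the constraint the toy model picks the unsafe `strcpy` fragment; with it, decoding falls back to the next-best allowed candidate. A real system would operate on subword tokens and would need to handle constraints that span several tokens.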

Abstract

Code Large Language Models (Code LLMs) have been increasingly used by developers to boost productivity, but they often generate vulnerable code. Thus, there is an urgent need to ensure that code generated by Code LLMs is correct and secure. Previous research has primarily focused on generating secure code, overlooking the fact that secure code also needs to be correct. This oversight can lead to a false sense of security. Currently, the community lacks a method to measure actual progress in this area, and we need solutions that address both security and correctness of code generation. This paper introduces a new benchmark, CodeGuard+, along with two new metrics, to measure Code LLMs' ability to generate both secure and correct code. Using our new evaluation methods, we show that the state-of-the-art defense technique, prefix tuning, may not be as strong as previously believed, since…
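
The abstract's point that secure code must also be correct can be illustrated with a small scoring sketch. The excerpt above does not spell out how the two new metrics are defined, so the following is an assumption: it reuses the widely used unbiased pass@k estimator, but counts a sample as a success only if it both passes the unit tests and passes a security check. The function names and the sample data are hypothetical.

```python
from math import comb
from typing import List, Tuple

def estimate_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k drawn samples succeeds),
    given that c of n generated samples succeeded."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k failures exist, giving 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)

def joint_secure_and_correct_at_k(
    results: List[Tuple[bool, bool]], k: int
) -> float:
    """`results` holds one (passed_tests, is_secure) pair per sample.

    A sample counts only if it is both correct and secure (assumed
    definition, for illustration).
    """
    n = len(results)
    c = sum(1 for passed, secure in results if passed and secure)
    return estimate_at_k(n, c, k)

# Hypothetical outcomes for four generations of one prompt.
samples = [(True, True), (True, False), (False, True), (False, False)]
print(joint_secure_and_correct_at_k(samples, k=1))  # 0.25
print(joint_secure_and_correct_at_k(samples, k=2))  # 0.5
```

In this example half of the samples are secure and half are correct, yet only one in four is both, which is the kind of gap a security-only metric would hide.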

Taxonomy

Topics: Cryptographic Implementations and Security · Coding theory and cryptography · Advanced Malware Detection Techniques

Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Multi-Head Attention · Position-Wise Feed-Forward Layer