A Mixture of Linear Corrections Generates Secure Code

Weichen Yu; Ravi Mangal; Terry Zhuo; Matt Fredrikson; Corina S. Pasareanu

arXiv:2507.09508·cs.CR·July 15, 2025


Open Access·3 Reviews

TL;DR

This paper introduces a mixture of corrections (MoC) technique that leverages internal vulnerability representations in LLMs to generate more secure code, significantly reducing vulnerabilities without sacrificing functionality.

Contribution

It reveals that LLMs encode vulnerability-related concepts and proposes a novel inference-time steering method to improve code security during generation.

Findings

- MoC improves the security ratio of Qwen2.5-Coder-7B by 8.9%.
- MoC improves HumanEval pass@1 by 2.1%.
- LLMs encode precise internal vulnerability representations.

Abstract

Large language models (LLMs) have become proficient at sophisticated code-generation tasks, yet remain ineffective at reliably detecting or avoiding code vulnerabilities. Does this deficiency stem from insufficient learning about code vulnerabilities, or is it merely a result of ineffective prompting? Using representation engineering techniques, we investigate whether LLMs internally encode the concepts necessary to identify code vulnerabilities. We find that current LLMs encode precise internal representations that distinguish vulnerable from secure code, achieving greater accuracy than standard prompting approaches. Leveraging these vulnerability-sensitive representations, we develop an inference-time steering technique that subtly modulates the model's token-generation probabilities through a mixture of corrections (MoC). Our method effectively guides LLMs to produce less vulnerable…
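The two ideas in the abstract, a linear probe over hidden representations and an inference-time correction driven by that probe, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the hidden states are synthetic Gaussian vectors rather than activations from a real LLM, the probe is a plain logistic regression, and the hypothetical `steer` function applies a single correction along the probe's normal vector. How MoC actually mixes several corrections, and how it acts on token probabilities, is not specified in the text above.

```python
import numpy as np

def make_hidden_states(n, dim, rng):
    """Synthetic stand-in for LLM hidden states (assumption, not real activations).
    Vulnerable samples are shifted along the first axis relative to secure ones."""
    direction = np.zeros(dim)
    direction[0] = 1.0
    secure = rng.normal(size=(n, dim)) - 2.0 * direction
    vulnerable = rng.normal(size=(n, dim)) + 2.0 * direction
    X = np.vstack([secure, vulnerable])
    y = np.concatenate([np.zeros(n), np.ones(n)])  # 1 = vulnerable
    return X, y

def probe_score(h, w, b):
    """Probability that a hidden state (or batch of states) is vulnerable."""
    return 1.0 / (1.0 + np.exp(-(h @ w + b)))

def train_probe(X, y, lr=0.1, steps=500):
    """Logistic-regression probe fitted with plain gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        grad = probe_score(X, w, b) - y
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def steer(h, w, b, alpha=4.0, threshold=0.5):
    """Hypothetical single correction: if the probe flags h as vulnerable,
    move it by alpha along the negative unit normal of the probe."""
    if probe_score(h, w, b) <= threshold:
        return h
    return h - alpha * w / np.linalg.norm(w)

rng = np.random.default_rng(0)
X, y = make_hidden_states(200, 16, rng)
w, b = train_probe(X, y)

accuracy = ((probe_score(X, w, b) > 0.5) == y).mean()
h_vuln = X[-1]                      # one sample drawn from the vulnerable group
h_steered = steer(h_vuln, w, b)

print(f"probe accuracy on synthetic data: {accuracy:.3f}")
print(f"score before steering: {probe_score(h_vuln, w, b):.3f}")
print(f"score after steering:  {probe_score(h_steered, w, b):.3f}")
```

The correction is applied only when the probe fires, which is one simple way to leave representations the probe already considers secure untouched; the step size `alpha` and the threshold are arbitrary values chosen for the toy data.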

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01·Rating 6·Confidence 4

Strengths

- MoC is an inference-time steering technique that effectively guides LLMs to produce less vulnerable code. Notably, it enhanced the security ratio while simultaneously improving functionality on HumanEval.
- The method is a practical approach to controlled vulnerability management that does not require costly retraining or extensive prompt engineering.
- The guiding correction vectors sometimes transfer across models, yielding a computationally efficient way to harden models that are not specif

Weaknesses

- The primary evaluation tool, CodeQL, exhibits inherent limitations in both accuracy and computational efficiency. The paper notes a scarcity of robust automated evaluation methods for code generation and finds that using an LLM-as-a-judge is unsuitable because of its poor performance in code vulnerability detection.
- The paper requires fully open-source access to the model's internal representations and parameters. This dependency on white-box access limits the practical applicability of MoC to pro

Reviewer 02·Rating 4·Confidence 4

Strengths

- The paper studies the important problems of vulnerability detection and secure code generation with LLMs.
- The paper is well-written and the key ideas are easy to understand.
- The use of linear probing to detect vulnerabilities is novel.

Weaknesses

- Some of the steering methods considered in Section 3.2.1 have been proposed for other natural language tasks (difference of group means [1], normal vector of the decision boundary [2]).
- The secure code generation task reports the security ratio metric but does not report the correctness of the outputs after steering with MoC on the SVEN Test Set (Table 6). There could potentially be a trade-off between the security ratio and the correctness of generation after steering (similar to the accura
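The two steering directions named in the first weakness above, the difference of group means and the normal vector of a decision boundary, can be sketched on toy data. This is an illustrative assumption only: the vectors are synthetic, the decision boundary comes from an ordinary least-squares linear classifier chosen for brevity, and nothing here reproduces the procedures of Section 3.2.1 or of the cited works.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, n = 8, 300

# Synthetic stand-ins for hidden states of secure and vulnerable code (assumption).
true_dir = np.zeros(dim)
true_dir[0] = 1.0
secure = rng.normal(size=(n, dim)) - 1.5 * true_dir
vulnerable = rng.normal(size=(n, dim)) + 1.5 * true_dir

# Direction 1: difference of group means.
mean_diff = vulnerable.mean(axis=0) - secure.mean(axis=0)

# Direction 2: normal vector of a linear decision boundary, here obtained by
# regressing labels in {-1, +1} on the features plus a bias column.
X = np.vstack([secure, vulnerable])
y = np.concatenate([-np.ones(n), np.ones(n)])
X_aug = np.hstack([X, np.ones((2 * n, 1))])
coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
normal = coef[:-1]  # drop the bias term

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

cosine = float(unit(mean_diff) @ unit(normal))
print(f"cosine similarity between the two directions: {cosine:.3f}")
```

On isotropic toy data like this the two directions nearly coincide; with correlated features they can diverge, which is one reason a method might consider more than one candidate direction.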

Reviewer 03·Rating 4·Confidence 3

Strengths

- Employs linear probing on hidden representations, achieving better bug detection accuracy than prompt-based baselines.
- Introduces a Mixture of Corrections that improves code security while maintaining functionality.
- Demonstrates that the learned correction vectors exhibit a certain degree of cross-model transferability.

Weaknesses

- Although four types of corrections are proposed, the paper does not clearly describe how they are combined for a given bug type. Are all four used simultaneously, or is only one applied each time?
- Each correction is trained specifically for one bug type (CWE). This means for multiple bug types, separate probes and corrections must be trained, potentially increasing computational overhead and raising questions about interaction or interference between corrections when multiple vulnerabilities

