TL;DR
DeepGuard enhances code generation security by aggregating multi-layer signals from large language models, improving vulnerability detection without sacrificing correctness.
Contribution
The paper introduces DeepGuard, a framework that aggregates representations from multiple upper transformer layers through an attention-based module, so that security hardening can draw on vulnerability cues that are weak in the final layer.
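As a rough illustration of what attention-based aggregation over layers can look like, here is a minimal NumPy sketch. It is not the authors' implementation: the function name `aggregate_layers`, the single learned `query` vector, the scaled dot-product scoring, and the tensor shapes are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_layers(hidden_states, query):
    """Attention-pool per-token representations across layers.

    hidden_states: (num_layers, seq_len, d) stack of upper-layer outputs.
    query: (d,) scoring vector (hypothetical; it would be learned in practice).
    Returns the pooled (seq_len, d) representation and the
    (seq_len, num_layers) attention weights.
    """
    d = hidden_states.shape[-1]
    # One score per (token, layer): scaled dot product with the query.
    scores = np.einsum("lsd,d->sl", hidden_states, query) / np.sqrt(d)
    # Normalize over the layer axis so each token's weights sum to 1.
    weights = softmax(scores, axis=-1)
    # Weighted sum of layer representations for every token.
    pooled = np.einsum("sl,lsd->sd", weights, hidden_states)
    return pooled, weights

# Toy input: 4 upper layers, 6 tokens, hidden size 8 (random stand-in data).
rng = np.random.default_rng(0)
upper_layers = rng.normal(size=(4, 6, 8))
query = rng.normal(size=8)
pooled, weights = aggregate_layers(upper_layers, query)
```

With an all-zero query the weights are uniform and the result reduces to a plain average over layers, which is a convenient sanity check for this kind of module.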
Findings
DeepGuard improves the secure-and-correct generation rate by 11.9% on average.
It preserves functional correctness while generalizing to new vulnerability types.
The method leverages distributed cues from multiple layers, outperforming single-layer baselines.
Abstract
Large Language Models (LLMs) for code generation can replicate insecure patterns from their training data. To mitigate this, a common strategy for security hardening is to fine-tune models using supervision derived from the final transformer layer. However, this design may suffer from a final-layer bottleneck: vulnerability-discriminative cues can be distributed across layers and become less detectable near the output representations optimized for next-token prediction. To diagnose this issue, we perform layer-wise linear probing. We observe that vulnerability-related signals are most detectable in a band of intermediate-to-upper layers yet attenuate toward the final layers. Motivated by this observation, we introduce DeepGuard, a framework that leverages distributed security-relevant cues by aggregating representations from multiple upper layers via an attention-based module. The…
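The layer-wise linear probing mentioned in the abstract can be sketched as follows: fit one linear classifier per layer on frozen features and compare held-out accuracy across layers. This is only a toy illustration under stated assumptions. The features are synthetic, the per-layer signal strengths in `signal_by_layer` are invented to mimic a signal that peaks in a middle layer, and every helper name is hypothetical; none of it reproduces the paper's data or results.

```python
import numpy as np

def train_linear_probe(X, y, lr=0.1, steps=300):
    """Fit logistic regression (a linear probe) with full-batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(vulnerable)
        grad = p - y                             # gradient of log loss w.r.t. logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(X_train, y_train, X_test, y_test):
    """Train a probe on one layer's features and score it on held-out data."""
    w, b = train_linear_probe(X_train, y_train)
    pred = (X_test @ w + b) > 0
    return float((pred == y_test.astype(bool)).mean())

def make_layer_features(y, strength, dim, rng):
    """Synthetic stand-in for one layer: Gaussian noise plus a label-dependent
    shift along the first axis, scaled by `strength` (assumed, not measured)."""
    direction = np.zeros(dim)
    direction[0] = 1.0
    signs = 2.0 * y - 1.0
    return rng.normal(size=(len(y), dim)) + strength * signs[:, None] * direction

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=400).astype(float)  # 1 = vulnerable, 0 = safe (toy labels)
signal_by_layer = [0.0, 2.0, 0.5]               # hypothetical: early, middle, final layer

accuracies = []
for strength in signal_by_layer:
    X = make_layer_features(y, strength, dim=8, rng=rng)
    # First half trains the probe, second half evaluates it.
    accuracies.append(probe_accuracy(X[:200], y[:200], X[200:], y[200:]))
```

In a real setting the synthetic features would be replaced by hidden states extracted from each layer of the frozen model, and the accuracy profile across layers would indicate where vulnerability-related signals are most linearly detectable.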
