SAGE: Signal-Amplified Guided Embeddings for LLM-based Vulnerability Detection

Zhengyang Shan; Xu Qian; Jiayun Xin; Minghui Xu; Yue Zhang; Zhen Yang; Hao Wu; Xiuzhen Cheng

arXiv:2604.19031 · cs.CR · April 22, 2026


TL;DR

SAGE is a framework that amplifies vulnerability signals inside LLMs to counter the Signal Submersion problem, improving detection accuracy and robustness across diverse datasets and programming languages.

Contribution

The paper proposes SAGE, a method that uses task-conditional Sparse Autoencoders (SAEs) to recover and amplify faint vulnerability signals in LLM-based detection, and reports that it outperforms existing approaches.

Findings

01  SAGE increases internal Signal-to-Noise Ratio by 12.7×.

02  Achieves up to 318% MCC improvement on unseen data.

03  Maintains performance across 13 programming languages.
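
For readers unfamiliar with the metric behind the second finding, the sketch below computes the Matthews Correlation Coefficient (MCC) from confusion-matrix counts and expresses a gain as a relative percentage. It is an illustration only: the function names and the example scores are hypothetical, and this summary does not state how the paper derives its 318% figure, so reading it as a relative improvement over a baseline MCC is an assumption.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1
    (perfect prediction). It stays informative on imbalanced data, which
    is typical when vulnerable samples are rare.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a whole row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def relative_improvement_pct(baseline: float, improved: float) -> float:
    """Relative change of `improved` over `baseline`, in percent.

    Assumption: this is one plausible reading of an "X% MCC improvement";
    the source summary does not define the calculation.
    """
    if baseline == 0:
        raise ValueError("relative improvement is undefined for a zero baseline")
    return (improved - baseline) / abs(baseline) * 100.0


if __name__ == "__main__":
    # Hypothetical scores, chosen only to show the arithmetic.
    baseline_mcc = 0.10
    improved_mcc = 0.418
    print(f"{relative_improvement_pct(baseline_mcc, improved_mcc):.0f}%")
```

Under this reading, a large relative percentage can correspond to a modest absolute MCC when the baseline is close to zero, so the two numbers are best reported together.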

Abstract

Software vulnerabilities are a primary threat to modern infrastructure. While static analysis and Graph Neural Networks have long served as the foundation for vulnerability detection, the emergence of Large Language Models (LLMs) has introduced a transformative paradigm driven by superior semantic reasoning and cross-environment generalization. However, in the context of LLM-based vulnerability detection, we identify a fundamental bottleneck in these models termed Signal Submersion: a state where features related to vulnerability are activated internally but numerically overwhelmed by dominant functional semantics. To address this, we propose SAGE (Signal-Amplified Guided Embeddings), a framework that shifts from passive signal submersion to active signal recovery. SAGE integrates task-conditional Sparse Autoencoders (SAEs) to…
