BadSKP: Backdoor Attacks on Knowledge Graph-Enhanced LLMs with Soft Prompts
Xiaoting Lyu, Yufei Han, Hangwei Qian, Haoyuan Yu, Xiang Ao, Bin Wang, Chenxu Wang, Xiaobo Ma, Wei Wang

TL;DR
This paper introduces BadSKP, a backdoor attack that targets the graph-to-prompt interface of KG-enhanced LLMs, the channel whose semantic anchoring effect suppresses text-channel backdoor instructions, in order to compromise model outputs.
Contribution
The paper identifies a robustness gap between textual KG prompting systems and soft-prompt-based KG-enhanced LLMs under backdoor attack, and proposes a novel attack that manipulates the graph to induce adversarial soft prompts.
Findings
BadSKP achieves high attack success rates in experiments.
Text-channel backdoor attacks that readily compromise textual KG prompting systems are largely ineffective against soft-prompt-based KG-enhanced LLMs.
The attack remains effective under both frozen and trojaned model settings.
Abstract
Recent knowledge graph (KG)-enhanced large language models (LLMs) move beyond purely textual knowledge augmentation by encoding retrieved subgraphs into continuous soft prompts via graph neural networks, introducing a graph-conditioned channel that operates alongside the standard text interface. However, existing backdoor attacks are largely designed for the textual channel, and their effectiveness against this dual-channel architecture remains unclear. We show that this architecture creates a robustness gap: text-channel backdoor attacks that readily compromise textual KG prompting systems become largely ineffective against soft-prompt-based counterparts. We interpret this gap through semantic anchoring, whereby graph-derived soft prompts bias the generation-driving hidden state toward query-consistent semantics and suppress surface-level malicious instructions. Because this anchoring…
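The dual-channel architecture described in the abstract can be pictured with a small sketch. The code below is not the paper's implementation: it is a toy, standard-library-only illustration that assumes a single round of mean message passing as the graph encoder, a single linear projection as the prompt generator, and random vectors as stand-ins for token embeddings. All function names and dimensions (`mean_aggregate`, `graph_to_soft_prompt`, `build_llm_input`, `EMBED_DIM`, and so on) are hypothetical. It shows only that the soft prompt is a function of the retrieved subgraph, so a change to the graph changes the soft-prompt rows of the LLM input while the text rows stay the same.

```python
import random


def mean_aggregate(features, edges):
    """One round of mean message passing: each node averages itself and its neighbours."""
    neighbours = {i: [i] for i in range(len(features))}  # start with a self-loop
    for u, v in edges:  # edges are treated as undirected
        neighbours[u].append(v)
        neighbours[v].append(u)
    dim = len(features[0])
    return [
        [sum(features[j][d] for j in neighbours[i]) / len(neighbours[i]) for d in range(dim)]
        for i in range(len(features))
    ]


def linear(vec, weight):
    """Multiply a vector of length in_dim by an in_dim x out_dim weight matrix."""
    return [sum(vec[i] * weight[i][j] for i in range(len(vec))) for j in range(len(weight[0]))]


def graph_to_soft_prompt(features, edges, weight, num_prompt_tokens):
    """Encode a subgraph into num_prompt_tokens continuous vectors (the soft prompt)."""
    hidden = mean_aggregate(features, edges)
    dim = len(hidden[0])
    pooled = [sum(h[d] for h in hidden) / len(hidden) for d in range(dim)]  # mean readout
    flat = linear(pooled, weight)  # length = num_prompt_tokens * embed_dim
    embed_dim = len(flat) // num_prompt_tokens
    return [flat[k * embed_dim:(k + 1) * embed_dim] for k in range(num_prompt_tokens)]


def build_llm_input(soft_prompt, text_embeddings):
    """Graph channel (soft prompt) is prepended to the text channel (token embeddings)."""
    return soft_prompt + text_embeddings


# Hypothetical toy sizes, chosen only for illustration.
NODE_DIM, EMBED_DIM, NUM_PROMPT_TOKENS, NUM_TEXT_TOKENS = 4, 8, 2, 5
rng = random.Random(0)

node_features = [[rng.uniform(-1, 1) for _ in range(NODE_DIM)] for _ in range(3)]
projection = [
    [rng.uniform(-1, 1) for _ in range(NUM_PROMPT_TOKENS * EMBED_DIM)] for _ in range(NODE_DIM)
]
text_embeddings = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(NUM_TEXT_TOKENS)]

clean_edges = [(0, 1), (1, 2)]
modified_edges = clean_edges + [(0, 2)]  # same nodes, one extra edge

clean_input = build_llm_input(
    graph_to_soft_prompt(node_features, clean_edges, projection, NUM_PROMPT_TOKENS),
    text_embeddings,
)
modified_input = build_llm_input(
    graph_to_soft_prompt(node_features, modified_edges, projection, NUM_PROMPT_TOKENS),
    text_embeddings,
)

print(len(clean_input), "input vectors:", NUM_PROMPT_TOKENS, "soft-prompt +", NUM_TEXT_TOKENS, "text")
```

In this toy setting, `clean_input` and `modified_input` share identical text rows and differ only in their first `NUM_PROMPT_TOKENS` rows, which is the sense in which the graph-conditioned channel operates alongside, and independently of, the text interface.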
