BadSKP: Backdoor Attacks on Knowledge Graph-Enhanced LLMs with Soft Prompts

Xiaoting Lyu; Yufei Han; Hangwei Qian; Haoyuan Yu; Xiang Ao; Bin Wang; Chenxu Wang; Xiaobo Ma; Wei Wang

arXiv:2605.11996·cs.AI·May 13, 2026

TL;DR

This paper introduces BadSKP, a backdoor attack targeting the graph-to-prompt interface in KG-enhanced LLMs, exploiting the semantic anchoring effect to bypass defenses and compromise model outputs.

Contribution

The paper reveals a robustness gap in KG-enhanced LLMs and proposes a novel backdoor attack that manipulates the graph to induce adversarial soft prompts.

Findings

01. BadSKP achieves high attack success rates in experiments.

02. Text-only backdoor attacks are ineffective against KG-enhanced LLMs.

03. The attack remains effective under both frozen and trojaned model settings.

Abstract

Recent knowledge graph (KG)-enhanced large language models (LLMs) move beyond purely textual knowledge augmentation by encoding retrieved subgraphs into continuous soft prompts via graph neural networks, introducing a graph-conditioned channel that operates alongside the standard text interface. However, existing backdoor attacks are largely designed for the textual channel, and their effectiveness against this dual-channel architecture remains unclear. We show that this architecture creates a robustness gap: text-channel backdoor attacks that readily compromise textual KG prompting systems become largely ineffective against soft-prompt-based counterparts. We interpret this gap through semantic anchoring, whereby graph-derived soft prompts bias the generation-driving hidden state toward query-consistent semantics and suppress surface-level malicious instructions. Because this anchoring…
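
The graph-conditioned channel the abstract describes can be made concrete with a toy sketch. The code below is not the paper's implementation or the BadSKP attack: it assumes a single round of mean message passing as a stand-in for the graph neural network, a mean-pool readout, and random projection matrices, and every name in it (`mean_aggregate`, `graph_to_soft_prompt`, `build_llm_input`) is hypothetical. It only shows how a retrieved subgraph might be turned into continuous soft-prompt vectors that are prepended to the text token embeddings, and that changing the graph changes those vectors while the text side stays untouched.

```python
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def mean_aggregate(features, edges):
    """One round of mean message passing over an undirected subgraph.

    Assumption: a toy stand-in for the GNN encoder, with self-loops.
    """
    n, dim = len(features), len(features[0])
    neighbours = {i: [i] for i in range(n)}  # each node also sees itself
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    return [
        [sum(features[j][d] for j in neighbours[i]) / len(neighbours[i])
         for d in range(dim)]
        for i in range(n)
    ]


def graph_to_soft_prompt(features, edges, projections):
    """Encode a subgraph into len(projections) soft-prompt vectors."""
    hidden = mean_aggregate(features, edges)
    n, dim = len(hidden), len(hidden[0])
    # Mean-pool node states into one graph vector (an assumed readout).
    pooled = [sum(h[d] for h in hidden) / n for d in range(dim)]
    # Project the graph vector into the LLM embedding space, one matrix per slot.
    return [matvec(W, pooled) for W in projections]


def build_llm_input(soft_prompt, token_embeddings):
    """Prepend graph-derived soft prompts to the text token embeddings."""
    return soft_prompt + token_embeddings


# Hypothetical sizes and random data, chosen only for illustration.
rng = random.Random(0)
node_dim, llm_dim, num_prompt_tokens = 4, 8, 2
features = [[rng.gauss(0, 1) for _ in range(node_dim)] for _ in range(3)]
edges = [(0, 1), (1, 2)]
projections = [
    [[rng.gauss(0, 1) for _ in range(node_dim)] for _ in range(llm_dim)]
    for _ in range(num_prompt_tokens)
]
token_embeddings = [[rng.gauss(0, 1) for _ in range(llm_dim)] for _ in range(5)]

soft_prompt = graph_to_soft_prompt(features, edges, projections)
llm_input = build_llm_input(soft_prompt, token_embeddings)

# Adding one edge alters the soft prompt even though the text is unchanged.
poisoned_prompt = graph_to_soft_prompt(features, edges + [(0, 2)], projections)
```

In this sketch the text tokens never change, so any difference between `soft_prompt` and `poisoned_prompt` comes from the graph alone. That is the sense in which the soft-prompt pathway is a separate channel from the standard text interface.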
