KG-DF: A Black-box Defense Framework against Jailbreak Attacks Based on Knowledge Graphs

Shuyuan Liu; Jiawei Chen; Xiao Yang; Hang Su; Zhaoxia Yin

arXiv:2511.07480·cs.CR·November 12, 2025

TL;DR

KG-DF is a novel defense framework utilizing knowledge graphs to detect and mitigate jailbreak attacks on large language models, balancing security and usability through structured semantic analysis.

Contribution

Introduces a knowledge graph-based, black-box defense framework with an extensible semantic parsing module to improve security against jailbreak attacks.

Findings

01 Enhanced defense performance against various jailbreak methods.

02 Improved response quality in general QA scenarios.

03 Effective identification of harmful inputs using knowledge graph associations.

Abstract

With the widespread application of large language models (LLMs) in various fields, the security challenges they face have become increasingly prominent, especially jailbreak attacks. These attacks use crafted inputs to induce the model to generate erroneous or uncontrolled outputs, threatening the generality and security of the model. Although existing defense methods have shown some effectiveness, they often struggle to balance model generality and security: excessive defense may limit normal use of the model, while insufficient defense may leave security vulnerabilities. To address this problem, we propose a Knowledge Graph Defense Framework (KG-DF). Specifically, because of its structured knowledge representation and semantic association capabilities, a Knowledge Graph (KG) can be searched by associating input content with safe knowledge in the knowledge…
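The retrieval idea in the abstract, associating input content with safe knowledge in a knowledge graph, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph `TOY_KG`, the keyword-based `extract_keywords` stand-in for semantic parsing, and the prompt layout in `build_guarded_prompt` are all hypothetical names and choices made for illustration only.

```python
import re

# Hypothetical toy knowledge graph: each concept node lists related terms
# and one piece of "safe knowledge". Illustrative data, not from the paper.
TOY_KG = {
    "explosive": {
        "related": {"bomb", "detonator"},
        "safe_knowledge": "Instructions for making explosives must be refused.",
    },
    "malware": {
        "related": {"ransomware", "keylogger"},
        "safe_knowledge": "Requests to write malicious software must be refused.",
    },
}


def extract_keywords(prompt):
    """Assumed stand-in for a semantic parsing step: lowercase word tokens."""
    return set(re.findall(r"[a-z]+", prompt.lower()))


def retrieve_safe_knowledge(prompt, kg=TOY_KG):
    """Return safe-knowledge strings for every node the prompt is associated with."""
    keywords = extract_keywords(prompt)
    hits = []
    for concept, node in kg.items():
        # A node matches if the prompt names the concept or any related term.
        if concept in keywords or keywords & node["related"]:
            hits.append(node["safe_knowledge"])
    return hits


def build_guarded_prompt(prompt):
    """Prepend retrieved safe knowledge; leave unmatched prompts unchanged."""
    knowledge = retrieve_safe_knowledge(prompt)
    if not knowledge:
        return prompt
    context = "\n".join("- " + item for item in knowledge)
    return "Safety context:\n" + context + "\n\nUser request:\n" + prompt
```

In this sketch a benign question passes through untouched, while a prompt that mentions a term linked to a harmful concept is wrapped with the matching safe knowledge before it would reach the model. Because the wrapper only edits the input text, it needs no access to model weights, which is the sense in which such a defense is black-box.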

Taxonomy

Topics: Advanced Graph Neural Networks · Topic Modeling · Advanced Text Analysis Techniques