Attention-Aware GNN-based Input Defense against Multi-Turn LLM Jailbreak
Zixuan Huang, Kecheng Huang, Lihao Yin, Bowei He, Huiling Zhen, Mingxuan Yuan, Zili Shao

TL;DR
This paper presents G-Guard, a novel attention-aware GNN-based classifier that effectively detects multi-turn jailbreak attacks on LLMs by modeling query relationships and incorporating relevant single-turn queries.
Contribution
Introduction of G-Guard, a GNN-based input classifier with attention-aware augmentation to defend against complex multi-turn jailbreak attacks on LLMs.
Findings
G-Guard outperforms baseline methods across multiple datasets.
The entity graph captures inter-query relationships effectively.
Attention-aware augmentation improves detection accuracy.
Abstract
Large Language Models (LLMs) have gained significant traction in various applications, yet their capabilities present risks for both constructive and malicious exploitation. Despite extensive training and fine-tuning efforts aimed at enhancing safety, LLMs remain susceptible to jailbreak attacks. Recently, the emergence of multi-turn attacks has intensified this vulnerability. Unlike single-turn attacks, multi-turn attacks incrementally escalate dialogue complexity, rendering them more challenging to detect and mitigate. In this study, we introduce G-Guard, an innovative attention-aware Graph Neural Network (GNN)-based input classifier specifically designed to defend against multi-turn jailbreak attacks targeting LLMs. G-Guard constructs an entity graph for multi-turn queries, which captures the interrelationships between queries and harmful keywords that are present in multi-turn…
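To make the entity-graph idea concrete, here is a minimal sketch of a bipartite graph that links each query in a conversation to the keywords it contains. It is not G-Guard's implementation: the function names `build_entity_graph` and `linked_queries`, the exact-token matching rule, and the keyword set are all assumptions made for illustration, and the GNN classifier and attention-aware augmentation are not shown.

```python
import re
from collections import defaultdict


def build_entity_graph(queries, keywords):
    """Build an undirected bipartite graph of query nodes and keyword nodes.

    Hypothetical helper: a query is linked to a keyword when the keyword
    appears as an exact lowercase token in the query text.
    """
    graph = defaultdict(set)
    for i, text in enumerate(queries):
        q_node = ("query", i)
        graph[q_node]  # touch the key so isolated queries still get a node
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        for kw in keywords:
            if kw in tokens:
                k_node = ("keyword", kw)
                graph[q_node].add(k_node)
                graph[k_node].add(q_node)
    return dict(graph)


def linked_queries(graph, i):
    """Return sorted indices of other queries that share a keyword with query i."""
    found = set()
    for k_node in graph.get(("query", i), ()):
        for kind, j in graph[k_node]:
            if kind == "query" and j != i:
                found.add(j)
    return sorted(found)
```

In a sketch like this, two turns that never mention each other still become connected through a shared keyword node, which is the kind of inter-query relationship a graph-based classifier could then propagate information along.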
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Network Security and Intrusion Detection · Brain Tumor Detection and Classification
