Dynamic Guided and Domain Applicable Safeguards for Enhanced Security in Large Language Models

Weidi Luo; He Cao; Zijing Liu; Yu Wang; Aidan Wong; Bing Feng; Yuan Yao; Yu Li

arXiv:2410.17922·cs.AI·February 11, 2025

TL;DR

This paper presents G4D, a multi-agent framework that improves LLM safety by providing domain-aware, unbiased safety responses, effectively defending against jailbreak attacks while maintaining general utility.

Contribution

Introduction of G4D, a multi-agent safety framework that leverages external information for domain-specific and unbiased LLM safety responses, addressing current defense limitations.

Findings

01. G4D enhances robustness against jailbreak attacks in general and domain-specific scenarios.

02. G4D maintains LLM utility and responsiveness while improving safety.

03. Extensive experiments validate G4D's effectiveness across datasets.

Abstract

With the extensive deployment of Large Language Models (LLMs), ensuring their safety has become increasingly critical. However, existing defense methods often struggle with two key issues: (i) inadequate defense capabilities, particularly in domain-specific scenarios like chemistry, where a lack of specialized knowledge can lead to the generation of harmful responses to malicious queries; and (ii) over-defensiveness, which compromises the general utility and responsiveness of LLMs. To mitigate these issues, we introduce a multi-agent defense framework, Guide for Defense (G4D), which leverages accurate external information to provide an unbiased summary of user intentions and analytically grounded safety response guidance. Extensive experiments on popular jailbreak attacks and benign datasets show that G4D can enhance an LLM's robustness against jailbreak attacks on general and…
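The data flow described in the abstract (external information, then an intention summary, then safety guidance handed to the responding model) can be sketched roughly as follows. This is a minimal illustration built only from the abstract's wording, not the authors' implementation: the names `retrieve`, `build_guidance`, `guarded_answer`, and `Guidance`, the prompt texts, and the keyword-lookup retrieval are all hypothetical, and `llm` stands for any function that maps a prompt string to a completion.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# All names here are hypothetical; they illustrate the data flow only.
LLM = Callable[[str], str]  # any function mapping a prompt to a completion


@dataclass
class Guidance:
    intent_summary: str  # neutral restatement of what the user is asking
    evidence: str        # external information retrieved for the query
    safety_advice: str   # how the responding model should answer


def retrieve(query: str, knowledge: Dict[str, str]) -> str:
    """Toy retrieval: return notes for every known term found in the query."""
    q = query.lower()
    hits = [note for term, note in knowledge.items() if term.lower() in q]
    return " ".join(hits) if hits else "no external information found"


def build_guidance(query: str, knowledge: Dict[str, str], llm: LLM) -> Guidance:
    """Summarize the intention, then derive safety advice from it."""
    evidence = retrieve(query, knowledge)
    intent = llm(
        "Summarize the user's intention neutrally.\n"
        f"Query: {query}\nEvidence: {evidence}"
    )
    advice = llm(
        "Given this intention and evidence, state how to respond safely.\n"
        f"Intention: {intent}\nEvidence: {evidence}"
    )
    return Guidance(intent, evidence, advice)


def guarded_answer(query: str, knowledge: Dict[str, str], llm: LLM) -> str:
    """Answer the query with the guidance prepended to the final prompt."""
    g = build_guidance(query, knowledge, llm)
    prompt = (
        f"Guidance: {g.safety_advice}\n"
        f"Intention: {g.intent_summary}\n"
        f"Query: {query}"
    )
    return llm(prompt)
```

In this sketch the same `llm` callable plays every role for brevity; a real multi-agent setup would more likely give each stage its own model or system prompt, and would replace the dictionary lookup with a proper retrieval source.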

Code & Models

Repositories

idea-xl/g4d (Official)

Taxonomy

Topics: Information and Cyber Security