A Method for Enhancing the Safety of Large Model Generation Based on Multi-dimensional Attack and Defense

Keke Zhai

arXiv:2501.00517·cs.CR·January 3, 2025

Open Access

TL;DR

This paper introduces a novel multi-dimensional attack and defense method to improve the safety and security of large model generation, especially against complex harmful instructions, while preserving model capabilities.

Contribution

It proposes a new approach that enhances safe alignment learning by increasing attack instruction diversity and response accuracy, validated through new benchmarks and experiments with Llama3.2.

Findings

01

Significantly improves generative security under complex attacks

02

Maintains and enhances the model's general capabilities

03

Shows gains on both existing and newly designed security evaluation benchmarks

Abstract

Currently, large models are prone to generating harmful content when faced with complex attack instructions, significantly reducing their defensive capabilities. To address this issue, this paper proposes a method based on constructing data aligned with multi-dimensional attack defense to enhance the generative security of large models. The core of our method lies in improving the effectiveness of safe alignment learning for large models by innovatively increasing the diversity of attack instruction dimensions and the accuracy of generating safe responses. To validate the effectiveness of our method, beyond existing security evaluation benchmarks, we additionally designed new security evaluation benchmarks and conducted comparative experiments using Llama3.2 as the baseline model. The final experimental results demonstrate that our method can significantly improve the generative…

Taxonomy

Topics: Advanced Decision-Making Techniques · Military Defense Systems Analysis · Simulation and Modeling Applications