Security Assessment of DeepSeek and GPT Series Models against Jailbreak Attacks
Xiaodong Wu, Xiangman Li, and Jianbing Ni

TL;DR
This paper systematically evaluates the jailbreak vulnerabilities of DeepSeek and GPT models, revealing architecture-specific strengths and weaknesses, and emphasizing the importance of safety tuning for open-source LLMs.
Contribution
First comprehensive jailbreak assessment of DeepSeek models, comparing their robustness with GPT-3.5 and GPT-4 using the HarmBench benchmark.
Findings
DeepSeek's MoE architecture offers selective robustness against certain attacks.
GPT-4 Turbo shows more consistent safety alignment across behaviors.
Architectural trade-offs impact vulnerability and robustness to jailbreak attacks.
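A comparison of this kind is typically summarized as a success rate per model and attack. The following is a minimal sketch of that bookkeeping, assuming attack success rate (the fraction of harmful behaviors for which an attack elicits a harmful completion) is the reported metric, as is common for HarmBench-style evaluations. The function name, record layout, and toy records are hypothetical illustrations, not the paper's code or results.

```python
from collections import defaultdict


def attack_success_rates(records):
    """Aggregate attack success rate per (model, attack) pair.

    `records` is an iterable of (model, attack, behavior_id, jailbroken)
    tuples, where `jailbroken` is True if a judge marked the completion
    as harmful. This layout is an assumption made for illustration.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for model, attack, _behavior_id, jailbroken in records:
        key = (model, attack)
        totals[key] += 1
        hits[key] += int(jailbroken)
    return {key: hits[key] / totals[key] for key in totals}


# Invented toy records with placeholder names; not data from the paper.
records = [
    ("model_a", "attack_x", "b1", True),
    ("model_a", "attack_x", "b2", False),
    ("model_b", "attack_x", "b1", False),
    ("model_b", "attack_x", "b2", False),
]

rates = attack_success_rates(records)
for (model, attack), rate in sorted(rates.items()):
    print(f"{model} / {attack}: {rate:.2f}")
```

Grouping by an extra key, such as a behavior's functional or semantic category, would follow the same pattern with a longer tuple as the dictionary key.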
Abstract
The widespread deployment of large language models (LLMs) has raised critical concerns over their vulnerability to jailbreak attacks, i.e., adversarial prompts that bypass alignment mechanisms and elicit harmful or policy-violating outputs. While proprietary models like GPT-4 have undergone extensive evaluation, the robustness of emerging open-source alternatives such as DeepSeek remains largely underexplored, despite their growing adoption in real-world applications. In this paper, we present the first systematic jailbreak evaluation of DeepSeek-series models, comparing them with GPT-3.5 and GPT-4 using the HarmBench benchmark. We evaluate seven representative attack strategies across 510 harmful behaviors categorized by both function and semantic domain. Our analysis reveals that DeepSeek's Mixture-of-Experts (MoE) architecture introduces routing sparsity that offers selective…
Taxonomy
Methods: Layer Normalization · Dropout · Absolute Position Encodings · Dense Connections · Byte Pair Encoding · Softmax · GPT-4 · Label Smoothing · Transformer
