Security Assessment of DeepSeek and GPT Series Models against Jailbreak Attacks

Xiaodong Wu, Xiangman Li, and Jianbing Ni

arXiv:2506.18543·cs.CR·June 24, 2025

TL;DR

This paper systematically evaluates the jailbreak vulnerabilities of DeepSeek and GPT models, revealing architecture-specific strengths and weaknesses, and emphasizing the importance of safety tuning for open-source LLMs.

Contribution

Presents the first systematic jailbreak evaluation of DeepSeek-series models, comparing their robustness with GPT-3.5 and GPT-4 on the HarmBench benchmark.

Findings

1. DeepSeek's MoE architecture offers selective robustness against certain attacks.

2. GPT-4 Turbo shows more consistent safety alignment across behaviors.

3. Architectural trade-offs impact vulnerability and robustness to jailbreak attacks.

Abstract

The widespread deployment of large language models (LLMs) has raised critical concerns over their vulnerability to jailbreak attacks, i.e., adversarial prompts that bypass alignment mechanisms and elicit harmful or policy-violating outputs. While proprietary models like GPT-4 have undergone extensive evaluation, the robustness of emerging open-source alternatives such as DeepSeek remains largely underexplored, despite their growing adoption in real-world applications. In this paper, we present the first systematic jailbreak evaluation of DeepSeek-series models, comparing them with GPT-3.5 and GPT-4 using the HarmBench benchmark. We evaluate seven representative attack strategies across 510 harmful behaviors categorized by both function and semantic domain. Our analysis reveals that DeepSeek's Mixture-of-Experts (MoE) architecture introduces routing sparsity that offers selective…
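The comparison described in the abstract comes down to counting, for each model and attack strategy, how often a harmful behavior was successfully elicited. The sketch below shows one way such an attack success rate (ASR) could be aggregated. It is an illustration under stated assumptions, not the authors' pipeline: the function name `attack_success_rates`, the `(model, attack, success)` record layout, and the demo records are all hypothetical, and the values are placeholders rather than results from the paper.

```python
from collections import defaultdict


def attack_success_rates(records):
    """Aggregate per-(model, attack) attack success rate.

    `records` is an iterable of (model, attack, success) tuples, where
    `success` is truthy when a judge marked the response as harmful.
    This record layout is an assumption made for illustration.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for model, attack, success in records:
        key = (model, attack)
        totals[key] += 1
        successes[key] += int(bool(success))
    # ASR = successful attempts / total attempts for each pair
    return {key: successes[key] / totals[key] for key in totals}


# Placeholder records (hypothetical names, not data from the paper)
demo_records = [
    ("model_a", "attack_x", True),
    ("model_a", "attack_x", False),
    ("model_a", "attack_y", False),
    ("model_b", "attack_x", False),
]
asr = attack_success_rates(demo_records)
for (model, attack), rate in sorted(asr.items()):
    print(f"{model:8s} {attack:9s} ASR={rate:.2f}")
```

In a real evaluation each record would correspond to one benchmark behavior under one attack, with the success flag supplied by whatever judging procedure the benchmark defines.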

Taxonomy

Methods: Layer Normalization · Dropout · Absolute Position Encodings · Dense Connections · Byte Pair Encoding · Softmax · GPT-4 · Label Smoothing · Transformer