The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1

Kaiwen Zhou; Chengzhi Liu; Xuandong Zhao; Shreedhar Jangam; Jayanth Srinivasa; Gaowen Liu; Dawn Song; Xin Eric Wang

arXiv:2502.12659·cs.CY·November 18, 2025

TL;DR

This paper evaluates the safety and robustness of large reasoning models such as DeepSeek-R1. It reveals significant safety gaps, vulnerability to adversarial attacks, and safety concerns tied to the models' reasoning processes.

Contribution

The paper provides a comprehensive safety assessment of R1 models, covering their safety gaps, their vulnerability to adversarial attacks, and the safety implications of their reasoning capabilities.

Findings

01. Open-source LRMs have larger safety gaps than o3-mini.

02. Stronger reasoning models pose higher safety risks.

03. Safety thinking often fails under adversarial attacks.

Abstract

The rapid development of large reasoning models (LRMs), such as OpenAI-o3 and DeepSeek-R1, has led to significant improvements in complex reasoning over non-reasoning large language models (LLMs). However, their enhanced capabilities, combined with the open-source access of models like DeepSeek-R1, raise serious safety concerns, particularly regarding their potential for misuse. In this work, we present a comprehensive safety assessment of these reasoning models, leveraging established safety benchmarks to evaluate their compliance with safety regulations. Furthermore, we investigate their susceptibility to adversarial attacks, such as jailbreaking and prompt injection, to assess their robustness in real-world applications. Through our multi-faceted analysis, we uncover four key findings: (1) There is a significant safety gap between the open-source reasoning models and the o3-mini…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Semantic Web and Ontologies

Methods: Balanced Selection