R1dacted: Investigating Local Censorship in DeepSeek's R1 Language Model
Ali Naseh, Harsh Chaudhari, Jaechul Roh, Mingshi Wu, Alina Oprea, Amir Houmansadr

TL;DR
This paper investigates the censorship behavior of the R1 large language model, revealing how it selectively refuses to answer politically sensitive prompts and analyzing the patterns, triggers, and transferability of this censorship across languages and models.
Contribution
The study introduces a curated set of prompts to analyze R1's censorship, examines its consistency and triggers, and proposes methods to bypass or remove such censorship.
Findings
R1 exhibits censorship on politically sensitive prompts not seen in other models.
Censorship patterns vary across topics, phrasing, and languages.
R1's censorship transfers to models distilled from it.
Abstract
DeepSeek recently released R1, a high-performing large language model (LLM) optimized for reasoning tasks. Despite its efficient training pipeline, R1 achieves competitive performance, even surpassing leading reasoning models like OpenAI's o1 on several benchmarks. However, emerging reports suggest that R1 refuses to answer certain prompts related to politically sensitive topics in China. While existing LLMs often implement safeguards to avoid generating harmful or offensive outputs, R1 represents a notable shift - exhibiting censorship-like behavior on politically charged queries. In this paper, we investigate this phenomenon by first introducing a large-scale set of heavily curated prompts that get censored by R1, covering a range of politically sensitive topics, but are not censored by other models. We then conduct a comprehensive analysis of R1's censorship patterns, examining their…
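The selection criterion described in the abstract (prompts censored by R1 but not by other models) can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `select_target_only_refusals`, the `refusals` data layout, the model names `model_x` and `model_y`, and the toy labels are all hypothetical. How a response gets labeled as a refusal is not specified here and is assumed to have been decided beforehand.

```python
from typing import Dict, List


def select_target_only_refusals(
    refusals: Dict[str, Dict[str, bool]],
    target: str,
) -> List[str]:
    """Return prompts refused by `target` and by no other model.

    `refusals` maps prompt text -> {model name -> True if that model refused}.
    The refusal labels are assumed to exist already (hypothetical input).
    """
    selected = []
    for prompt, by_model in refusals.items():
        # Refusal flags of every model except the target.
        others = [refused for model, refused in by_model.items() if model != target]
        if by_model.get(target, False) and not any(others):
            selected.append(prompt)
    return selected


# Toy, made-up labels purely for illustration (not data from the paper).
example = {
    "prompt A": {"R1": True, "model_x": False, "model_y": False},
    "prompt B": {"R1": True, "model_x": True, "model_y": False},
    "prompt C": {"R1": False, "model_x": False, "model_y": False},
}
print(select_target_only_refusals(example, target="R1"))  # ['prompt A']
```

Only "prompt A" survives: "prompt B" is also refused by another model, and "prompt C" is not refused by the target at all.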
Taxonomy
Topics: Computational and Text Analysis Methods · Ethics and Social Impacts of AI · Big Data and Digital Economy
Methods: Sparse Evolutionary Training
