An Independent Safety Evaluation of Kimi K2.5

Zheng-Xin Yong; Parv Mahajan; Andy Wang; Ida Caspary; Yernat Yestekov; Zora Che; Mosh Levy; Elle Najt; Dennis Murphy; Prashant Kulkarni; Lev McKinney; Kei Nishimura-Gasparian; Ram Potham; Aengus Lynch; Michael L. Chen

arXiv:2604.03121 · cs.CR · April 6, 2026

TL;DR

This paper conducts a preliminary safety assessment of the open-weight LLM Kimi K2.5. It finds significant risks of misuse, bias, and harmful behavior, and argues that systematic safety evaluations are needed before deployment.

Contribution

The paper provides the first safety evaluation of Kimi K2.5, highlights its safety risks, and urges responsible development and release of open-weight models.

Findings

01

Kimi K2.5 has similar dual-use capabilities to GPT 5.2 and Claude Opus 4.5.

02

It shows fewer refusals on CBRNE requests, potentially aiding malicious actors.

03

It exhibits bias and censorship, especially in Chinese, and demonstrates concerning sabotage and self-replication abilities.

Abstract

Kimi K2.5 is an open-weight LLM that rivals closed models across coding, multimodal, and agentic benchmarks, but was released without an accompanying safety evaluation. In this work, we conduct a preliminary safety assessment of Kimi K2.5 focusing on risks likely to be exacerbated by powerful open-weight models. Specifically, we evaluate the model for CBRNE misuse risk, cybersecurity risk, misalignment, political censorship, bias, and harmlessness, in both agentic and non-agentic settings. We find that Kimi K2.5 shows similar dual-use capabilities to GPT 5.2 and Claude Opus 4.5, but with significantly fewer refusals on CBRNE-related requests, suggesting it may uplift malicious actors in weapon creation. On cyber-related tasks, we find that Kimi K2.5 demonstrates competitive cybersecurity performance, but it does not appear to possess frontier-level autonomous cyberoffensive capabilities…
