100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models

Chong Zhang; Yue Deng; Xiang Lin; Bin Wang; Dianwen Ng; Hai Ye; Xingxuan Li; Yao Xiao; Zhanfeng Mo; Qi Zhang; Lidong Bing

arXiv:2505.00551·cs.CL·May 16, 2025

TL;DR

This survey reviews recent replication studies of DeepSeek-R1, focusing on supervised fine-tuning (SFT) and reinforcement learning from verifiable rewards (RLVR). It covers implementation details, experimental results, and future research directions for reasoning language models.

Contribution

It provides a comprehensive summary of recent replication studies on DeepSeek-R1, emphasizing data, methods, and training procedures to inspire future advancements in reasoning language models.

Findings

01

Replication studies reached performance comparable to DeepSeek-R1 using similar training procedures and fully open-source data resources.

02

Insights into data construction and method design for reasoning language models (RLMs).

03

Discussion of techniques to enhance and expand RLM applications.

Abstract

The recent development of reasoning language models (RLMs) represents a novel evolution in large language models. In particular, the recent release of DeepSeek-R1 has generated widespread social impact and sparked enthusiasm in the research community for exploring the explicit reasoning paradigm of language models. However, the implementation details of the released models have not been fully open-sourced by DeepSeek, including DeepSeek-R1-Zero, DeepSeek-R1, and the distilled small models. As a result, many replication studies have emerged aiming to reproduce the strong performance achieved by DeepSeek-R1, reaching comparable performance through similar training procedures and fully open-source data resources. These works have investigated feasible strategies for supervised fine-tuning (SFT) and reinforcement learning from verifiable rewards (RLVR), focusing on data preparation and…

Taxonomy

Methods: Shrink and Fine-Tune · Focus