A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law

Qianjun Pan; Wenkai Ji; Yuyang Ding; Junsong Li; Shilian Chen; Junyi Wang; Jie Zhou; Qin Chen; Min Zhang; Yulan Wu; Liang He

arXiv:2505.02665·cs.AI·May 9, 2025

Open Access

TL;DR

This survey reviews recent developments in reasoning large language models inspired by human slow thinking, emphasizing dynamic scaling, reinforcement learning, and structured problem-solving to enhance reasoning capabilities.

Contribution

It synthesizes over 100 studies to categorize methods like test-time scaling, reinforcement learning, and slow-thinking frameworks, outlining a comprehensive view of current advancements.

Findings

1. Dynamic test-time scaling improves reasoning efficiency.

2. Reinforcement learning refines decision-making in LLMs.

3. Structured slow-thinking methods enhance complex problem-solving.

Abstract

This survey explores recent advancements in reasoning large language models (LLMs) designed to mimic "slow thinking", a reasoning process inspired by human cognition, as described in Kahneman's Thinking, Fast and Slow. These models, like OpenAI's o1, focus on scaling computational resources dynamically during complex tasks, such as math reasoning, visual reasoning, medical diagnosis, and multi-agent debates. We present the development of reasoning LLMs and list their key technologies. By synthesizing over 100 studies, the survey charts a path toward LLMs that combine human-like deep thinking with scalable efficiency for reasoning. The review breaks down methods into three categories: (1) test-time scaling, which dynamically adjusts computation based on task complexity via search and sampling and dynamic verification; (2) reinforced learning, which refines decision-making through iterative improvement…
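To make the idea of adjusting computation to task complexity via sampling concrete, here is a minimal Python sketch. It is an illustration only, not a method taken from the survey: the function `adaptive_majority_vote`, its parameters, and the stub samplers are all hypothetical names, and a real system would call an LLM where the stubs return canned answers. The sketch keeps drawing answers until the leading answer holds a large enough share of the votes, so easy questions stop early and harder ones consume more of the sampling budget.

```python
from collections import Counter
from typing import Callable, Tuple


def adaptive_majority_vote(
    sample_answer: Callable[[int], str],  # hypothetical: returns the i-th sampled answer
    min_samples: int = 3,
    max_samples: int = 16,
    agreement: float = 0.7,
) -> Tuple[str, int]:
    """Sample until the leading answer's vote share reaches `agreement`
    or `max_samples` draws are used. Returns (answer, samples_used)."""
    if max_samples < 1:
        raise ValueError("max_samples must be at least 1")
    votes: Counter = Counter()
    for i in range(max_samples):
        votes[sample_answer(i)] += 1
        used = i + 1
        if used >= min_samples:
            leader, count = votes.most_common(1)[0]
            if count / used >= agreement:
                return leader, used  # confident enough: stop spending compute
    # Budget exhausted: fall back to the plurality answer.
    return votes.most_common(1)[0][0], max_samples


# Stub samplers standing in for an LLM (assumed data, for illustration only).
def easy_sampler(i: int) -> str:
    return "4"  # every sample agrees


HARD_ANSWERS = ["12", "15", "9", "12", "12", "12", "12", "12"]


def hard_sampler(i: int) -> str:
    return HARD_ANSWERS[i % len(HARD_ANSWERS)]  # early samples disagree


easy_result = adaptive_majority_vote(easy_sampler)
hard_result = adaptive_majority_vote(hard_sampler)
print(easy_result)  # ('4', 3)
print(hard_result)  # ('12', 7)
```

With these stubs, the unanimous question stops at the minimum of three samples, while the contested one needs seven draws before "12" reaches the 70% agreement threshold. The threshold and budget values are arbitrary choices for the sketch.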

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Constraint Satisfaction and Optimization

Methods: Focus