A Tutorial on LLM Reasoning: Relevant Methods behind ChatGPT o1

Jun Wang

arXiv:2502.10867·cs.AI·February 18, 2025

TL;DR

This paper reviews the methods behind the reasoning improvements of ChatGPT o1, emphasizing the role of reinforcement learning in both training and decoding to enhance step-by-step reasoning.

Contribution

It provides a comprehensive formulation of reasoning problems and explores model-based and model-free approaches for slow-thinking frameworks.

Findings

1. Reinforcement learning significantly improves reasoning in language models.

2. Step-by-step reasoning training enhances model deliberation.

3. Both model-based and model-free methods support reasoning improvements.

Abstract

OpenAI o1 has shown that applying reinforcement learning to integrate reasoning steps directly during inference can significantly improve a model's reasoning capabilities. This result is exciting as the field transitions from the conventional autoregressive method of generating answers to a more deliberate approach that models the slow-thinking process through step-by-step reasoning training. Reinforcement learning plays a key role in both the model's training and decoding processes. In this article, we present a comprehensive formulation of reasoning problems and investigate the use of both model-based and model-free approaches to better support this slow-thinking framework.

Taxonomy

Topics: Artificial Intelligence in Law · Artificial Intelligence in Healthcare and Education