T1: Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling

Zhenyu Hou; Xin Lv; Rui Lu; Jiajie Zhang; Yujiang Li; Zijun Yao; Juanzi Li; Jie Tang; Yuxiao Dong

arXiv:2501.11651 · cs.LG · June 16, 2025

TL;DR

This paper introduces T1, a method that scales reinforcement learning for language model reasoning by encouraging exploration. Models trained with T1 improve on challenging math reasoning benchmarks and perform better as the inference budget increases.

Contribution

T1 scales reinforcement learning for language model reasoning by promoting exploration during training, and it exhibits inference scaling behavior when built on open LLMs.

Findings

01. T1 achieves superior performance on math reasoning benchmarks.

02. Increased inference budgets improve T1's performance without extra verification.

03. T1 exhibits inference scaling behavior with open LLMs.

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks. However, existing approaches mainly rely on imitation learning and struggle to achieve effective test-time scaling. While reinforcement learning (RL) holds promise for enabling self-exploration, recent attempts yield modest improvements in complex reasoning. In this paper, we present T1 to scale RL by encouraging exploration and understand inference scaling. We first initialize the LLM using synthesized chain-of-thought data that integrates trial-and-error and self-verification. To scale RL training, we promote increased sampling diversity through oversampling. We demonstrate that T1 with open LLMs as its base exhibits inference scaling behavior and achieves superior performance on challenging math reasoning benchmarks. More importantly, we present a simple strategy to examine inference…

Code & Models

Repositories

thudm/t1 (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Topic Modeling

Methods: Balanced Selection