PCL-Reasoner-V1.5: Advancing Math Reasoning with Offline Reinforcement Learning

Yao Lu; Dengdong Fan; Jianzheng Nie; Fan Xu; Jie Chen; Bin Zhou; Yonghong Tian

arXiv:2601.14716 · cs.LG · January 22, 2026

TL;DR

PCL-Reasoner-V1.5 is a 32-billion-parameter large language model specialized in mathematical reasoning. It is trained with a proposed offline reinforcement learning method and achieves state-of-the-art accuracy on AIME 2024 and AIME 2025 among models post-trained on Qwen2.5-32B.

Contribution

The paper introduces an offline RL method for training large language models that provides better training stability and efficiency than standard online RL methods such as GRPO.

Findings

1. Achieved an average accuracy of 90.9% on AIME 2024

2. Attained an average accuracy of 85.6% on AIME 2025

3. Demonstrated offline RL as a stable and efficient training paradigm for reasoning in LLMs

Abstract

We present PCL-Reasoner-V1.5, a 32-billion-parameter large language model (LLM) for mathematical reasoning. The model is built upon Qwen2.5-32B and refined via supervised fine-tuning (SFT) followed by reinforcement learning (RL). A central innovation is our proposed offline RL method, which provides superior training stability and efficiency over standard online RL methods such as GRPO. Our model achieves state-of-the-art performance among models post-trained on Qwen2.5-32B, attaining average accuracies of 90.9% on AIME 2024 and 85.6% on AIME 2025. Our work demonstrates offline RL as a stable and efficient paradigm for advancing reasoning in LLMs. All experiments were conducted on Huawei Ascend 910C NPUs.

Code & Models

Models

PCL-Reasoner/V1.5

Datasets

PCL-Reasoner/V1.5-RL-Math

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques