Compass-Thinker-7B Technical Report

Anxiang Zeng; Haibo Zhang; Kaixiang Mo; Long Zhang; Shuman Liu; Yanhui Huang; Yawen Liu; Yuepeng Sheng; Yuwei Huang

arXiv:2508.08909 · cs.AI · August 15, 2025

TL;DR

The paper introduces Compass-Thinker-7B, a 7B-parameter model trained with a resource-efficient reinforcement learning pipeline to strengthen reasoning, especially in mathematics. It reports 40% accuracy on AIME2024.

Contribution

It presents a novel RL pipeline for training a 7B model with reduced resources, improving reasoning and mathematical problem-solving capabilities.

Findings

01 Achieves 40% accuracy on AIME2024

02 Outperforms similar-sized RL models in mathematics

03 Demonstrates efficient training with staged difficulty adjustments

Abstract

Recent R1-Zero-like research further demonstrates that reasoning extension has given large language models (LLMs) unprecedented reasoning capabilities, and that Reinforcement Learning is the core technology for eliciting their complex reasoning. However, conducting RL experiments directly on hyperscale models involves high computational costs and resource demands, posing significant risks. We propose the Compass-Thinker-7B model, which aims to explore the potential of Reinforcement Learning with fewer computational resources and lower costs, and to provide insights for further research into RL recipes for larger models. Compass-Thinker-7B is trained from an open-source model through a specially designed Reinforcement Learning Pipeline. We curate a dataset of 30k verifiable mathematics problems for the Reinforcement Learning Pipeline. By configuring data and training settings with different difficulty…
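
The abstract describes the training problems as "verifiable" but, as excerpted here, does not say how answers are checked. The following is a minimal sketch of what a rule-based check could look like, assuming (hypothetically, not from the paper) that the model writes its final answer inside \boxed{...} and that the reward is binary. The function names extract_final_answer, normalize, and verifiable_reward are illustrative only.

```python
import re
from fractions import Fraction
from typing import Optional

# Assumption: final answers appear as \boxed{...} with no nested braces.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_final_answer(response: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in the response, if any."""
    matches = BOXED.findall(response)
    return matches[-1].strip() if matches else None


def normalize(answer: str) -> str:
    """Canonicalize simple numeric answers; otherwise return compacted text."""
    text = answer.replace(",", "").replace(" ", "")
    try:
        # Fraction parses integers, decimals, and "a/b" strings exactly.
        return str(Fraction(text))
    except (ValueError, ZeroDivisionError):
        return text


def verifiable_reward(response: str, reference: str) -> float:
    """Binary reward: 1.0 if the final answer matches the reference, else 0.0."""
    predicted = extract_final_answer(response)
    if predicted is None:
        return 0.0
    return 1.0 if normalize(predicted) == normalize(reference) else 0.0


if __name__ == "__main__":
    print(verifiable_reward(r"So the answer is \boxed{204}.", "204"))
    print(verifiable_reward(r"We get \boxed{0.5}", "1/2"))
    print(verifiable_reward("I am not sure.", "204"))
```

A check like this only handles plain numeric answers. Symbolic or multi-part answers would need a more capable comparison, which this sketch does not attempt.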
