Enhancing Math Reasoning in Small-sized LLMs via Preview Difficulty-Aware Intervention

Xinhan Di; JoyJiaoW

arXiv:2508.01604·cs.LG·August 5, 2025

TL;DR

This paper introduces a difficulty-aware intervention method that improves math reasoning in small language models, with a 1.5B-parameter model reaching 50.0% on AIME24 and 89.2% on Math500.

Contribution

It presents a novel difficulty-aware intervention technique integrated into an open-source reinforcement learning framework for small LLMs, enhancing their math reasoning abilities.

Findings

01

Achieved 50.0% on AIME24

02

Reached 89.2% on Math500

03

Also reports 77.1% on AMC, 35.3% on Minerva, and 51.9% on OBench

Abstract

Scaling reinforcement learning enhances the reasoning capabilities of large language models, and reinforcement learning is the key technique for eliciting complex reasoning. However, key technical details of state-of-the-art reasoning LLMs, such as the OpenAI O series, Claude 3 series, DeepMind's Gemini 2.5 series, and Grok 3 series, remain undisclosed, making it difficult for the research community to replicate their reinforcement learning training results. We therefore begin our study with an Early Preview Reinforcement Learning (EPRLI) algorithm built on the open-source GRPO framework, incorporating difficulty-aware intervention for math problems. Applied to a 1.5B-parameter LLM, our method achieves 50.0% on AIME24, 89.2% on Math500, 77.1% on AMC, 35.3% on Minerva, and 51.9% on OBench, surpassing O1-Preview and comparable to O1-mini within standard school-lab…
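For readers unfamiliar with GRPO, the group-relative advantage at its core can be sketched as below. This is a minimal illustration, not the paper's method: the abstract does not say how difficulty is measured or how the intervention is applied, so `estimated_difficulty` (one minus the pass rate of a sampled group) is purely a hypothetical proxy, and the function names, the `eps` constant, and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: normalize each reward against its own group.

    `rewards` holds one scalar reward per sampled answer to the same problem.
    `eps` (an assumed constant) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


def estimated_difficulty(rewards):
    """Hypothetical difficulty proxy (NOT from the paper): 1 - pass rate.

    Assumes binary rewards, 1.0 for a correct answer and 0.0 otherwise.
    """
    return 1.0 - mean(rewards)


# Four sampled answers to one math problem: two correct, two incorrect.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
difficulty = estimated_difficulty(rewards)

# A group with identical rewards carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
```

Correct answers receive positive advantages and incorrect ones negative, and a group in which every answer earns the same reward yields all-zero advantages. That last property is one reason a training loop might want to treat very easy or very hard problems differently, though whether EPRLI does so in this way is not stated in the abstract.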

Taxonomy

Topics: Reinforcement Learning in Robotics · Topic Modeling · Multimodal Machine Learning Applications