ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
Ling Yang, Zhaochen Yu, Bin Cui, Mengdi Wang

TL;DR
ReasonFlux introduces hierarchical thought templates and reinforcement learning to significantly enhance large language models' mathematical reasoning, achieving state-of-the-art accuracy on benchmarks with efficient training.
Contribution
The paper proposes a novel hierarchical reasoning framework with a structured template library, reinforcement learning for template planning, and an inference scaling system, improving LLM math reasoning.
Findings
Achieves 91.2% accuracy on MATH benchmark.
Solves 56.7% of AIME problems, surpassing OpenAI o1-preview and DeepSeek V3.
Uses only 8 GPUs for training ReasonFlux-32B.
Abstract
We show that hierarchical LLM reasoning via scaling thought templates can effectively optimize the reasoning search space and outperform the mathematical reasoning capabilities of powerful LLMs like OpenAI o1-preview and DeepSeek V3. We train our ReasonFlux-32B model with only 8 GPUs and introduce three innovations: (i) a structured and generic thought template library, containing around 500 high-level thought templates capable of generalizing to similar or relevant reasoning problems; (ii) hierarchical reinforcement learning on a sequence of thought templates instead of long CoTs, optimizing a base LLM to plan out an optimal template trajectory for gradually handling complex problems; (iii) a brand new inference scaling system that enables hierarchical LLM reasoning by adaptively scaling thought templates at inference time. With a template trajectory containing more…
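To make the idea of a template library and a template trajectory concrete, here is a minimal Python sketch. It is not the authors' implementation: the `ThoughtTemplate` class, the tag-overlap retrieval, the `plan_trajectory` helper, and the three example templates are all hypothetical, chosen only for illustration. In particular, the paper trains a planner with hierarchical reinforcement learning, whereas this sketch substitutes a simple tag-matching heuristic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThoughtTemplate:
    """A hypothetical high-level template: a name, topic tags, and abstract steps."""
    name: str
    tags: frozenset
    steps: tuple


def retrieve(library, problem_tags, k=1):
    """Return up to k templates ranked by tag overlap (ties keep library order)."""
    scored = [(len(t.tags & problem_tags), t) for t in library]
    scored = [(score, t) for score, t in scored if score > 0]
    scored.sort(key=lambda pair: -pair[0])  # sort is stable, so ties stay ordered
    return [t for _, t in scored[:k]]


def plan_trajectory(library, subproblem_tags):
    """Pick one template per sub-problem; a stand-in for the learned planner."""
    trajectory = []
    for tags in subproblem_tags:
        hits = retrieve(library, frozenset(tags), k=1)
        if hits:
            trajectory.append(hits[0])
    return trajectory


# Illustrative library (assumed content; the paper reports around 500 templates).
LIBRARY = [
    ThoughtTemplate(
        "substitution",
        frozenset({"algebra", "equation"}),
        ("introduce a new variable", "rewrite the equation", "solve and back-substitute"),
    ),
    ThoughtTemplate(
        "am_gm_inequality",
        frozenset({"inequality", "optimization"}),
        ("identify non-negative terms", "apply AM-GM", "check the equality case"),
    ),
    ThoughtTemplate(
        "modular_arithmetic",
        frozenset({"number_theory", "remainder"}),
        ("reduce modulo n", "find the cycle", "combine residues"),
    ),
]

# A problem decomposed into two sub-problems, each described by topic tags.
trajectory = plan_trajectory(LIBRARY, [{"equation", "algebra"}, {"optimization"}])
names = [t.name for t in trajectory]
print(names)
```

The point of the sketch is the data shape: the planner outputs a short sequence of template names rather than a long chain of thought, and each template is then instantiated on its sub-problem.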
Taxonomy
Topics: Semantic Web and Ontologies · Natural Language Processing Techniques
Methods: Balanced Selection
