TemplateRL: Structured Template-Guided Reinforcement Learning for LLM Reasoning

Jinyang Wu; Chonghua Liao; Mingkuan Feng; Shuai Zhang; Zhengqi Wen; Haoran Luo; Ling Yang; Huazhe Xu; Jianhua Tao

arXiv:2505.15692 · cs.CL · May 18, 2026

TL;DR

TemplateRL introduces a structured, template-guided reinforcement learning framework that improves reasoning efficiency and transferability by integrating explicit problem-solving templates into policy training.

Contribution

The paper presents a structured RL approach that draws on a template library built via Monte Carlo Tree Search (MCTS), improving training stability, efficiency, and interpretability over existing unstructured methods.

Findings

01

TemplateRL outperforms GRPO by 99% on AIME.

02

TemplateRL outperforms GRPO by 41% on AMC.

03

TemplateRL demonstrates superior stability and cross-domain generalization.

Abstract

Reinforcement learning (RL) has emerged as an effective paradigm for enhancing model reasoning. However, existing RL methods like GRPO typically rely on unstructured self-sampling to fit scalar rewards, often producing inefficient rollouts that fail to capture transferable problem-solving strategies. To address this limitation, we propose **TemplateRL**, a structured template-guided RL framework that augments policy optimization with explicit template guidance. Our approach first constructs a problem-solving template library via MCTS on a small seed set, then seamlessly integrates this high-level structured guidance into RL training. By guiding rollout generation to align with proven template structures, TemplateRL significantly improves high-quality trajectory hit rates while reducing ineffective exploration. This structure-guided design steers the policy toward validated strategic…
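The two ingredients named in the abstract, template-conditioned rollouts and GRPO-style reward fitting, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the uniform random choice of template, the toy generator, and the toy reward are all assumptions made for illustration; the abstract does not say how templates are selected or how they are injected into generation. The advantage computation follows one common form of GRPO's group-relative normalization (reward minus group mean, divided by group standard deviation).

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against its own rollout group (GRPO-style)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    # eps keeps the division finite when every rollout gets the same reward
    return [(r - mean) / (std + eps) for r in rewards]


def template_guided_rollouts(
    problem: str,
    templates: Sequence[str],
    generate: Callable[[str, str], str],
    group_size: int,
    rng: random.Random,
) -> List[Tuple[str, str]]:
    """Draw one template per rollout and condition generation on it.

    ASSUMPTION: templates are picked uniformly at random. The paper's
    actual selection rule is not given in the text above.
    """
    rollouts = []
    for _ in range(group_size):
        template = rng.choice(templates)
        rollouts.append((template, generate(problem, template)))
    return rollouts


if __name__ == "__main__":
    # Hypothetical template library; in the paper it is built via MCTS.
    library = ["decompose-then-solve", "guess-and-verify"]

    def toy_generate(problem: str, template: str) -> str:
        # Stand-in for a policy model call (hypothetical).
        return f"[{template}] answer to {problem}"

    def toy_reward(template: str) -> float:
        # Stand-in for a correctness check (hypothetical).
        return 1.0 if template == "decompose-then-solve" else 0.0

    group = template_guided_rollouts("2 + 2 = ?", library, toy_generate, 4, random.Random(0))
    rewards = [toy_reward(template) for template, _ in group]
    print(group_relative_advantages(rewards))
```

In a real training loop the advantages would weight the policy-gradient update for each rollout's tokens; that step is omitted here because it depends on the model and optimizer in use.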

Taxonomy

Topics: Complex Systems and Decision Making

Methods: Balanced Selection