rSIM: Incentivizing Reasoning Capabilities of LLMs via Reinforced Strategy Injection

Sijia Chen; Baochun Li; Di Niu

arXiv:2512.08300·cs.AI·December 10, 2025

Open Access

TL;DR

This paper introduces rSIM, a reinforcement learning-based method that employs a small planner to inject reasoning strategies into large language models, significantly enhancing their reasoning abilities and generalizability across tasks.

Contribution

The paper proposes a novel reinforcement learning framework with a small planner to guide LLMs' reasoning strategies, enabling improved reasoning performance and transferability.

Findings

01  rSIM enables smaller LLMs to outperform larger models on reasoning tasks (for example, Qwen2.5-0.5B with rSIM outperforms Qwen2.5-14B)

02  The trained planner can be reused across different models and tasks

03  The approach supports continual learning and strategy improvement

Abstract

Large language models (LLMs) are post-trained through reinforcement learning (RL) to evolve into Reasoning Language Models (RLMs), where the hallmark of this advanced reasoning is "aha" moments, when they start to apply strategies such as self-reflection and deep thinking within their chains of thought (CoTs). Motivated by this, this paper proposes a novel reinforced strategy injection mechanism (rSIM) that enables any LLM to become an RLM by employing a small planner to guide the LLM's CoT through the adaptive injection of reasoning strategies. To achieve this, the planner (leader agent) is jointly trained with an LLM (follower agent) using multi-agent RL (MARL), based on a leader-follower framework and straightforward rule-based rewards. Experimental results show that rSIM enables Qwen2.5-0.5B to become an RLM and significantly outperform Qwen2.5-14B. Moreover, the planner is…
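
The leader-follower interaction described in the abstract can be pictured with a small sketch. This is not the authors' implementation: the strategy labels beyond self-reflection and deep thinking, the `rollout` helper, the `ANSWER:` convention, the exact-match reward, and the toy planner and follower are all assumptions made for illustration, and the MARL training update is omitted entirely. It only shows the control flow in which a planner picks a strategy before each reasoning step and a rule-based check scores the final answer.

```python
from typing import Callable, List, Tuple

# Hypothetical strategy labels; the abstract names only self-reflection
# and deep thinking as examples of injected strategies.
STRATEGIES = ("none", "self_reflection", "deep_thinking")

# Assumed interfaces: the planner (leader) sees the question and the steps
# so far and returns a strategy; the follower (LLM) returns the next step.
Planner = Callable[[str, List[str]], str]
Follower = Callable[[str, List[str], str], str]

ANSWER_PREFIX = "ANSWER:"  # assumed marker for the final step


def rollout(question: str, planner: Planner, follower: Follower,
            max_steps: int = 4) -> Tuple[List[Tuple[str, str]], str]:
    """Interleave planner decisions with follower reasoning steps."""
    steps: List[str] = []
    trace: List[Tuple[str, str]] = []
    answer = ""
    for _ in range(max_steps):
        strategy = planner(question, steps)          # leader acts first
        step = follower(question, steps, strategy)   # follower conditions on it
        steps.append(step)
        trace.append((strategy, step))
        if step.startswith(ANSWER_PREFIX):
            answer = step[len(ANSWER_PREFIX):].strip()
            break
    return trace, answer


def rule_based_reward(predicted: str, gold: str) -> float:
    """Assumed rule: 1.0 on exact match with the reference answer, else 0.0."""
    return 1.0 if predicted.strip() == gold.strip() else 0.0


# Toy stand-ins so the sketch runs without any model.
def toy_planner(question: str, steps: List[str]) -> str:
    # Inject self-reflection right after the first step, nothing otherwise.
    return "self_reflection" if len(steps) == 1 else "none"


def toy_follower(question: str, steps: List[str], strategy: str) -> str:
    if not steps:
        return "2 + 3 = 6"  # deliberately wrong first attempt
    if strategy == "self_reflection":
        return "Check: 2 + 3 = 5, so the previous step was wrong."
    return "ANSWER: 5"


trace, answer = rollout("What is 2 + 3?", toy_planner, toy_follower)
reward = rule_based_reward(answer, "5")
```

In a trained system both callables would be learned policies and the reward would drive their joint update; here they are fixed functions so that the injection loop itself is easy to follow.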

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Machine Learning in Healthcare