rSIM: Incentivizing Reasoning Capabilities of LLMs via Reinforced Strategy Injection
Sijia Chen, Baochun Li, Di Niu

TL;DR
This paper introduces rSIM, a reinforcement learning-based method that employs a small planner to inject reasoning strategies into large language models, significantly enhancing their reasoning abilities and generalizability across tasks.
Contribution
The paper proposes a novel reinforcement learning framework with a small planner to guide LLMs' reasoning strategies, enabling improved reasoning performance and transferability.
Findings
rSIM enables smaller LLMs to outperform larger models on reasoning tasks; for example, Qwen2.5-0.5B with rSIM significantly outperforms Qwen2.5-14B
The trained planner can be reused across different models and tasks
The approach supports continual learning and strategy improvement
Abstract
Large language models (LLMs) are post-trained through reinforcement learning (RL) to evolve into Reasoning Language Models (RLMs), where the hallmark of this advanced reasoning is the "aha" moment at which they start to apply strategies, such as self-reflection and deep thinking, within their chains of thought (CoTs). Motivated by this, this paper proposes a novel reinforced strategy injection mechanism (rSIM) that enables any LLM to become an RLM by employing a small planner to guide the LLM's CoT through the adaptive injection of reasoning strategies. To achieve this, the planner (leader agent) is jointly trained with an LLM (follower agent) using multi-agent RL (MARL), based on a leader-follower framework and straightforward rule-based rewards. Experimental results show that rSIM enables Qwen2.5-0.5B to become an RLM and significantly outperform Qwen2.5-14B. Moreover, the planner is…
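The leader-follower loop described in the abstract can be pictured with a toy sketch. This is not the authors' implementation: it covers only the inference-time injection loop and an exact-match rule-based reward, and omits the MARL training entirely. The names `run_episode`, `toy_planner`, `toy_follower`, the injected prompt strings, and the "none" strategy are all hypothetical; only "self-reflection" and "deep thinking" are strategy names taken from the abstract.

```python
from typing import Callable, List, Tuple

# Strategy -> text injected into the chain of thought (CoT).
# "self-reflection" and "deep thinking" are named in the abstract; the prompt
# strings and the "none" no-op are assumptions made for this illustration.
STRATEGIES = {
    "none": "",
    "self-reflection": "Wait, let me check the previous step.",
    "deep thinking": "Let me think about this more carefully.",
}

CoT = List[Tuple[str, str]]  # (strategy chosen by planner, step text from follower)
Planner = Callable[[str, CoT], str]
Follower = Callable[[str, CoT, str], str]


def run_episode(question: str, planner: Planner, follower: Follower,
                answer: str, max_steps: int = 4) -> Tuple[CoT, float]:
    """Run one leader-follower episode and score it with a rule-based reward."""
    cot: CoT = []
    for _ in range(max_steps):
        strategy = planner(question, cot)                      # leader picks a strategy
        step = follower(question, cot, STRATEGIES[strategy])   # follower writes the next step
        cot.append((strategy, step))
        if step.startswith("ANSWER:"):
            break
    # Assumed rule-based reward: 1.0 on an exact final-answer match, else 0.0.
    reward = 1.0 if cot[-1][1] == f"ANSWER: {answer}" else 0.0
    return cot, reward


def toy_planner(question: str, cot: CoT) -> str:
    """Hypothetical stand-in for the small planner: reflect once after the first step."""
    return "self-reflection" if len(cot) == 1 else "none"


def toy_follower(question: str, cot: CoT, injected: str) -> str:
    """Hypothetical stand-in for the LLM: errs at first, fixes it only when asked to reflect."""
    if not cot:
        return "2 + 2 = 5"
    if injected == STRATEGIES["self-reflection"]:
        return "Check: 2 + 2 = 4"
    return "ANSWER: " + cot[-1][1].split("= ")[-1]


if __name__ == "__main__":
    print(run_episode("What is 2 + 2?", toy_planner, toy_follower, "4"))
    print(run_episode("What is 2 + 2?", lambda q, c: "none", toy_follower, "4"))
```

With the toy planner, the follower is prompted to reflect after its first step, corrects the error, and the episode earns reward 1.0. With a planner that never injects a strategy, the same follower commits to the wrong answer and earns 0.0. In the paper's setting both agents are learned models and the reward signal drives joint training, which this sketch does not attempt to reproduce.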
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Machine Learning in Healthcare
