SMART: Self-learning Meta-strategy Agent for Reasoning Tasks
Rongxing Liu, Kumar Shridhar, Manish Prajapat, Patrick Xia, Mrinmaya Sachan

TL;DR
SMART is a framework that enables language models to autonomously learn and select the most effective reasoning strategies on the first attempt, improving accuracy and efficiency without external feedback.
Contribution
We introduce SMART, a reinforcement learning-based approach that allows LMs to internalize and optimize their reasoning strategies for better first-attempt accuracy.
Findings
Significantly improves reasoning accuracy (+15 points on GSM8K)
Reduces need for multiple inference passes
Enhances efficiency in reasoning tasks
Abstract
Tasks requiring deductive reasoning, especially those involving multiple steps, often demand adaptive strategies such as intermediate generation of rationales or programs, as no single approach is universally optimal. While Language Models (LMs) can enhance their outputs through iterative self-refinement and strategy adjustments, they frequently fail to apply the most effective strategy in their first attempt. This inefficiency raises the question: Can LMs learn to select the optimal strategy in the first attempt, without a need for refinement? To address this challenge, we introduce SMART (Self-learning Meta-strategy Agent for Reasoning Tasks), a novel framework that enables LMs to autonomously learn and select the most effective strategies for various reasoning tasks. We model the strategy selection process as a Markov Decision Process and leverage reinforcement learning-driven…
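To make the idea of learning a first-attempt strategy choice concrete, here is a minimal sketch. It is not the authors' implementation: it collapses the Markov Decision Process to a single step (a bandit), uses a plain REINFORCE update with a running baseline, and replaces the language model with a simulator. The strategy labels in `STRATEGIES` and the per-strategy success rates are hypothetical values chosen only for illustration.

```python
import math
import random

# Hypothetical strategy labels; the abstract only mentions rationales and programs.
STRATEGIES = ["rationale", "program", "decompose"]


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(success_prob, steps=5000, lr=0.1, seed=0):
    """REINFORCE on a one-step strategy-selection problem.

    success_prob[i] is the assumed (made-up) chance that strategy i solves
    a task on the first attempt; reward is 1 for success and 0 otherwise.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(success_prob)
    baseline = 0.0  # running mean reward, used to reduce gradient variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = 1.0 if rng.random() < success_prob[action] else 0.0
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log pi(action) w.r.t. logit i is 1[i == action] - probs[i].
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)


# Assumed success rates: the "program" strategy is best in this toy setup.
probs = train_policy([0.2, 0.9, 0.5])
best = STRATEGIES[probs.index(max(probs))]
print(best, [round(p, 2) for p in probs])
```

In this toy setting the policy shifts probability mass toward the strategy that succeeds most often on the first try, which is the behavior the abstract asks for. The full method presumably conditions the choice on the task itself, which this sketch omits.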
Peer Reviews
Decision: Submitted to ICLR 2025
1. SMART provides a novel framework by modeling strategy selection in LMs as a Markov Decision Process, adding a unique layer of self-learning. 2. The paper demonstrates high-quality experimentation, validating SMART’s effectiveness across multiple datasets and model architectures, such as GSM8K, SVAMP, and ASDiv. 3. The paper is clearly articulated, particularly in its two-stage process design, making the approach easy to understand and apply. 4. By enabling LMs to select optimal strategies wi
1. The method is overly complex without sufficient justification for why simpler methods cannot achieve similar results. There should be a more detailed baseline comparison. 2. The experimental section lacks clarity, particularly regarding the setup and specific configurations for each dataset, making the results difficult to interpret and replicate. Besides, the paper's discussion on the generalization capability of SMART across different datasets is weak, as it does not convincingly demonstra
1. **Relevance**: The problem of efficient self-training is important for improving post-training capabilities of large language models (LLMs). A successful self-training method can enhance LLM performance on complex tasks, such as multi-step reasoning, making the research direction highly relevant. 2. **Clarity**: The overall presentation of SMART, including the RL framework for strategy selection, is clearly explained, and the experimental methodology is easy to follow. 3. **Diverse Empirical
1. **Limited Novelty**: The core idea of SMART resembles existing self-training methods that utilize rejection sampling (https://arxiv.org/pdf/2308.01825). The concept of generating correct samples through self-reflection has already been widely explored, such as [ReST](https://arxiv.org/pdf/2308.08998) (Gulcehre et al., 2023) and [Re-ReST](https://arxiv.org/pdf/2406.01495), with the only new aspect being the emphasis on strategy selection rather than general solution generation. This difference
Taxonomy
Topics: Semantic Web and Ontologies · Multi-Agent Systems and Negotiation · Fuzzy Logic and Control Systems
