SMART: Self-learning Meta-strategy Agent for Reasoning Tasks

Rongxing Liu; Kumar Shridhar; Manish Prajapat; Patrick Xia; Mrinmaya Sachan

arXiv:2410.16128·cs.AI·October 22, 2024

TL;DR

SMART is a framework that enables language models to autonomously learn and select the most effective reasoning strategies on the first attempt, improving accuracy and efficiency without external feedback.

Contribution

We introduce SMART, a reinforcement learning-based approach that allows LMs to internalize and optimize their reasoning strategies for better first-attempt accuracy.

Findings

01 Significantly improves reasoning accuracy (+15 points on GSM8K)

02 Reduces need for multiple inference passes

03 Enhances efficiency in reasoning tasks

Abstract

Tasks requiring deductive reasoning, especially those involving multiple steps, often demand adaptive strategies such as intermediate generation of rationales or programs, as no single approach is universally optimal. While Language Models (LMs) can enhance their outputs through iterative self-refinement and strategy adjustments, they frequently fail to apply the most effective strategy in their first attempt. This inefficiency raises the question: Can LMs learn to select the optimal strategy in the first attempt, without a need for refinement? To address this challenge, we introduce SMART (Self-learning Meta-strategy Agent for Reasoning Tasks), a novel framework that enables LMs to autonomously learn and select the most effective strategies for various reasoning tasks. We model the strategy selection process as a Markov Decision Process and leverage reinforcement learning-driven…
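The idea of treating strategy selection as a decision problem solved with reinforcement learning can be illustrated with a toy sketch. The abstract is cut off before it describes the actual training procedure, so everything below is an assumption made for illustration: the strategy names, the per-strategy success probabilities, and the update rule (a standard REINFORCE-style gradient on a softmax policy with a running baseline, reduced to a single-step decision). It is not the authors' implementation; it only shows how a policy can learn to pick the strategy most likely to succeed on the first attempt.

```python
import math
import random

# Hypothetical strategy labels; the abstract only mentions rationales and programs.
STRATEGIES = ["rationale", "decomposition", "program"]

# Assumed probability that each strategy solves a task on the first attempt.
# These numbers are made up for the toy environment, not taken from the paper.
SUCCESS_PROB = {"rationale": 0.2, "decomposition": 0.4, "program": 0.9}


def softmax(logits):
    """Convert logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(steps=5000, lr=0.1, seed=0):
    """Learn a softmax policy over strategies with a REINFORCE-style update."""
    rng = random.Random(seed)
    logits = [0.0] * len(STRATEGIES)
    baseline = 0.0  # running average reward, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        # Sample one strategy for this task (the "first attempt").
        action = rng.choices(range(len(STRATEGIES)), weights=probs)[0]
        # Reward 1 if the sampled strategy solves the toy task, else 0.
        solved = rng.random() < SUCCESS_PROB[STRATEGIES[action]]
        reward = 1.0 if solved else 0.0
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log pi(action) w.r.t. logit i is 1[i == action] - probs[i].
        for i in range(len(logits)):
            indicator = 1.0 if i == action else 0.0
            logits[i] += lr * advantage * (indicator - probs[i])
    return softmax(logits)


if __name__ == "__main__":
    final_probs = train_policy()
    for name, p in zip(STRATEGIES, final_probs):
        print(f"{name}: {p:.3f}")
```

In this toy setting the policy should shift most of its probability mass toward the strategy with the highest assumed success rate, which mirrors the stated goal of choosing well on the first attempt rather than refining afterwards.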

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 6 · Confidence 4

Strengths

1. SMART provides a novel framework by modeling strategy selection in LMs as a Markov Decision Process, adding a unique layer of self-learning.
2. The paper demonstrates high-quality experimentation, validating SMART’s effectiveness across multiple datasets and model architectures, such as GSM8K, SVAMP, and ASDiv.
3. The paper is clearly articulated, particularly in its two-stage process design, making the approach easy to understand and apply.
4. By enabling LMs to select optimal strategies wi

Weaknesses

1. The method is overly complex without sufficient justification for why simpler methods cannot achieve similar results. There should be a more detailed baseline comparison.
2. The experimental section lacks clarity, particularly regarding the setup and specific configurations for each dataset, making the results difficult to interpret and replicate. Besides, the paper's discussion on the generalization capability of SMART across different datasets is weak, as it does not convincingly demonstra

Reviewer 02 · Rating 3 · Confidence 3

Strengths

1. **Relevance**: The problem of efficient self-training is important for improving post-training capabilities of large language models (LLMs). A successful self-training method can enhance LLM performance on complex tasks, such as multi-step reasoning, making the research direction highly relevant.
2. **Clarity**: The overall presentation of SMART, including the RL framework for strategy selection, is clearly explained, and the experimental methodology is easy to follow.
3. **Diverse Empirical

Weaknesses

1. **Limited Novelty**: The core idea of SMART resembles existing self-training methods that utilize rejection sampling (https://arxiv.org/pdf/2308.01825). The concept of generating correct samples through self-reflection has already been widely explored, such as [ReST](https://arxiv.org/pdf/2308.08998) (Gulcehre et al., 2023) and [Re-ReST](https://arxiv.org/pdf/2406.01495), with the only new aspect being the emphasis on strategy selection rather than general solution generation. This difference

Reviewer 03 · Rating 5 · Confidence 3

Strengths

see questions

Weaknesses

see questions

Code & Models

Repositories

kumar-shridhar/smart · Official

Taxonomy

Topics: Semantic Web and Ontologies · Multi-Agent Systems and Negotiation · Fuzzy Logic and Control Systems