Enhancing the Reasoning Capabilities of Small Language Models via Solution Guidance Fine-Tuning

Jing Bi; Yuting Wu; Weiwei Xing; Zhenjie Wei

arXiv:2412.09906·cs.CL·December 16, 2024

TL;DR

This paper introduces Solution Guidance Fine-Tuning (SGFT), a novel approach that enhances small language models' reasoning abilities by training them to generate problem-solving guidance, improving performance on reasoning tasks with minimal data.

Contribution

The paper proposes a new reasoning strategy and a plug-and-play fine-tuning paradigm that significantly boosts small language models' reasoning capabilities using limited training data.

Findings

1. SGFT improves the reasoning accuracy of small models.

2. The method enables flexible, prompt-based problem solving.

3. Significant performance gains on reasoning benchmarks.

Abstract

Large language models (LLMs) have demonstrated remarkable performance across a wide range of tasks. Advances in prompt engineering and fine-tuning techniques have further enhanced their ability to address complex reasoning challenges. However, these advanced capabilities are often exclusive to models exceeding 100 billion parameters. Although Chain-of-Thought (CoT) fine-tuning methods have been explored for smaller models (under 10 billion parameters), they typically depend on extensive CoT training data, which can introduce inconsistencies and limit effectiveness in low-data settings. To overcome these limitations, this paper introduces a new reasoning strategy, Solution Guidance (SG), and a plug-and-play training paradigm, Solution-Guidance Fine-Tuning (SGFT), for enhancing the reasoning capabilities of small language models. SG focuses on problem understanding and decomposition at the…

Code & Models

Repositories

bijings/sgft (Official)
