Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models

Leonardo Ranaldi; Andrè Freitas

arXiv:2405.00402·cs.CL·January 28, 2025·1 cites

Open Access

TL;DR

This paper introduces a Self-refine Instruction-tuning method that enhances the reasoning abilities of smaller language models by enabling them to self-improve through preference optimization, leading to better alignment with larger models.

Contribution

The paper proposes a novel two-stage Self-refine Instruction-tuning approach that improves reasoning ability transfer and self-refinement in smaller language models.

Findings

1. Significant performance improvements on commonsense reasoning tasks.
2. Enhanced generalization in out-of-domain scenarios.
3. Outperforms traditional instruction-tuning methods.

Abstract

The alignment of reasoning abilities between smaller and larger Language Models is largely conducted via Supervised Fine-Tuning (SFT) using demonstrations generated by robust Large Language Models (LLMs). Although these approaches deliver more performant models, they do not generalize sufficiently well, since training relies only on the provided demonstrations. In this paper, we propose the Self-refine Instruction-tuning method, which elicits Small Language Models (SLMs) to self-refine their abilities. Our approach is a two-stage process: reasoning abilities are first transferred from LLMs to SLMs via Instruction-tuning on demonstrations provided by LLMs, and the instructed models then Self-refine their abilities through preference optimization strategies. In particular, the second phase operates refinement heuristics based on…
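The second stage described above relies on preference optimization, but the truncated abstract does not say which objective is used. As a rough illustration only, the sketch below assumes a Direct Preference Optimization (DPO) style loss, a common choice for this kind of refinement. The function name `dpo_loss`, its parameters, and the default `beta` are hypothetical and are not taken from the paper.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO-style loss for one preference pair (illustrative assumption).

    Each argument is the summed log-probability of a full response:
    - policy_*: under the model being refined (stage 2)
    - ref_*:    under the frozen instruction-tuned model (stage 1)
    """
    # How much more the policy favors each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)

    # -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities for a preferred and a dispreferred answer.
loss_good = dpo_loss(-1.0, -5.0, -2.0, -2.0)  # policy already prefers "chosen"
loss_bad = dpo_loss(-5.0, -1.0, -2.0, -2.0)   # policy prefers "rejected"
print(loss_good < loss_bad)
```

Under this assumption, the loss falls as the refined model assigns relatively more probability to the preferred response than the stage-one model did, which is one way a model can improve beyond the demonstrations it was tuned on.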

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems