Smaller Language Models Are Better Instruction Evolvers

Tingfeng Hui; Lulu Zhao; Guanting Dong; Yaqi Zhang; Hua Zhou; Sen Su

arXiv:2412.11231·cs.CL·December 17, 2024

TL;DR

This paper shows that smaller language models can be more effective than larger ones at instruction evolution: they produce more diverse and complex instructions. The result challenges the assumption that bigger models are inherently better for this task.

Contribution

The study reveals that smaller language models outperform larger ones in instruction synthesis and introduces a new metric, IC-IFD, to better evaluate instruction complexity and effectiveness.

Findings

1. Smaller models generate more effective instructions.
2. SLMs have a broader output space during instruction evolution.
3. Existing metrics do not adequately measure instruction impact.

Abstract

Instruction tuning has been widely used to unleash the complete potential of large language models. Notably, complex and diverse instructions are of significant importance as they can effectively align models with various downstream tasks. However, current approaches to constructing large-scale instructions predominantly favour powerful models such as GPT-4 or those with over 70 billion parameters, under the empirical presumption that such larger language models (LLMs) inherently possess enhanced capabilities. In this study, we question this prevalent assumption and conduct an in-depth exploration into the potential of smaller language models (SLMs) in the context of instruction evolution. Extensive experiments across three scenarios of instruction evolution reveal that smaller language models (SLMs) can synthesize more effective instructions than LLMs. Further analysis demonstrates…

Code & Models

Repositories

hypherx/evolution-analysis (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Intelligent Tutoring Systems and Adaptive Learning

Methods: Attention Is All You Need · Linear Layer · Dropout · Dense Connections · Byte Pair Encoding · Multi-Head Attention · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Label Smoothing