Smaller Language Models Are Better Instruction Evolvers
Tingfeng Hui, Lulu Zhao, Guanting Dong, Yaqi Zhang, Hua Zhou, Sen Su

TL;DR
This paper shows that smaller language models can be more effective than larger ones at instruction evolution, producing more diverse and complex instructions. This challenges the assumption that bigger models are inherently better for the task.
Contribution
The study reveals that smaller language models outperform larger ones in instruction synthesis and introduces a new metric, IC-IFD, to better evaluate instruction complexity and effectiveness.
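This page does not give the formula for IC-IFD. Its name points to the Instruction-Following Difficulty (IFD) score of Li et al. (2023), which divides the perplexity of an answer conditioned on its instruction by the perplexity of the same answer on its own. A higher ratio means the instruction helps the model less. The sketch below computes only that base IFD ratio from per-token natural-log probabilities, which are assumed to have been obtained from a causal language model elsewhere. It does not reproduce the paper's complexity-aware extension. The function names `perplexity` and `ifd_score` are hypothetical.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity from the natural-log probability of each answer token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)  # average negative log-likelihood
    return math.exp(mean_nll)


def ifd_score(
    answer_logprobs_given_instruction: Sequence[float],
    answer_logprobs_alone: Sequence[float],
) -> float:
    """Base IFD ratio: PPL(answer | instruction) / PPL(answer).

    Both sequences must score the same answer tokens. The first is computed
    with the instruction in the context and the second without it.
    """
    if len(answer_logprobs_given_instruction) != len(answer_logprobs_alone):
        raise ValueError("both inputs must cover the same answer tokens")
    return perplexity(answer_logprobs_given_instruction) / perplexity(answer_logprobs_alone)


if __name__ == "__main__":
    # Hypothetical log-probs: the instruction makes each answer token twice as likely.
    with_instruction = [math.log(0.5)] * 4
    without_instruction = [math.log(0.25)] * 4
    print(round(ifd_score(with_instruction, without_instruction), 3))  # 0.5
```

A ratio close to 1 means the instruction barely changes how predictable the answer is. A ratio well below 1 means the instruction does much of the work.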
Findings
Smaller models generate more effective instructions.
SLMs have a broader output space during instruction evolution.
Existing metrics do not adequately measure instruction impact.
Abstract
Instruction tuning has been widely used to unlock the full potential of large language models. Complex and diverse instructions are particularly important because they can effectively align models with a variety of downstream tasks. However, current approaches to constructing large-scale instruction data predominantly rely on powerful models such as GPT-4 or models with over 70 billion parameters, under the empirical presumption that such larger language models (LLMs) inherently possess stronger capabilities. In this study, we question this prevalent assumption and conduct an in-depth exploration of the potential of smaller language models (SLMs) for instruction evolution. Extensive experiments across three instruction evolution scenarios reveal that SLMs can synthesize more effective instructions than LLMs. Further analysis demonstrates…
Taxonomy
Topics: Natural Language Processing Techniques · Intelligent Tutoring Systems and Adaptive Learning
Methods: Attention Is All You Need · Linear Layer · Dropout · Dense Connections · Byte Pair Encoding · Multi-Head Attention · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Label Smoothing
