Stronger Models are NOT Stronger Teachers for Instruction Tuning
Zhangchen Xu, Fengqing Jiang, Luyao Niu, Bill Yuchen Lin, Radha Poovendran

TL;DR
This paper challenges the assumption that larger or stronger models make better teachers for instruction tuning. It shows that responses generated by a larger model do not always produce a better fine-tuned smaller model, and it introduces a new metric for measuring teacher effectiveness.
Contribution
The study uncovers the Larger Models' Paradox and proposes the Compatibility-Adjusted Reward (CAR) metric to better evaluate response generators for instruction tuning.
Findings
Larger models are not necessarily better teachers for smaller models.
Existing metrics fail to predict teacher effectiveness because they ignore model compatibility.
The CAR metric outperforms baselines in measuring the effectiveness of response generators.
Abstract
Instruction tuning has been widely adopted to ensure large language models (LLMs) follow user instructions effectively. The resulting instruction-following capabilities of LLMs heavily rely on the instruction datasets used for tuning. Recently, synthetic instruction datasets have emerged as an economically viable solution to provide LLMs diverse and high-quality instructions. However, existing approaches typically assume that larger or stronger models are stronger teachers for instruction tuning, and hence simply adopt these models as response generators to the synthetic instructions. In this paper, we challenge this commonly-adopted assumption. Our extensive experiments across five base models and twenty response generators reveal that larger and stronger models are not necessarily stronger teachers of smaller models. We refer to this phenomenon as the Larger Models' Paradox. We…
Taxonomy
Topics: Educational Methods and Technology · Intelligent Tutoring Systems and Adaptive Learning
Methods: Balanced Selection · ADaptive gradient method with the OPTimal convergence rate
