Stronger Models are NOT Stronger Teachers for Instruction Tuning

Zhangchen Xu; Fengqing Jiang; Luyao Niu; Bill Yuchen Lin; Radha Poovendran

arXiv:2411.07133·cs.AI·February 27, 2025

TL;DR

This paper challenges the assumption that larger models are better teachers for instruction tuning. It shows that responses generated by larger models do not always produce better fine-tuned smaller models, and it introduces a new metric for measuring teacher effectiveness.

Contribution

The study uncovers the Larger Models' Paradox and proposes the Compatibility-Adjusted Reward (CAR) metric to better evaluate response generators for instruction tuning.

Findings

01 Larger models are not necessarily better teachers for smaller models.

02 Existing metrics fail to predict teacher effectiveness because they ignore the compatibility between the teacher and the base model being tuned.

03 The CAR metric outperforms baseline metrics in measuring response generator effectiveness.
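The page names the Compatibility-Adjusted Reward but does not give its formula. The following is only a minimal sketch of the general idea, assuming that compatibility is approximated by the base model's average loss on the teacher's responses and that the average reward is discounted by that loss. The functional form `reward / (1 + beta * loss)`, the parameter `beta`, and both function names are illustrative assumptions, not the authors' implementation.

```python
from statistics import mean


def compatibility_adjusted_reward(rewards, losses, beta=1.0):
    """Hypothetical CAR-style score for one response generator.

    rewards: per-response quality scores (e.g. from a reward model).
    losses:  per-response loss of the base model on the same responses,
             used here as an assumed proxy for (in)compatibility.
    beta:    assumed weight on the compatibility penalty.
    """
    if not rewards or len(rewards) != len(losses):
        raise ValueError("rewards and losses must be non-empty and aligned")
    avg_reward = mean(rewards)
    avg_loss = mean(losses)
    # A higher loss means the base model finds the responses harder to fit,
    # so the reward is discounted more strongly.
    return avg_reward / (1.0 + beta * avg_loss)


def rank_generators(candidates, beta=1.0):
    """Rank generator names from best to worst by the hypothetical score.

    candidates maps a generator name to a (rewards, losses) pair.
    """
    scores = {
        name: compatibility_adjusted_reward(rewards, losses, beta)
        for name, (rewards, losses) in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

With `beta=0` the score reduces to the plain average reward, which corresponds to a metric that ignores compatibility. A positive `beta` lets a generator with slightly lower reward but much lower base-model loss rank first.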

Abstract

Instruction tuning has been widely adopted to ensure large language models (LLMs) follow user instructions effectively. The resulting instruction-following capabilities of LLMs heavily rely on the instruction datasets used for tuning. Recently, synthetic instruction datasets have emerged as an economically viable solution to provide LLMs diverse and high-quality instructions. However, existing approaches typically assume that larger or stronger models are stronger teachers for instruction tuning, and hence simply adopt these models as response generators to the synthetic instructions. In this paper, we challenge this commonly-adopted assumption. Our extensive experiments across five base models and twenty response generators reveal that larger and stronger models are not necessarily stronger teachers of smaller models. We refer to this phenomenon as the Larger Models' Paradox. We…

Taxonomy

Topics: Educational Methods and Technology · Intelligent Tutoring Systems and Adaptive Learning

Methods: Balanced Selection · ADaptive gradient method with the OPTimal convergence rate