Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning
Yuanhao Yue, Chengyu Wang, Jun Huang, Peng Wang

TL;DR
This paper introduces TAPIR, a curriculum-based distillation method that improves large language models' instruction-following abilities by selecting instructions that are difficult for the student model and rebalancing the task distribution of the training set, yielding stronger performance with less data.
Contribution
The paper proposes TAPIR, a novel multi-round distillation framework that incorporates task-aware curriculum planning and automatic response refinement to enhance LLM instruction tuning.
Findings
Student LLMs trained with TAPIR outperform larger instruction-tuned models on benchmarks.
TAPIR reaches these results with less training data.
Curriculum planning systematically improves instruction-following abilities.
Abstract
Instruction tuning aims to align large language models (LLMs) with open-domain instructions and human-preferred responses. While several studies have explored autonomous approaches to distilling and annotating instructions from powerful proprietary LLMs, such as ChatGPT, they often neglect the impact of the distributions and characteristics of tasks, together with the varying difficulty of instructions in training sets. This oversight can lead to imbalanced knowledge capabilities and poor generalization powers of student LLMs. To address these challenges, we introduce Task-Aware Curriculum Planning for Instruction Refinement (TAPIR), a multi-round distillation framework that utilizes an oracle LLM to select instructions that are difficult for a student LLM to follow. To balance the student's capabilities, task distributions in training sets are adjusted with responses automatically…
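The selection and rebalancing steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Example`, `select_difficult`, `rebalance`, and `plan_round` are hypothetical, the oracle LLM's judgment is stubbed as a precomputed `student_score`, and the `hard_fraction` parameter is an assumed way to raise difficulty across rounds. Response refinement is not covered.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    instruction: str
    task: str             # hypothetical task label, e.g. "math" or "chat"
    student_score: float  # hypothetical oracle rating of the student's answer (0-10)


def select_difficult(pool, threshold):
    """Keep instructions whose student answer the oracle rated below `threshold`."""
    return [ex for ex in pool if ex.student_score < threshold]


def rebalance(examples, target_share, total, seed=0):
    """Resample so each task gets about `target_share[task] * total` items."""
    rng = random.Random(seed)
    by_task = {}
    for ex in examples:
        by_task.setdefault(ex.task, []).append(ex)

    balanced = []
    for task, share in target_share.items():
        candidates = by_task.get(task, [])
        quota = round(share * total)
        if not candidates or quota == 0:
            continue
        if quota <= len(candidates):
            balanced.extend(rng.sample(candidates, quota))     # without replacement
        else:
            balanced.extend(rng.choices(candidates, k=quota))  # oversample scarce tasks
    return balanced


def plan_round(pool, hard_fraction, threshold, target_share, total, seed=0):
    """Build one round's set; `hard_fraction` of it comes from difficult items (assumed schedule)."""
    hard = select_difficult(pool, threshold)
    easy = [ex for ex in pool if ex.student_score >= threshold]
    n_hard = round(hard_fraction * total)
    return (rebalance(hard, target_share, n_hard, seed)
            + rebalance(easy, target_share, total - n_hard, seed + 1))


# Toy pool: ten "math" and ten "chat" instructions with oracle scores 0..9.
pool = [Example(f"{task}-{i}", task, float(i))
        for task in ("math", "chat") for i in range(10)]
share = {"math": 0.5, "chat": 0.5}

# Later rounds draw a larger share of difficult instructions.
for round_idx, frac in enumerate((0.25, 0.5, 0.75), start=1):
    batch = plan_round(pool, frac, threshold=5.0, target_share=share, total=8)
    n_hard = sum(ex.student_score < 5.0 for ex in batch)
    print(round_idx, Counter(ex.task for ex in batch), "hard:", n_hard)
```

In a real pipeline the score would come from prompting the oracle LLM to compare the student's response with a reference, and the target task shares would be a design choice rather than a fixed even split.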
Taxonomy
Topics: Online Learning and Analytics · Intelligent Tutoring Systems and Adaptive Learning · Natural Language Processing Techniques
Methods: ALIGN
