CITING: Large Language Models Create Curriculum for Instruction Tuning

Tao Feng; Zifeng Wang; Jimeng Sun

arXiv:2310.02527·cs.CL·October 5, 2023

TL;DR

CITING introduces a novel approach where large language models generate curricula for instruction tuning, replacing human effort and significantly improving model performance across multiple datasets.

Contribution

The paper presents a new method where a teacher LLM creates rubrics and guides self-correction in student LLMs, enhancing instruction tuning without human-crafted datasets.

Findings

01. Achieves an average winning rate of 79.4% over SFT.

02. Demonstrates strong improvement in answer quality.

03. Outperforms several state-of-the-art baselines.
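The findings give an average winning rate over SFT but do not spell out the arithmetic. The following is a minimal sketch that makes three assumptions: each test item yields one pairwise judgment ("win", "tie", or "lose") of CITING against the SFT baseline, only wins count toward the rate, and "average" means an unweighted mean over datasets. The paper's exact definition may differ. The dataset names and judgments below are toy values, not results from the paper.

```python
from typing import Dict, List


def win_rate(judgments: List[str]) -> float:
    """Fraction of pairwise judgments that are wins; ties and losses count as non-wins."""
    if not judgments:
        raise ValueError("need at least one judgment")
    return sum(j == "win" for j in judgments) / len(judgments)


def average_win_rate(per_dataset: Dict[str, List[str]]) -> float:
    """Assumed convention: unweighted (macro) mean of the per-dataset win rates."""
    rates = [win_rate(judgments) for judgments in per_dataset.values()]
    return sum(rates) / len(rates)


# Toy judgments for two hypothetical datasets (not data from the paper).
toy = {
    "dataset_a": ["win", "win", "win", "lose"],  # 3/4 = 0.75
    "dataset_b": ["win", "tie"],                 # 1/2 = 0.50
}
print(average_win_rate(toy))  # 0.625
```

If all six judgments were pooled into a single micro average instead, the result would be 4/6 ≈ 0.667. The choice of averaging convention therefore matters when datasets differ in size.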

Abstract

The recent advancement of large language models (LLMs) has been achieved through a combination of instruction tuning and human alignment. However, building manually crafted instruction datasets and performing human alignment have become the bottleneck for scaling the development of LLMs. In this paper, we exploit the idea of leveraging AI models in lieu of humans as the teacher to train student LLMs. Our method is inspired by how human students refine their writing skills by following the rubrics and learning from the revisions offered by their tutors. Specifically, we employ a teacher LLM to create a curriculum for instruction tuning of the student LLM, namely Curriculum Instruction TunING (CITING). It encompasses two main steps: (1) the teacher LLM crafts the rubrics for evaluating the answers corresponding to various types of questions, and (2) the student LLM learns to follow the rubrics and…
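The two steps in the abstract can be sketched as a loop. This sketch is hypothetical. The teacher and student are modeled as plain text-in, text-out functions. The names `craft_rubrics`, `collect_revisions`, `citing_sketch`, and the `fine_tune` callback are illustrative assumptions, not the authors' code. The prompt wording and the number of iterations are also assumed. Because the abstract is truncated here, the paper's actual training schedule may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Assumption: an "LLM" is modeled as a plain text-in, text-out function.
LLM = Callable[[str], str]


@dataclass
class RevisionPair:
    """Supervised example: the student learns (question, rubric, draft) -> revision."""
    question: str
    rubric: str
    draft: str
    revision: str


# Hypothetical trainer hook: takes the current student and training pairs, returns a new student.
FineTune = Callable[[LLM, List[RevisionPair]], LLM]


def craft_rubrics(teacher: LLM, question_types: List[str]) -> Dict[str, str]:
    """Step (1): the teacher writes one rubric per question type."""
    return {
        qtype: teacher(f"Write a rubric for evaluating answers to {qtype} questions.")
        for qtype in question_types
    }


def collect_revisions(
    teacher: LLM,
    student: LLM,
    questions: List[Tuple[str, str]],  # (question_type, question) pairs
    rubrics: Dict[str, str],
) -> List[RevisionPair]:
    """Step (2a): the student drafts an answer; the teacher revises it against the rubric."""
    pairs = []
    for qtype, question in questions:
        rubric = rubrics[qtype]
        draft = student(question)
        revision = teacher(
            f"Rubric:\n{rubric}\nQuestion:\n{question}\nAnswer:\n{draft}\n"
            "Revise the answer so that it satisfies the rubric."
        )
        pairs.append(RevisionPair(question, rubric, draft, revision))
    return pairs


def citing_sketch(
    teacher: LLM,
    student: LLM,
    fine_tune: FineTune,
    questions: List[Tuple[str, str]],
    iterations: int = 2,
) -> Tuple[LLM, List[List[RevisionPair]]]:
    """Step (2b): repeatedly fine-tune the student on the teacher's revisions."""
    rubrics = craft_rubrics(teacher, sorted({qtype for qtype, _ in questions}))
    history = []
    for _ in range(iterations):
        pairs = collect_revisions(teacher, student, questions, rubrics)
        student = fine_tune(student, pairs)  # the next round starts from the updated student
        history.append(pairs)
    return student, history


# Toy stand-ins so the sketch runs without any model.
def toy_teacher(prompt: str) -> str:
    return "REVISED: " + prompt.splitlines()[-1]


def toy_student(question: str) -> str:
    return "draft answer"


def toy_fine_tune(old_student: LLM, pairs: List[RevisionPair]) -> LLM:
    # Pretend fine-tuning makes the student reproduce the teacher's latest revisions.
    lookup = {p.question: p.revision for p in pairs}
    return lambda question: lookup.get(question, old_student(question))


final_student, history = citing_sketch(
    toy_teacher, toy_student, toy_fine_tune, [("math", "What is 2 + 2?")]
)
print(final_student("What is 2 + 2?"))
```

The toy stubs only exercise the control flow. In a real setup, `fine_tune` would run supervised fine-tuning on the (question, rubric, draft) → revision pairs rather than memorizing them.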

Peer Reviews

Decision: Submitted to ICLR 2024

Reviewer 01 · Rating: 6 (marginally above the acceptance threshold) · Confidence: 4

Strengths

1. The Curriculum Instruction TunING approach is innovative. Using teacher LLMs to guide student LLMs, which mirrors the tutor-student relationship, is a fresh perspective in this field.
2. The paper delineates a meticulously crafted methodology, ranging from rubric design with the teacher model to the iterative fine-tuning of the student LLM.
3. The narrative is lucid and provides a thorough explanation of the CITING methodology.

Weaknesses

1. Over-reliance on the teacher LLM: If the teacher LLM has biases or inaccuracies, it could transfer these shortcomings to the student LLM. Consequently, the effectiveness of CITING is largely contingent on the quality and resilience of the teacher LLM.
2. Test-phase limitations: During the test phase, the model's potential might be constrained by the extent of criteria it can retrieve from a fixed corpus.
3. Evaluation metrics: The paper predominantly emphasizes…
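Reviewer 01's second point concerns test time, when criteria are drawn from a fixed corpus. The following sketch of that constraint is hypothetical. It assumes the corpus stores (question, rubric) pairs and that the rubric of the most similar stored question is reused. The bag-of-words cosine similarity is a stand-in chosen for simplicity, not necessarily the retriever used in the paper.

```python
import math
from collections import Counter
from typing import List, Tuple


def bag_of_words(text: str) -> Counter:
    """Lowercased whitespace tokens with counts (deliberately simple)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_rubric(query: str, corpus: List[Tuple[str, str]]) -> str:
    """Return the rubric attached to the stored question most similar to the query."""
    if not corpus:
        raise ValueError("the rubric corpus is empty")
    query_vec = bag_of_words(query)
    _, best_rubric = max(corpus, key=lambda item: cosine(query_vec, bag_of_words(item[0])))
    return best_rubric


# Toy corpus of (question, rubric) pairs; contents are illustrative only.
corpus = [
    ("Solve the equation x + 3 = 5", "Math rubric: show each step and state the final value."),
    ("Write a short poem about autumn", "Creative-writing rubric: vivid imagery, consistent tone."),
]
print(retrieve_rubric("Write a poem about winter", corpus))
```

Whatever the query, this function can only return a rubric that already exists in the corpus. That is the limitation the reviewer raises: an unfamiliar question type still receives the closest stored rubric, even when that rubric fits poorly.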

Reviewer 02 · Rating: 3 (reject, not good enough) · Confidence: 5

Strengths

**Clarity**
- The presentation of this work is clear and easy to follow.
- The method is simple and effective and has shown clear improvement over baselines.

Weaknesses

My main concerns about this work are novelty and clarity. Although the method uses criteria as guidance to augment instruction-tuning data, the overall method can still be viewed as a more complex form of data distillation. Many recent works propose quite similar ideas, such as Orca [1], WizardLM [2], and MAmmoTH [3], all of which leverage data augmentation guided by certain criteria, score functions, etc. It might be good to compare CITING with these m…

Reviewer 03 · Rating: 6 (marginally above the acceptance threshold) · Confidence: 3

Strengths

1. This paper proposes a novel method to train large language models using AI feedback instead of human feedback, which reduces the cost and difficulty of scaling LLM development.
2. This paper introduces curriculum instruction tuning, which leverages a teacher LLM to create rubrics and revisions for different types of instructions and a student LLM to learn from them. This is an interesting use of an LLM as a planner for training another LM.
3. The paper shows that CITING outperforms existing met…

Weaknesses

1. The technical novelty may be limited, and the method is complicated.
2. This paper does not evaluate the robustness or generalization of CITING to unseen or adversarial instructions. This could be an issue, as the teacher model only teaches in-domain curricula. I would like to see a discussion of this.

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Dropout · Dense Connections · Linear Layer · Label Smoothing · Adam · Absolute Position Encodings · Residual Connection · Layer Normalization