TPD: Enhancing Student Language Model Reasoning via Principle Discovery and Guidance

Haorui Wang (1), Rongzhi Zhang (1), Yinghao Li (1), Lingkai Kong (1), Yuchen Zhuang (1), Xiusi Chen (2), Chao Zhang (1) ((1) College of Computing, Georgia Institute of Technology; (2) Department of Computer Science, University of California, Los Angeles)

arXiv:2401.13849 · cs.CL · January 26, 2024 · 1 citation

Open Access

TL;DR

The paper introduces TPD, a principle-based teacher-student framework that enhances smaller language models' reasoning by mimicking human learning, leading to significant performance improvements without ongoing teacher intervention.

Contribution

The paper proposes a novel principle discovery-based teaching framework that improves student LLM reasoning without continuous teacher guidance or extensive fine-tuning.

Findings

01. TPD achieves a 6.2% average performance boost over chain-of-thought prompting.

02. The framework effectively guides student models using error-based principles.

03. Extensive experiments across eight reasoning tasks validate TPD's effectiveness.

Abstract

Large Language Models (LLMs) have recently showcased remarkable reasoning abilities. However, larger models often surpass their smaller counterparts in reasoning tasks, posing the challenge of effectively transferring these capabilities from larger models. Existing approaches rely heavily on extensive fine-tuning data or on continuous interactions with a superior teacher LLM during inference. We introduce a principle-based teacher-student framework called "Teaching via Principle Discovery" (TPD) to address these limitations. Inspired by human learning mechanisms, TPD mimics the interaction between a teacher and a student using a principle-based approach. The teacher LLM generates problem-solving instructions and corrective principles based on the student LLM's errors. These principles guide the refinement of instructions and the selection of instructive examples from a validation set.…
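
The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`collect_errors`, `build_student_prompt`, `tpd_sketch`), the exact-match error check, the prompt layout, and the rule of reusing the student's failed validation items as demonstrations are all assumptions made for illustration, and the teacher and student are toy stand-in callables rather than real LLM calls.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]          # (question, gold answer)
Error = Tuple[str, str, str]       # (question, gold answer, student prediction)


def collect_errors(student: Callable[[str, str], str],
                   instruction: str,
                   examples: List[Example]) -> List[Error]:
    """Return the validation examples the student answers incorrectly.

    Assumption: correctness is judged by exact string match.
    """
    errors = []
    for question, gold in examples:
        prediction = student(instruction, question)
        if prediction != gold:
            errors.append((question, gold, prediction))
    return errors


def build_student_prompt(instruction: str,
                         principles: List[str],
                         demos: List[Example]) -> str:
    """Assemble instruction, principles, and demonstrations into one prompt."""
    parts = [instruction]
    if principles:
        parts.append("Principles:")
        parts.extend(f"- {p}" for p in principles)
    if demos:
        parts.append("Examples:")
        parts.extend(f"Q: {q}\nA: {a}" for q, a in demos)
    return "\n".join(parts)


def tpd_sketch(teacher_instruct: Callable[[str], str],
               teacher_principles: Callable[[List[Error]], List[str]],
               student: Callable[[str, str], str],
               task: str,
               validation: List[Example],
               max_demos: int = 2) -> str:
    """One pass: instruct, observe student errors, derive principles, pick demos."""
    instruction = teacher_instruct(task)
    errors = collect_errors(student, instruction, validation)
    principles = teacher_principles(errors)
    # Assumption: the items the student got wrong serve as instructive examples.
    demos = [(q, gold) for q, gold, _ in errors][:max_demos]
    return build_student_prompt(instruction, principles, demos)


# Toy stand-ins for the teacher and student models (hypothetical).
def toy_teacher_instruct(task: str) -> str:
    return f"Solve the {task} problem step by step."


def toy_teacher_principles(errors: List[Error]) -> List[str]:
    return [f"Re-check cases like: {q}" for q, _, _ in errors]


def toy_student(instruction: str, question: str) -> str:
    return "4" if question == "2+2" else "0"


validation = [("2+2", "4"), ("3+5", "8")]
prompt = tpd_sketch(toy_teacher_instruct, toy_teacher_principles,
                    toy_student, "arithmetic", validation)
print(prompt)
```

Once built, the prompt is used by the student alone, which reflects the abstract's point that the teacher is not needed during inference.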

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Software Engineering Research