Large Language Models are In-context Teachers for Knowledge Reasoning
Jiachen Zhao, Zonghai Yao, Zhichao Yang, Hong Yu

TL;DR
This paper explores using large language models as in-context teachers for reasoning tasks, demonstrating that self-generated explanations and alignment techniques significantly improve performance over human-crafted examples.
Contribution
The paper introduces the Self-Explain and Teach-Back methods, showing that LLMs can teach themselves and other models in context and reach higher accuracy than human-crafted demonstrations.
Findings
Self-Explain outperforms human-crafted exemplars.
Matching teacher and student explanations improves in-context teaching.
Teach-Back enhances performance; for example, a 7B model surpasses GPT-3.5 in medical QA.
Abstract
In this work, we study in-context teaching (ICT), where a teacher provides in-context example rationales to teach a student to reason over unseen cases. Human teachers are usually required to craft in-context demonstrations, which are costly and have high variance. We ask whether a large language model (LLM) can serve as a more effective in-context teacher for itself or other LLMs, compared to humans. Inspired by the Encoding Specificity Hypothesis from human episodic memory, we hypothesize that in-context exemplars crafted by the teacher should match the training data of the student. This hypothesis motivates us to propose Self-Explain, where an LLM's self-elicited explanations are used as in-context demonstrations for prompting it, since they are generalized from the model's training examples. Self-Explain is shown to significantly outperform using human-crafted exemplars and other…
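The Self-Explain idea in the abstract can be illustrated with a minimal sketch: the model first writes its own rationale for each labeled exemplar, and those rationales then serve as the in-context demonstrations for a new question. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function name `build_self_explain_prompt`, the `llm` callable (any function mapping a prompt string to a completion string), and the prompt wording.

```python
from typing import Callable, List, Tuple


def build_self_explain_prompt(
    llm: Callable[[str], str],          # hypothetical: prompt text in, completion text out
    exemplars: List[Tuple[str, str]],   # (question, gold answer) pairs
    test_question: str,
) -> str:
    """Build a few-shot prompt whose rationales are written by the model itself."""
    demonstrations = []
    for question, answer in exemplars:
        # Step 1: ask the model to explain the known answer in its own words.
        elicit = (
            f"Question: {question}\n"
            f"Answer: {answer}\n"
            "Explain step by step why this answer is correct.\n"
            "Explanation:"
        )
        rationale = llm(elicit).strip()
        # Step 2: store the self-elicited rationale as an in-context demonstration.
        demonstrations.append(
            f"Question: {question}\nExplanation: {rationale}\nAnswer: {answer}"
        )
    # Step 3: append the unseen question and leave the explanation open for the model.
    demonstrations.append(f"Question: {test_question}\nExplanation:")
    return "\n\n".join(demonstrations)
```

Passing the returned string back to the same `llm` corresponds to the model teaching itself; passing it to a different model corresponds to one LLM teaching another. The prompt templates the authors actually used may differ from this sketch.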
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Cosine Annealing · Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Adam · Attention Dropout · Weight Decay
