SeCoKD: Aligning Large Language Models for In-Context Learning with Fewer Shots

Weixing Wang, Haojin Yang, Christoph Meinel

arXiv:2406.14208 · cs.AI · September 27, 2024

Open Access

TL;DR

SeCoKD is a training framework that improves the in-context learning ability of large language models with fewer demonstrations. It aligns each model with a heavily prompted variation of itself through self-knowledge distillation, with the largest gains in zero-shot and one-shot settings.

Contribution

We introduce SeCoKD, a novel self-distillation method that reduces the number of demonstrations needed for effective in-context learning in large language models.
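To make the teacher/student contrast concrete, the sketch below builds two prompts for the same query from one demonstration pool: a many-shot prompt for the heavily prompted teacher and a few-shot (or zero-shot) prompt for the student. The `Q:`/`A:` template, the helper names `format_prompt` and `build_pair`, and the choice of taking the student's shots as a prefix of the teacher's are hypothetical illustrations, not details confirmed by the paper.

```python
from typing import List, Tuple

# Hypothetical representation: a demonstration is a (question, answer) pair.
Demo = Tuple[str, str]


def format_prompt(demos: List[Demo], query: str) -> str:
    """Join demonstrations, then append the query with an empty answer slot.

    The "Q:/A:" template is an illustrative assumption.
    """
    parts = [f"Q: {q}\nA: {a}" for q, a in demos]
    parts.append(f"Q: {query}\nA:")
    return "\n\n".join(parts)


def build_pair(pool: List[Demo], query: str,
               teacher_shots: int, student_shots: int) -> Tuple[str, str]:
    """Return (teacher_prompt, student_prompt) for the same query.

    The teacher context is heavily prompted (many shots); the student
    context uses few or zero shots. Taking the student's demonstrations
    as a prefix of the teacher's is an assumption of this sketch.
    """
    if not 0 <= student_shots <= teacher_shots <= len(pool):
        raise ValueError("need 0 <= student_shots <= teacher_shots <= len(pool)")
    teacher = format_prompt(pool[:teacher_shots], query)
    student = format_prompt(pool[:student_shots], query)
    return teacher, student


if __name__ == "__main__":
    pool = [("2+2?", "4"), ("3+5?", "8"), ("10-7?", "3")]
    teacher, student = build_pair(pool, "6+1?", teacher_shots=3, student_shots=1)
    print(teacher)
    print("---")
    print(student)
```

With the three-item `pool` above, `build_pair(pool, "6+1?", teacher_shots=3, student_shots=1)` yields a teacher prompt containing all three demonstrations and a student prompt containing only the first; setting `student_shots=0` gives the zero-shot student context.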

Findings

1. Outperforms the base models and SFT in zero-shot and one-shot settings.
2. Increases the utilization of a single demonstration.
3. More robust on new tasks, with minimal negative artifacts.

Abstract

Previous studies have shown that demonstrations can significantly help Large Language Models (LLMs) perform better on given tasks. However, this so-called In-Context Learning (ICL) ability is very sensitive to the presented context, and dozens of demonstrations are often needed. In this work, we investigate whether we can reduce the number of shots while still maintaining competitive performance. We present SeCoKD, a self-Knowledge Distillation (KD) training framework that aligns the student model with a heavily prompted variation of itself, thereby increasing the utilization of a single demonstration. We evaluate SeCoKD on three LLMs and six benchmarks, focusing mainly on reasoning tasks. Results show that our method outperforms the base model and Supervised Fine-tuning (SFT), especially in the zero-shot and one-shot settings, by 30% and 10%, respectively. Moreover, SeCoKD brings…
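A minimal sketch of a self-distillation objective, assuming the common token-level formulation: the same model scores the answer tokens once under the heavily prompted (teacher) context and once under the few-shot (student) context, and the loss is KL(teacher || student). The paper's exact objective is not stated here and may differ (for example, training the student on teacher-generated answers instead); the function names, the temperature parameter, and the T² scaling are hypothetical choices. Gradients are omitted; in a real training loop only the student pass would be updated.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 add nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def self_kd_loss(teacher_logits: Sequence[Sequence[float]],
                 student_logits: Sequence[Sequence[float]],
                 temperature: float = 1.0) -> float:
    """Mean per-token KL(teacher || student) over the answer tokens.

    Row t of each argument holds the vocabulary logits for answer token t,
    produced by the SAME model under the many-shot (teacher) and few-shot
    (student) prompts. The teacher side would be held fixed during training.
    The T**2 factor follows the usual distillation convention and is an
    assumption here.
    """
    if len(teacher_logits) != len(student_logits) or not teacher_logits:
        raise ValueError("need the same, non-zero number of answer positions")
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        p = softmax(t_row, temperature)  # teacher distribution (target)
        q = softmax(s_row, temperature)  # student distribution
        total += kl_divergence(p, q)
    return (temperature ** 2) * total / len(teacher_logits)


if __name__ == "__main__":
    teacher = [[2.0, 0.5, -1.0], [0.1, 3.0, 0.2]]
    student = [[1.0, 1.0, 0.0], [0.0, 1.5, 0.5]]
    print(round(self_kd_loss(teacher, student), 4))
```

For example, `self_kd_loss([[1.0, 2.0, 3.0]], [[1.0, 2.0, 3.0]])` returns `0.0`, since identical logits give identical distributions; the loss grows as the student's few-shot predictions drift from the teacher's many-shot predictions.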

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Anomaly Detection Techniques and Applications

Methods: Balanced Selection · Shrink and Fine-Tune