Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data

Juanhui Li; Sreyashi Nag; Hui Liu; Xianfeng Tang; Sheikh Sarwar; Limeng Cui; Hansu Gu; Suhang Wang; Qi He; Jiliang Tang

arXiv:2411.08028 · cs.AI · April 1, 2025

TL;DR

This paper introduces LLKD, a method for efficient knowledge distillation from large language models to smaller models using unlabeled data, focusing on adaptive sample selection to improve data efficiency and model performance.

Contribution

The paper proposes LLKD, an adaptive sample selection technique that enhances knowledge distillation from LLMs by prioritizing high-confidence and informative samples, reducing data and computational requirements.
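To make the selection idea concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that the teacher's confidence in each pseudo-label is available as a probability and that a sample's informativeness can be measured as the entropy of the student's current prediction. It also uses batch means as the adaptive thresholds. The names `entropy` and `select_samples` and the mean-threshold rule are hypothetical choices for illustration, and LLKD's exact thresholding scheme may differ.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_samples(teacher_conf: Sequence[float],
                   student_probs: Sequence[Sequence[float]]) -> List[int]:
    """Return indices of samples that are high-confidence for the teacher
    AND informative (high-entropy) for the student.

    Hypothetical adaptive rule (an assumption, not the paper's formula):
    each threshold is the batch mean, so it shifts as the teacher's and
    student's score distributions change during training."""
    if not teacher_conf:
        return []
    student_ent = [entropy(p) for p in student_probs]
    conf_threshold = sum(teacher_conf) / len(teacher_conf)
    ent_threshold = sum(student_ent) / len(student_ent)
    return [
        i
        for i, (conf, ent) in enumerate(zip(teacher_conf, student_ent))
        if conf >= conf_threshold and ent >= ent_threshold
    ]


# Usage: four pseudo-labeled samples in one batch.
teacher_conf = [0.95, 0.92, 0.90, 0.40]
student_probs = [[0.5, 0.5], [0.9, 0.1], [0.55, 0.45], [0.5, 0.5]]
print(select_samples(teacher_conf, student_probs))
```

Here `select_samples` keeps samples 0 and 2. Sample 1 is dropped because the student is already confident about it, so it adds little new information. Sample 3 is dropped because the teacher's pseudo-label is unreliable. Requiring both conditions is what lets the selection filter out likely-noisy labels and uninformative samples at the same time.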

Findings

1. LLKD outperforms baseline methods in multiple NLP tasks.
2. It achieves higher data efficiency with fewer labeled samples.
3. The method improves smaller model performance while reducing resource usage.

Abstract

In real-world NLP applications, Large Language Models (LLMs) offer promising solutions due to their extensive training on vast datasets. However, the large size and high computational demands of LLMs limit their practicality in many applications, especially when further fine-tuning is required. To address these limitations, smaller models are typically preferred for deployment, but their training is hindered by the scarcity of labeled data. In contrast, unlabeled data is often readily available and can be leveraged by using LLMs to generate pseudo-labels for training smaller models. This enables the smaller models (students) to acquire knowledge from LLMs (teachers) while reducing computational costs. This process introduces challenges, such as potentially noisy pseudo-labels. Selecting high-quality and informative data is therefore critical to enhance model performance while improving the…
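The teacher-student setup described in the abstract can be sketched in two steps. First, the teacher labels unlabeled inputs. Second, the student is scored by cross-entropy against those pseudo-labels. The sketch assumes the teacher exposes class probabilities (for example, derived from label-token likelihoods). That is an assumption, since an LLM may instead emit text labels. `pseudo_label` and `distillation_loss` are hypothetical names. A real setup would compute this loss over model logits inside a training framework rather than on plain Python lists.

```python
import math
from typing import List, Sequence, Tuple


def pseudo_label(teacher_probs: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """Convert the teacher's class probabilities on unlabeled inputs into
    (hard pseudo-label, teacher confidence) pairs by taking the argmax."""
    labeled = []
    for probs in teacher_probs:
        label = max(range(len(probs)), key=lambda k: probs[k])
        labeled.append((label, probs[label]))
    return labeled


def distillation_loss(student_probs: Sequence[Sequence[float]],
                      labels: Sequence[int]) -> float:
    """Mean cross-entropy of the student's predicted probabilities
    against the teacher's hard pseudo-labels."""
    if not labels:
        return 0.0
    total = sum(-math.log(probs[y]) for probs, y in zip(student_probs, labels))
    return total / len(labels)


# Usage: the teacher labels two unlabeled inputs, then the student is scored.
teacher_probs = [[0.1, 0.9], [0.7, 0.3]]
pairs = pseudo_label(teacher_probs)  # [(1, 0.9), (0, 0.7)]
labels = [label for label, _ in pairs]
loss = distillation_loss([[0.2, 0.8], [0.5, 0.5]], labels)
print(round(loss, 4))  # 0.4581
```

The teacher's confidence is kept alongside each pseudo-label because a selection step needs it to decide which pseudo-labels are trustworthy enough to train on. Noisy pseudo-labels enter the student's loss directly, which is why selection matters.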

Videos

Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data · Underline

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Knowledge Distillation