Performance-Guided LLM Knowledge Distillation for Efficient Text Classification at Scale
Flavio Di Palo, Prateek Singhi, Bilal Fadlallah

TL;DR
This paper introduces Performance-Guided Knowledge Distillation (PGKD), a cost-effective method that distills large language models into smaller, efficient models for text classification, significantly reducing inference costs and latency.
Contribution
The paper presents a novel, performance-aware active learning framework for LLM knowledge distillation tailored for multi-class, sparsely annotated datasets, outperforming traditional methods.
Findings
PGKD models are up to 130X faster than LLMs.
PGKD reduces inference costs by up to 25X.
PGKD outperforms traditional BERT-base models and other knowledge distillation methods.
Abstract
Large Language Models (LLMs) face significant challenges at inference time due to their high computational demands. To address this, we present Performance-Guided Knowledge Distillation (PGKD), a cost-effective and high-throughput solution for production text classification applications. PGKD utilizes teacher-student Knowledge Distillation to distill the knowledge of LLMs into smaller, task-specific models. PGKD establishes an active learning routine between the student model and the LLM; the LLM continuously generates new training data leveraging hard-negative mining, student model validation performance, and early-stopping protocols to inform the data generation. By employing a cyclical, performance-aware approach tailored for highly multi-class, sparsely annotated datasets prevalent in industrial text classification, PGKD effectively addresses training challenges and outperforms…
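The cyclical routine described in the abstract can be sketched as a short loop. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the student is a toy bag-of-words classifier standing in for a compact model such as BERT-base, `stub_teacher` is a hypothetical placeholder for a real LLM call, misclassified validation examples are used as a simple proxy for hard-negative mining, and all function names, data, and the `patience` parameter are invented for illustration.

```python
from collections import Counter, defaultdict


def train_student(examples):
    """Fit a toy student: one word-count profile per class (assumed stand-in model)."""
    profiles = defaultdict(Counter)
    for text, label in examples:
        profiles[label].update(text.lower().split())
    return profiles


def predict(profiles, text):
    """Pick the class whose profile overlaps most with the words in the text."""
    words = text.lower().split()
    return max(sorted(profiles), key=lambda label: sum(profiles[label][w] for w in words))


def evaluate(profiles, validation):
    """Return validation accuracy and the misclassified examples (proxy for hard negatives)."""
    errors = [(t, y) for t, y in validation if predict(profiles, t) != y]
    return 1 - len(errors) / len(validation), errors


def stub_teacher(accuracy, hard_examples):
    """Hypothetical stand-in for the LLM: one synthetic example per confused class.

    A real teacher would receive the accuracy and hard examples in its prompt;
    this stub ignores the accuracy and uses fixed templates.
    """
    templates = {
        "billing": "refund charge invoice payment",
        "shipping": "package delivery tracking late",
    }
    return [(templates[label], label) for _, label in hard_examples]


def pgkd_loop(seed, validation, teacher_generate, max_rounds=10, patience=2):
    """Alternate student training and teacher data generation until early stopping."""
    train = list(seed)
    best_acc, stale, history = -1.0, 0, []
    for _ in range(max_rounds):
        student = train_student(train)
        acc, errors = evaluate(student, validation)
        history.append(acc)
        if acc > best_acc:
            best_acc, stale = acc, 0
        else:
            stale += 1
        # Early stopping: no improvement for `patience` rounds, or nothing left to fix.
        if stale >= patience or not errors:
            break
        # Performance-aware step: the teacher sees the score and the hard examples.
        train.extend(teacher_generate(acc, errors))
    return best_acc, history


seed = [("invoice question", "billing"), ("where is my package", "shipping")]
validation = [
    ("refund my payment", "billing"),
    ("delivery is late", "shipping"),
    ("invoice charge", "billing"),
    ("tracking package", "shipping"),
]
best_acc, history = pgkd_loop(seed, validation, stub_teacher)
print(best_acc, history)
```

In this toy run the student misclassifies one validation example in the first round, the stub teacher supplies one new example for the confused class, and the loop stops once validation errors disappear. In a realistic setting the teacher call would be an LLM prompt and the student a fine-tuned compact model.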
Taxonomy
Topics: Text and Document Classification Technologies
Methods: Knowledge Distillation
