DDK: Distilling Domain Knowledge for Efficient Large Language Models

Jiaheng Liu; Chenchen Zhang; Jinyang Guo; Yuanxing Zhang; Haoran Que; Ken Deng; Zhiqi Bai; Jie Liu; Ge Zhang; Jiakai Wang; Yanan Wu; Congnan Liu; Wenbo Su; Jiamang Wang; Lin Qu; Bo Zheng

arXiv:2407.16154 · cs.CL · July 24, 2024

TL;DR

This paper introduces DDK, a dynamic knowledge distillation framework that adjusts dataset composition based on domain performance gaps, significantly enhancing the efficiency and effectiveness of large language model compression.

Contribution

The paper proposes a novel domain-aware distillation method that adaptively balances domain data, improving student LLM performance over existing static or black-box approaches.
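To make the idea of balancing domain data by performance gap concrete, here is a minimal sketch. It assumes the gap is measured as the difference between student and teacher validation loss per domain and that gaps are turned into sampling probabilities with a softmax; the function name `domain_sampling_probs`, the `temperature` parameter, and the loss values are all hypothetical and are not taken from the paper.

```python
import math

def domain_sampling_probs(student_loss, teacher_loss, temperature=1.0):
    """Map per-domain student/teacher loss gaps to sampling probabilities.

    Illustrative assumption: a larger gap means the student lags the
    teacher more in that domain, so the domain is sampled more often.
    """
    # Gap per domain, floored at zero where the student already matches the teacher.
    gaps = {d: max(student_loss[d] - teacher_loss[d], 0.0) for d in student_loss}
    # Softmax over gaps, shifted by the maximum for numerical stability.
    largest = max(gaps.values())
    exps = {d: math.exp((g - largest) / temperature) for d, g in gaps.items()}
    total = sum(exps.values())
    return {d: e / total for d, e in exps.items()}

# Hypothetical per-domain validation losses.
student = {"code": 2.0, "math": 2.4, "web": 1.8}
teacher = {"code": 1.5, "math": 1.4, "web": 1.7}
probs = domain_sampling_probs(student, teacher)
```

With these made-up numbers the "math" domain has the largest gap and therefore receives the largest share of the distillation data, while "web" receives the smallest.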

Findings

01  DDK outperforms baseline models on multiple benchmarks.

02  Dynamically adjusting the dataset composition improves distillation stability.

03  DDK achieves significant performance gains over existing distillation methods.
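One plausible reason a dynamic adjustment can stay stable is that the domain mixture is moved gradually instead of being replaced at every step. The sketch below illustrates that with a simple momentum-style blend of the previous and newly computed mixtures. This is an assumed, generic smoothing rule, not the update rule defined in the paper; `smooth_update`, `momentum`, and the example mixtures are hypothetical.

```python
def smooth_update(prev_probs, target_probs, momentum=0.9):
    """Blend the previous domain mixture with a newly computed target mixture.

    Illustrative assumption: a high momentum keeps the sampling
    distribution from swinging sharply between updates.
    """
    mixed = {
        d: momentum * prev_probs[d] + (1.0 - momentum) * target_probs[d]
        for d in prev_probs
    }
    # Renormalize to guard against floating-point drift.
    total = sum(mixed.values())
    return {d: v / total for d, v in mixed.items()}

# Hypothetical mixtures: start uniform, then move toward a gap-driven target.
prev = {"code": 1 / 3, "math": 1 / 3, "web": 1 / 3}
target = {"code": 0.2, "math": 0.7, "web": 0.1}
updated = smooth_update(prev, target)
```

Each call moves the mixture only a fraction of the way toward the target, so a single noisy gap estimate cannot dominate the next batch of training data.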

Abstract

Despite the advanced intelligence abilities of large language models (LLMs) in various applications, they still face significant computational and storage demands. Knowledge Distillation (KD) has emerged as an effective strategy to improve the performance of a smaller LLM (i.e., the student model) by transferring knowledge from a high-performing LLM (i.e., the teacher model). Prevailing techniques in LLM distillation typically use a black-box model API to generate high-quality pretrained and aligned datasets, or utilize white-box distillation by altering the loss function to better transfer knowledge from the teacher LLM. However, these methods ignore the knowledge differences between the student and teacher LLMs across domains. This results in excessive focus on domains with minimal performance gaps and insufficient attention to domains with large gaps, reducing overall performance. In…
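For readers unfamiliar with white-box distillation, a common form of the loss is the KL divergence between temperature-softened teacher and student output distributions. The following is a minimal pure-Python sketch of that standard formulation, given only as background; it is not claimed to be the loss used in this paper, and the logits and the `temperature` value are made up for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the maximum for stability."""
    scaled = [x / temperature for x in logits]
    largest = max(scaled)
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over softened next-token distributions."""
    p = softmax(teacher_logits, temperature)  # teacher distribution
    q = softmax(student_logits, temperature)  # student distribution
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits over a three-token vocabulary.
teacher_logits = [2.0, 0.5, -1.0]
student_logits = [1.0, 1.0, 0.0]
loss = kd_loss(student_logits, teacher_logits)
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two distributions diverge.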

Taxonomy

Methods: Softmax · Attention Is All You Need · Focus · Knowledge Distillation