HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression
Chenhe Dong, Yaliang Li, Ying Shen, Minghui Qiu

TL;DR
This paper introduces HRKD, a hierarchical relational knowledge distillation method that compresses large pre-trained language models (PLMs) by capturing both hierarchical and domain relational information, with the aim of improving model capability and transferability so that compressed models can be deployed on resource-limited devices.
Contribution
The paper proposes a novel HRKD approach that combines meta-learning, domain-relational graphs, and hierarchical compare-aggregate mechanisms for effective language model compression.
Findings
HRKD outperforms existing methods on multi-domain datasets.
HRKD demonstrates strong few-shot learning capabilities.
The approach effectively captures hierarchical and domain relational information.
Abstract
On many natural language processing tasks, large pre-trained language models (PLMs) have shown overwhelmingly strong performance compared with traditional neural network methods. Nevertheless, their huge model size and low inference speed have hindered their deployment on resource-limited devices in practice. In this paper, we aim to compress PLMs with knowledge distillation, and propose a hierarchical relational knowledge distillation (HRKD) method to capture both hierarchical and domain relational information. Specifically, to enhance model capability and transferability, we leverage the idea of meta-learning and set up domain-relational graphs to capture the relational information across different domains. To dynamically select the most representative prototypes for each domain, we propose a hierarchical compare-aggregate mechanism to capture hierarchical relationships. Extensive…
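For readers unfamiliar with the base technique, the following is a minimal sketch of a generic knowledge distillation loss, in which a small student model is trained to match the temperature-softened output distribution of a larger teacher. It is not the HRKD objective: the domain-relational graphs and the hierarchical compare-aggregate mechanism described in the abstract are not modeled here. The function names, the temperature value, and the example logits are all illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Generic soft-label distillation only; an assumption for illustration,
    not the loss used by HRKD.
    """
    p = softmax(teacher_logits, temperature)  # teacher distribution
    q = softmax(student_logits, temperature)  # student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits for a three-class task.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # disagrees with the teacher

print(distillation_loss(close_student, teacher))
print(distillation_loss(far_student, teacher))
```

The loss is zero when the student reproduces the teacher's logits exactly and grows as the two distributions diverge, so the student that agrees with the teacher receives the smaller penalty.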
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Knowledge Distillation
