HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression

Chenhe Dong; Yaliang Li; Ying Shen; Minghui Qiu

arXiv:2110.08551·cs.CL·October 19, 2021

TL;DR

This paper introduces HRKD, a hierarchical relational knowledge distillation method for compressing large pre-trained language models. It captures both hierarchical and cross-domain relational information to improve the compressed model's capability and transferability, with the goal of making deployment on resource-limited devices practical.

Contribution

The paper proposes a novel HRKD approach that combines meta-learning, domain-relational graphs, and hierarchical compare-aggregate mechanisms for effective language model compression.

Findings

01. HRKD outperforms existing methods on multi-domain datasets.

02. HRKD demonstrates strong few-shot learning capabilities.

03. The approach effectively captures hierarchical and domain relational information.

Abstract

On many natural language processing tasks, large pre-trained language models (PLMs) have shown far stronger performance than traditional neural network methods. Nevertheless, their huge model size and low inference speed have hindered their deployment on resource-limited devices in practice. In this paper, we aim to compress PLMs with knowledge distillation, and propose a hierarchical relational knowledge distillation (HRKD) method to capture both hierarchical and domain relational information. Specifically, to enhance the model capability and transferability, we leverage the idea of meta-learning and set up domain-relational graphs to capture the relational information across different domains. To dynamically select the most representative prototypes for each domain, we propose a hierarchical compare-aggregate mechanism to capture hierarchical relationships. Extensive…
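For readers unfamiliar with the knowledge distillation that the abstract builds on, the following is a minimal sketch of the generic soft-target distillation loss (a temperature-softened KL divergence between teacher and student outputs). It is not the HRKD objective: it omits the meta-learning, domain-relational graphs, and hierarchical compare-aggregate mechanism described above. The function names `softmax` and `distillation_loss`, the default temperature, and the example logits are all illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic soft-target loss: KL(teacher || student) at temperature T.

    The T^2 factor is the conventional rescaling that keeps gradient
    magnitudes comparable across temperatures. This is a hypothetical
    helper for illustration, not code from the HRKD repository.
    """
    p = softmax(teacher_logits, temperature)  # softened teacher distribution
    q = softmax(student_logits, temperature)  # softened student distribution
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))
    return kl * temperature ** 2


# Illustrative logits for a 3-class task (made-up values).
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
loss = distillation_loss(student, teacher)
```

The loss is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge, which is the signal a smaller student model is trained to minimize.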

Code & Models

Repositories

cheneydon/hrkd (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques

Methods: Knowledge Distillation