Learning to Augment for Data-Scarce Domain BERT Knowledge Distillation

Lingyun Feng; Minghui Qiu; Yaliang Li; Hai-Tao Zheng; Ying Shen

arXiv:2101.08106 · cs.CL · June 22, 2021 · 1 cites

TL;DR

This paper introduces a novel data augmentation approach for domain-specific BERT knowledge distillation, improving student model performance in data-scarce scenarios by leveraging source domain data and reinforcement learning.

Contribution

It proposes a cross-domain augmentation method with a reinforced selector to enhance knowledge transfer in data-scarce domain distillation tasks.

Findings

1. Significantly outperforms state-of-the-art baselines on four tasks.

2. Student models, despite having fewer parameters, outperform their large teacher models.

3. Effective in scenarios with limited labeled data.

Abstract

Although pre-trained language models such as BERT have achieved appealing performance in a wide range of natural language processing tasks, they are computationally expensive to deploy in real-time applications. A typical remedy is to adopt knowledge distillation to compress these large pre-trained models (teacher models) into small student models. However, for a target domain with scarce training data, the teacher can hardly pass useful knowledge to the student, which degrades the performance of the student models. To tackle this problem, we propose a method that learns to augment for data-scarce domain BERT knowledge distillation, by learning a cross-domain manipulation scheme that automatically augments the target domain with the help of resource-rich source domains. Specifically, the proposed method generates samples acquired from a stationary distribution near the target data and…
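For readers unfamiliar with the teacher-to-student compression step mentioned in the abstract, the following is a minimal sketch of a generic soft-label distillation loss (temperature-scaled KL divergence between teacher and student output distributions). It is not the paper's augmentation method or its exact objective; the function names `softmax` and `distillation_loss`, the default temperature, and the `eps` clamp are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (hypothetical helper)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0, eps=1e-12):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The T^2 factor is a common convention that keeps gradient magnitudes
    comparable across temperatures; it is an assumption here, not a detail
    taken from the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(max(pt, eps)) - math.log(max(ps, eps)))
        for pt, ps in zip(p_teacher, p_student)
    )
    return kl * temperature ** 2


# Illustrative logits only; these are not values from the paper.
matched = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge. With few target-domain examples there are few teacher distributions to match, which is the data-scarcity problem the abstract describes.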

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Knowledge Distillation · Dense Connections · Residual Connection · Adam · Linear Warmup With Linear Decay · Dropout · Softmax · Multi-Head Attention