Learning to Augment for Data-Scarce Domain BERT Knowledge Distillation
Lingyun Feng, Minghui Qiu, Yaliang Li, Hai-Tao Zheng, Ying Shen

TL;DR
This paper introduces a novel data augmentation approach for domain-specific BERT knowledge distillation, improving student model performance in data-scarce scenarios by leveraging source domain data and reinforcement learning.
Contribution
It proposes a cross-domain augmentation method with a reinforced selector to enhance knowledge transfer in data-scarce domain distillation tasks.
Findings
Significantly outperforms state-of-the-art baselines on four tasks.
Student models with fewer parameters outperform the large teacher models.
Effective in scenarios with limited labeled data.
Abstract
Although pre-trained language models such as BERT have achieved appealing performance across a wide range of natural language processing tasks, they are computationally expensive to deploy in real-time applications. A typical remedy is to adopt knowledge distillation to compress these large pre-trained models (teacher models) into small student models. However, for a target domain with scarce training data, the teacher can hardly pass useful knowledge to the student, which degrades the student's performance. To tackle this problem, we propose a method to learn to augment for data-scarce domain BERT knowledge distillation, by learning a cross-domain manipulation scheme that automatically augments the target with the help of resource-rich source domains. Specifically, the proposed method generates samples acquired from a stationary distribution near the target data and…
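For readers unfamiliar with the teacher-student setup the abstract refers to, the following is a minimal sketch of a generic soft-label distillation objective, not the loss used in this paper (the abstract does not specify it). The function names `softmax` and `distillation_loss`, the temperature, and the mixing weight `alpha` are all illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Generic distillation loss (an assumption, not the paper's objective).

    Combines KL(teacher || student) on temperature-softened distributions
    with ordinary cross-entropy against the gold label.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)

    # Soft term: how far the student's softened distribution is from the teacher's.
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student) if pt > 0)

    # Hard term: cross-entropy of the student's unsoftened prediction.
    hard = -math.log(softmax(student_logits)[label])

    # The T^2 factor keeps the soft term's gradient scale comparable across temperatures.
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard


if __name__ == "__main__":
    teacher = [2.0, 0.5]
    print(distillation_loss([2.0, 0.5], teacher, label=0))  # student matches teacher
    print(distillation_loss([0.0, 0.0], teacher, label=0))  # uninformed student
```

When the target domain has few examples, this loss is computed over very few inputs, which is the setting in which the abstract says the teacher "can hardly pass useful knowledge"; augmenting the target data increases the number of inputs on which the student can imitate the teacher.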
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Knowledge Distillation · Dense Connections · Residual Connection · Adam · Linear Warmup With Linear Decay · Dropout · Softmax · Multi-Head Attention
