Adversarial Self-Supervised Data-Free Distillation for Text Classification

Xinyin Ma; Yongliang Shen; Gongfan Fang; Chen Chen; Chenghao Jia; Weiming Lu

arXiv:2010.04883 · cs.CL · October 13, 2020 · 1 citation

TL;DR

This paper introduces a novel data-free distillation method for NLP, enabling the compression of large transformer models without access to original data by using adversarial self-supervised pseudo embeddings.

Contribution

It proposes the first data-free distillation framework for NLP, utilizing adversarial self-supervision and pseudo embeddings to effectively compress models like BERT.

Findings

01. Achieves competitive performance on text classification datasets.

02. Demonstrates effectiveness without requiring original training data.

03. Introduces a Plug & Play Embedding Guessing method for pseudo data generation.

Abstract

Large pre-trained transformer-based language models have achieved impressive results on a wide range of NLP tasks. In the past few years, Knowledge Distillation (KD) has become a popular paradigm for compressing a computationally expensive model into a resource-efficient lightweight model. However, most KD algorithms, especially in NLP, rely on access to the original training dataset, which may be unavailable due to privacy issues. To tackle this problem, we propose a novel two-stage data-free distillation method, named Adversarial self-Supervised Data-Free Distillation (AS-DFD), which is designed for compressing large-scale transformer-based models (e.g., BERT). To avoid text generation in discrete space, we introduce a Plug & Play Embedding Guessing method to craft pseudo embeddings from the teacher's hidden knowledge. Meanwhile, with a self-supervised module to quantify the…
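
For readers unfamiliar with the KD objective the abstract refers to, the following is a minimal sketch of the generic temperature-softened distillation loss (the KL divergence between teacher and student output distributions). It is not the AS-DFD objective: the function names, the temperature value, and the toy logits are assumptions made for illustration, and in the data-free setting the logits would come from pseudo inputs rather than real training samples.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """Generic soft-target distillation loss: T^2 * KL(teacher || student).

    Illustrative only; the temperature and the T^2 scaling follow the common
    KD formulation, not anything stated in the abstract above.
    """
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return (temperature ** 2) * kl


# Hypothetical logits for a 3-class text classifier on one (pseudo) input.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
loss = kd_loss(teacher, student)
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge, which is why it can serve as a training signal even when no ground-truth labels are available.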

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Adversarial Robustness in Machine Learning