Adversarial Self-Supervised Data-Free Distillation for Text Classification
Xinyin Ma, Yongliang Shen, Gongfan Fang, Chen Chen, Chenghao Jia,, Weiming Lu

TL;DR
This paper introduces a novel data-free distillation method for NLP, enabling the compression of large transformer models without access to original data by using adversarial self-supervised pseudo embeddings.
Contribution
It proposes the first data-free distillation framework for NLP, utilizing adversarial self-supervision and pseudo embeddings to effectively compress models like BERT.
Findings
Achieves competitive performance on text classification datasets.
Demonstrates effectiveness without requiring original training data.
Introduces a Plug & Play Embedding Guessing method for pseudo data generation.
Abstract
Large pre-trained transformer-based language models have achieved impressive results on a wide range of NLP tasks. In the past few years, Knowledge Distillation(KD) has become a popular paradigm to compress a computationally expensive model to a resource-efficient lightweight model. However, most KD algorithms, especially in NLP, rely on the accessibility of the original training dataset, which may be unavailable due to privacy issues. To tackle this problem, we propose a novel two-stage data-free distillation method, named Adversarial self-Supervised Data-Free Distillation (AS-DFD), which is designed for compressing large-scale transformer-based models (e.g., BERT). To avoid text generation in discrete space, we introduce a Plug & Play Embedding Guessing method to craft pseudo embeddings from the teacher's hidden knowledge. Meanwhile, with a self-supervised module to quantify the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Adversarial Robustness in Machine Learning
