Towards Zero-Shot Knowledge Distillation for Natural Language Processing
Ahmad Rashid, Vasileios Lioutas, Abbas Ghaddar, Mehdi Rezagholizadeh

TL;DR
This paper introduces a novel zero-shot knowledge distillation method for NLP that enables a student model to learn from a teacher without access to task-specific data, using out-of-domain data and adversarial training.
Contribution
To the best of the authors' knowledge, it is the first work to propose zero-shot knowledge distillation for NLP, combining out-of-domain data and adversarial training to transfer knowledge without task-specific data.
Findings
Achieves 75-92% of the teacher's classification score (accuracy or F1) on GLUE tasks
Compresses models 30 times while maintaining high performance
Demonstrates effectiveness across six NLP tasks
Abstract
Knowledge Distillation (KD) is a common knowledge transfer algorithm used for model compression across a variety of deep learning based natural language processing (NLP) solutions. In its regular manifestations, KD requires access to the teacher's training data for knowledge transfer to the student network. However, privacy concerns, data regulations and proprietary reasons may prevent access to such data. We present, to the best of our knowledge, the first work on Zero-Shot Knowledge Distillation for NLP, where the student learns from the much larger teacher without any task specific data. Our solution combines out of domain data and adversarial training to learn the teacher's output distribution. We investigate six tasks from the GLUE benchmark and demonstrate that we can achieve between 75% and 92% of the teacher's classification score (accuracy or F1) while compressing the model 30…
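To make "learning the teacher's output distribution" concrete, here is a minimal sketch of a generic distillation objective: the KL divergence between temperature-softened teacher and student outputs. It is an illustration under stated assumptions, not the paper's exact loss. The function names (`softmax`, `kd_loss`) and the temperature value are hypothetical, and the adversarial training component mentioned in the abstract is not shown. The point it illustrates is that the loss needs only the teacher's outputs on some input, with no task labels, which is why out-of-domain text can serve as the transfer data.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Assumption: a generic distillation loss for illustration only.
    No ground-truth label appears anywhere in this computation.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits for one unlabeled (e.g. out-of-domain) sentence.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
print(kd_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's distribution exactly and positive otherwise, so minimizing it over many unlabeled inputs pushes the student toward the teacher's behavior.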
Taxonomy
Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications
Methods: Knowledge Distillation
