Model Compression with Two-stage Multi-teacher Knowledge Distillation for Web Question Answering System

Ze Yang; Linjun Shou; Ming Gong; Wutao Lin; Daxin Jiang

arXiv:1910.08381 · cs.CL · October 21, 2019 · 6 citations

TL;DR

This paper introduces a two-stage multi-teacher knowledge distillation approach to compress large question answering models, significantly improving inference speed while maintaining high accuracy.

Contribution

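The proposed TMKD method reduces model size and inference time while keeping accuracy comparable to the original models. It does so by first pre-training the student on a general Q&A distillation task and then fine-tuning it with multi-teacher knowledge distillation on downstream tasks.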
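To make the multi-teacher idea concrete, here is a minimal sketch of a per-example distillation loss that combines several teachers with the gold label. It assumes one simple combination rule (a plain average of the teachers' temperature-softened distributions) together with the standard Hinton-style mix of a soft loss and a hard loss. The function names and the `temperature` and `alpha` parameters are hypothetical, and the actual TMKD objective may combine teachers differently.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def multi_teacher_kd_loss(student_logits: Sequence[float],
                          teacher_logits: Sequence[Sequence[float]],
                          gold_label: int,
                          temperature: float = 2.0,
                          alpha: float = 0.5) -> float:
    """Hypothetical multi-teacher distillation loss for a single example."""
    # Soft target: plain average of the teachers' softened distributions (assumption).
    teacher_probs = [softmax(t, temperature) for t in teacher_logits]
    soft_target = [sum(p[c] for p in teacher_probs) / len(teacher_probs)
                   for c in range(len(student_logits))]
    student_soft = softmax(student_logits, temperature)
    # Cross-entropy from the averaged teacher distribution to the student's softened
    # distribution, scaled by T^2 so its gradients stay comparable to the hard loss.
    soft_loss = -sum(q * math.log(p) for q, p in zip(soft_target, student_soft))
    soft_loss *= temperature ** 2
    # Ordinary cross-entropy against the gold label at temperature 1.
    hard_loss = -math.log(softmax(student_logits)[gold_label])
    return alpha * soft_loss + (1 - alpha) * hard_loss


if __name__ == "__main__":
    teachers = [[2.0, -1.0], [1.5, -0.5]]  # two hypothetical teachers, two classes
    print(multi_teacher_kd_loss([2.0, -1.0], teachers, gold_label=0))
    print(multi_teacher_kd_loss([-2.0, 1.0], teachers, gold_label=0))
```

A student whose logits agree with the teachers receives the lower of the two printed losses. Averaging is only one option; weighting teachers by validation accuracy, or giving the student one output head per teacher, are equally plausible variants.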

Findings

01. Significant speedup in model inference.
02. Outperforms baseline compression methods.
03. Achieves accuracy comparable to the original models.

Abstract

Deep pre-training and fine-tuning models (such as BERT and OpenAI GPT) have demonstrated excellent results in question answering. However, because of their sheer number of parameters, inference with these models is very slow. Applying such complex models to real business scenarios is therefore a challenging but practical problem. Previous model compression methods usually suffer from information loss during compression, producing models inferior to the original. To tackle this challenge, we propose a Two-stage Multi-teacher Knowledge Distillation (TMKD for short) method for web Question Answering systems. We first develop a general Q&A distillation task for student model pre-training, and then fine-tune this pre-trained student model with multi-teacher knowledge distillation on downstream tasks (such as the Web Q&A task, MNLI, SNLI, RTE…
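The two-stage schedule can be illustrated with a toy example. This sketch rests on several assumptions that are not from the paper: the student is a logistic-regression scorer over a fixed feature vector (standing in for a small Transformer), each task is treated as binary relevance scoring, teachers are plain callables returning a probability, and stage 1 distills from a single teacher for simplicity. The names `LinearStudent`, `stage1_pretrain`, and `stage2_finetune` are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Teacher = Callable[[Sequence[float]], float]  # hypothetical: returns P(relevant | x)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LinearStudent:
    """Hypothetical toy student: logistic regression over a fixed feature vector."""

    def __init__(self, dim: int) -> None:
        self.w = [0.0] * dim
        self.b = 0.0

    def prob(self, x: Sequence[float]) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def sgd_step(self, x: Sequence[float], target: float, lr: float) -> None:
        # For binary cross-entropy, d(loss)/d(logit) = p - target,
        # which also holds for soft targets in [0, 1].
        g = self.prob(x) - target
        self.w = [wi - lr * g * xi for wi, xi in zip(self.w, x)]
        self.b -= lr * g


def stage1_pretrain(student: LinearStudent, unlabeled: List[List[float]],
                    teacher: Teacher, epochs: int = 20, lr: float = 0.1) -> None:
    """Stage 1: fit the student to one teacher's soft scores on unlabeled data."""
    for _ in range(epochs):
        for x in unlabeled:
            student.sgd_step(x, teacher(x), lr)


def stage2_finetune(student: LinearStudent, labeled: List[Tuple[List[float], float]],
                    teachers: List[Teacher], alpha: float = 0.5,
                    epochs: int = 20, lr: float = 0.1) -> None:
    """Stage 2: fine-tune on a downstream task with averaged teacher scores plus gold labels."""
    for _ in range(epochs):
        for x, y in labeled:
            soft = sum(t(x) for t in teachers) / len(teachers)
            # Mix soft (teacher) and hard (gold) targets; alpha weights the teachers.
            student.sgd_step(x, alpha * soft + (1 - alpha) * y, lr)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical stand-in teachers: fixed logistic scorers over 2-D features.
    teachers = [lambda x: sigmoid(3 * x[0] - 2 * x[1]),
                lambda x: sigmoid(2 * x[0] - 2 * x[1])]
    unlabeled = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
    labeled = [(x, 1.0 if x[0] > x[1] else 0.0) for x in unlabeled[:50]]
    student = LinearStudent(dim=2)
    stage1_pretrain(student, unlabeled, teachers[0])
    stage2_finetune(student, labeled, teachers)
    print(f"P(relevant | [1, -1]) = {student.prob([1.0, -1.0]):.3f}")
```

Because binary cross-entropy is linear in its target, training on the mixed target `alpha * soft + (1 - alpha) * y` is equivalent to minimizing a weighted sum of a teacher-label loss and a gold-label loss, which keeps the stage-2 update to a single gradient step per example.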

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Knowledge Distillation · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam