Put Teacher in Student's Shoes: Cross-Distillation for Ultra-compact Model Compression Framework
Maolin Wang, Jun Chu, Sicong Xie, Xiaoling Zang, Yao Zhao, Wenliang Zhong, Xiangyu Zhao

TL;DR
This paper introduces EI-BERT, a novel ultra-compact NLP model framework built on cross-distillation. It enables highly efficient BERT-based models to run on resource-constrained edge devices while maintaining strong real-world performance.
Contribution
The paper presents a new cross-distillation method, combined with hard token pruning and parameter quantization, to produce what the authors describe as the smallest BERT-based model for NLP tasks, suitable for edge deployment.
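Parameter quantization is named as one stage of the compression pipeline, but this summary does not say which quantization scheme EI-BERT uses. As a hedged illustration only, the sketch below shows generic symmetric per-tensor int8 quantization, a common way to shrink weights: each float32 weight (4 bytes) becomes one signed byte plus a shared scale, roughly a 4x reduction in weight storage. The function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical; this is not the authors' quantizer.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Generic symmetric per-tensor int8 quantization (illustrative, not EI-BERT's scheme).

    scale = max|w| / 127 and q = clamp(round(w / scale), -127, 127), so each
    float32 weight (4 bytes) is stored as one signed byte plus a shared scale.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid division by zero for all-zero tensors
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights: w_hat = q * scale."""
    return [v * scale for v in q]


# Hypothetical weights for a tiny tensor.
weights = [0.5, -1.27, 0.0, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
print(q)  # [50, -127, 0, 90]
```

Because rounding is to the nearest integer step, each restored weight differs from the original by at most half a quantization step (`scale / 2`); a real deployment would typically also choose per-channel scales or calibrate activations, which this sketch omits.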
Findings
Achieved a 1.91 MB BERT-based model for NLP tasks.
Successfully deployed on 8.4 million devices in Alipay.
Demonstrated significant real-world performance improvements.
Abstract
In the era of mobile computing, deploying efficient Natural Language Processing (NLP) models in resource-restricted edge settings presents significant challenges, particularly in environments requiring strict privacy compliance, real-time responsiveness, and diverse multi-tasking capabilities. These challenges create a fundamental need for ultra-compact models that maintain strong performance across various NLP tasks while adhering to stringent memory constraints. To this end, we introduce Edge ultra-lIte BERT framework (EI-BERT) with a novel cross-distillation method. EI-BERT efficiently compresses models through a comprehensive pipeline including hard token pruning, cross-distillation and parameter quantization. Specifically, the cross-distillation method uniquely positions the teacher model to understand the student model's perspective, ensuring efficient knowledge transfer through…
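The abstract contrasts its cross-distillation, where the teacher is positioned to understand the student's perspective, with ordinary knowledge transfer, but the truncated text does not give the cross-distillation objective, so it is not reproduced here. For contrast only, the sketch below shows the conventional one-way distillation loss in the style of Hinton et al. (2015), where a fixed teacher's temperature-softened outputs supervise the student. The names `kd_loss`, `temperature`, and `alpha`, and the example logits, are hypothetical illustrations, not parameters from the paper.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: List[float], teacher_logits: List[float], label: int,
            temperature: float = 2.0, alpha: float = 0.5) -> float:
    """Conventional one-way distillation loss (not EI-BERT's cross-distillation).

    alpha * T^2 * KL(p_teacher || p_student) on temperature-softened distributions,
    plus (1 - alpha) * cross-entropy of the student against the hard label.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    # KL divergence; terms where the teacher probability is 0 contribute nothing.
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    # Hard-label cross-entropy computed at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


# Hypothetical 3-class example: the teacher is fixed; only the student would be updated.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 0.8, -0.5]
loss = kd_loss(student, teacher, label=0)
```

The `temperature ** 2` factor keeps the gradient magnitude of the soft-target term roughly comparable to the hard-label term as the temperature changes. In this one-way setup the teacher never adapts to the student, which is the limitation the abstract's teacher-in-the-student's-shoes framing targets.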
