Weight Squeezing: Reparameterization for Knowledge Transfer and Model Compression
Artem Chumachenko, Daniil Gavrilov, Nikita Balagansky, Pavel Kalaidin

TL;DR
This paper introduces Weight Squeezing, a novel method for simultaneous knowledge transfer and model compression that outperforms existing techniques in speed and accuracy, demonstrated on BERT-based models for text classification.
Contribution
The paper proposes Weight Squeezing, a new reparameterization approach for efficient knowledge transfer and model compression, including a variant called Gated Weight Squeezing that enhances fine-tuning.
Findings
Weight Squeezing outperforms the other knowledge transfer and model compression methods it was compared against on the GLUE benchmark.
Fine-tuning BERT-Medium with Gated Weight Squeezing outperforms plain fine-tuning.
The approach is faster and easier to implement than existing techniques.
Abstract
In this work, we present a novel approach for simultaneous knowledge transfer and model compression called Weight Squeezing. With this method, we perform knowledge transfer from a teacher model by learning the mapping from its weights to smaller student model weights. We applied Weight Squeezing to a pre-trained text classification model based on BERT-Medium model and compared our method to various other knowledge transfer and model compression methods on GLUE multitask benchmark. We observed that our approach produces better results while being significantly faster than other methods for training student models. We also proposed a variant of Weight Squeezing called Gated Weight Squeezing, for which we combined fine-tuning of BERT-Medium model and learning mapping from BERT-Base weights. We showed that fine-tuning with Gated Weight Squeezing outperforms plain fine-tuning of…
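The core idea in the abstract, learning a mapping from teacher weights to smaller student weights, can be illustrated with a small sketch. The abstract does not state the form of the mapping, so the version below assumes a linear one with two learnable projection matrices (`left` and `right`), and the gated variant is likewise only a guess at how a student's own weights might be combined with the mapped teacher weights. All names, shapes, and the toy dimensions are hypothetical and are not taken from the paper.

```python
import numpy as np


def squeeze_weight(teacher_w, left, right):
    """Map a teacher weight matrix to a smaller student weight matrix.

    Assumed (not confirmed by the source) linear mapping:
        student_w = left @ teacher_w @ right
    teacher_w: (t_out, t_in)  frozen teacher weights
    left:      (s_out, t_out) learnable projection
    right:     (t_in, s_in)   learnable projection
    """
    return left @ teacher_w @ right


def gated_squeeze_weight(own_w, teacher_w, left, right, gate):
    """Hypothetical gated variant: the student's own weights plus a
    gated contribution from the mapped teacher weights."""
    return own_w + gate * squeeze_weight(teacher_w, left, right)


# Toy dimensions chosen only for illustration.
rng = np.random.default_rng(0)
t_out, t_in = 8, 8   # teacher layer shape
s_out, s_in = 4, 4   # student layer shape

teacher_w = rng.standard_normal((t_out, t_in))
left = rng.standard_normal((s_out, t_out)) * 0.1
right = rng.standard_normal((t_in, s_in)) * 0.1
own_w = rng.standard_normal((s_out, s_in))

student_w = squeeze_weight(teacher_w, left, right)
gated_w = gated_squeeze_weight(own_w, teacher_w, left, right, gate=0.5)

print(student_w.shape, gated_w.shape)
```

Under this assumption, only `left`, `right`, and (in the gated case) `own_w` and `gate` would be trained, while `teacher_w` stays fixed. After training, the student weights can be computed once and stored, so the teacher is not needed at inference time.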
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
