ROSITA: Refined BERT cOmpreSsion with InTegrAted techniques
Yuanxin Liu, Zheng Lin, Fengcheng Yuan

TL;DR
ROSITA is a novel BERT compression method that combines multiple techniques to significantly reduce model size while maintaining high performance on NLP tasks, outperforming previous approaches.
Contribution
The paper introduces an integrated compression framework combining pruning, low-rank factorization, and knowledge distillation for BERT, optimizing design choices for best results.
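To make the three ingredients named above concrete, here is a minimal sketch of each one in isolation, in plain Python. It is not ROSITA's actual pipeline: the function names, the toy weights, the rank of 128, and the temperature of 2.0 are all illustrative assumptions, and the paper's own design choices (what to prune, which matrices to factorize, how to distill) are not reproduced here.

```python
import math

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.
    (Hypothetical helper; a generic magnitude-pruning rule, not the paper's.)"""
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # Indices of the k smallest-magnitude weights.
    drop = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

def low_rank_param_count(rows, cols, rank):
    """Parameters needed to store a rows x cols matrix W as the product
    A (rows x rank) @ B (rank x cols) instead of storing W directly."""
    return rank * (rows + cols)

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft cross-entropy between the teacher's and the student's
    temperature-softened output distributions (a common KD objective)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))

# Toy usage with made-up numbers.
pruned = magnitude_prune([0.9, -0.05, 0.4, 0.01, -0.7, 0.2], sparsity=0.5)
dense_params = 768 * 768  # one square matrix at BERT-base's hidden size
factored_params = low_rank_param_count(768, 768, rank=128)  # rank is assumed
loss = distillation_loss([1.0, 0.5, -0.2], [2.0, 0.1, -1.0])

print(pruned)                          # the three smallest-magnitude weights become 0.0
print(dense_params / factored_params)  # storage reduction from factorization
print(loss)
```

In an integrated framework the three steps interact: pruning and factorization shrink the student, while the distillation loss is the signal used to recover accuracy from the teacher. The order and granularity of those steps are the kind of design choices the paper explores.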
Findings
ROSITA achieves 7.5x smaller size than BERT.
Retains 98.5% of BERT's performance on GLUE tasks.
Outperforms previous BERT compression methods.
Abstract
Pre-trained language models of the BERT family have defined the state of the art in a wide range of NLP tasks. However, the performance of BERT-based models is mainly driven by their enormous number of parameters, which hinders their application to resource-limited scenarios. Faced with this problem, recent studies have attempted to compress BERT into a small-scale model. However, most previous work focuses on a single kind of compression technique, and little attention has been paid to the combination of different methods. When BERT is compressed with integrated techniques, a critical question is how to design the entire compression framework to obtain the optimal performance. In response to this question, we integrate three kinds of compression methods (weight pruning, low-rank factorization and knowledge distillation (KD)) and explore a range of designs concerning model…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Pruning · Linear Layer · Knowledge Distillation · Residual Connection · Layer Normalization · WordPiece · Adam · Weight Decay · Dropout
