Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization
Chen Liang, Simiao Zuo, Minshuo Chen, Haoming Jiang, Xiaodong Liu, Pengcheng He, Tuo Zhao, Weizhu Chen

TL;DR
This paper investigates the phenomenon of 'super tickets' in pre-trained language models, showing that certain compressed subnetworks can outperform full models in generalization, with implications for model efficiency and multi-task learning.
Contribution
The paper introduces the concept of super tickets, reveals a phase transition in generalization performance as the compression ratio increases, and demonstrates the benefits of super tickets for fine-tuning and multi-task learning.
Findings
Super tickets can outperform full models at certain compression ratios.
The phase transition in generalization performance depends on model size and data.
Super tickets improve fine-tuning accuracy on the GLUE benchmark.
Abstract
The Lottery Ticket Hypothesis suggests that an over-parametrized network consists of "lottery tickets", and training a certain collection of them (i.e., a subnetwork) can match the performance of the full model. In this paper, we study such a collection of tickets, which is referred to as "winning tickets", in extremely over-parametrized models, e.g., pre-trained language models. We observe that at certain compression ratios, the generalization performance of the winning tickets can not only match but also exceed that of the full model. In particular, we observe a phase transition phenomenon: As the compression ratio increases, generalization performance of the winning tickets first improves then deteriorates after a certain threshold. We refer to the tickets on the threshold as "super tickets". We further show that the phase transition is task and model dependent -- as the model…
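The phase transition described in the abstract can be illustrated with a minimal sketch. It assumes you have already measured a validation score for the winning ticket at several compression ratios. The function name `find_super_ticket`, the convention that ratio 0.0 denotes the full model, and all scores below are hypothetical placeholders, not results from the paper.

```python
from typing import Dict, Optional, Tuple


def find_super_ticket(
    scores: Dict[float, float], full_model_ratio: float = 0.0
) -> Optional[Tuple[float, float]]:
    """Return (compression_ratio, score) of the best winning ticket,
    or None if no compressed ticket beats the full model.

    `scores` maps a compression ratio to a validation score;
    `full_model_ratio` is the key holding the uncompressed baseline
    (an assumed convention for this sketch).
    """
    full_score = scores[full_model_ratio]
    # The peak of the curve is the candidate threshold ticket.
    best_ratio = max(scores, key=scores.get)
    if best_ratio != full_model_ratio and scores[best_ratio] > full_score:
        return best_ratio, scores[best_ratio]
    return None


# Hypothetical scores shaped like the described curve:
# improvement up to a threshold, then deterioration.
hypothetical_scores = {
    0.0: 0.840,  # full model
    0.1: 0.845,
    0.2: 0.851,  # peak: the "super ticket" in this made-up example
    0.3: 0.848,
    0.5: 0.830,
    0.7: 0.790,
}

print(find_super_ticket(hypothetical_scores))
```

In practice the curve would come from repeated pruning and fine-tuning runs, and the peak should be checked across random seeds before treating it as a genuine threshold rather than noise.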
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
