Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization

Chen Liang; Simiao Zuo; Minshuo Chen; Haoming Jiang; Xiaodong Liu; Pengcheng He; Tuo Zhao; Weizhu Chen

arXiv:2105.12002·cs.LG·June 9, 2021·1 cites

Open Access · 1 Repo

TL;DR

This paper investigates the phenomenon of 'super tickets' in pre-trained language models, showing that certain compressed subnetworks can outperform full models in generalization, with implications for model efficiency and multi-task learning.

Contribution

It introduces the concept of super tickets, revealing phase transition behavior in model compression and demonstrating their benefits for fine-tuning and multi-task learning.

Findings

01 Super tickets can outperform full models at certain compression ratios.

02 The phase transition in generalization performance depends on model size and data.

03 Super tickets improve fine-tuning accuracy on the GLUE benchmark.

Abstract

The Lottery Ticket Hypothesis suggests that an over-parametrized network consists of "lottery tickets", and training a certain collection of them (i.e., a subnetwork) can match the performance of the full model. In this paper, we study such a collection of tickets, which is referred to as "winning tickets", in extremely over-parametrized models, e.g., pre-trained language models. We observe that at certain compression ratios, the generalization performance of the winning tickets can not only match but also exceed that of the full model. In particular, we observe a phase transition phenomenon: As the compression ratio increases, generalization performance of the winning tickets first improves then deteriorates after a certain threshold. We refer to the tickets on the threshold as "super tickets". We further show that the phase transition is task and model dependent -- as the model…
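
The threshold described in the abstract can be illustrated with a small selection routine. This is a minimal sketch, not the authors' procedure: it assumes you already have one validation score per compression ratio from your own pruning sweep, and it treats the best-scoring ratio as the super-ticket candidate only if it beats the full model. The function name `find_super_ticket`, the reading of "compression ratio" as the fraction of parameters removed, and all numbers below are hypothetical placeholders, not results from the paper.

```python
from typing import Dict, Optional, Tuple


def find_super_ticket(
    scores: Dict[float, float], full_model_score: float
) -> Optional[Tuple[float, float]]:
    """Pick the compression ratio whose winning ticket scores highest.

    scores maps a compression ratio (assumed here to be the fraction of
    parameters removed) to a validation score. Returns (ratio, score),
    or None when no ticket strictly beats the full model.
    """
    if not scores:
        return None
    # The peak of the improve-then-deteriorate curve is the candidate.
    best_ratio = max(scores, key=scores.get)
    best_score = scores[best_ratio]
    if best_score <= full_model_score:
        return None
    return best_ratio, best_score


# Hypothetical placeholder sweep: scores rise, peak, then fall.
sweep = {0.1: 84.3, 0.2: 84.6, 0.3: 84.9, 0.4: 84.1, 0.5: 82.7}
result = find_super_ticket(sweep, full_model_score=84.0)
print(result)  # (0.3, 84.9)
```

In practice the scores at nearby ratios can differ by less than the run-to-run variance, so averaging over several random seeds before picking the peak is a sensible precaution.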

Code & Models

Repositories

cliang1453/super-structured-lottery-tickets
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications