When BERT Plays the Lottery, All Tickets Are Winning
Sai Prasanna, Anna Rogers, Anna Rumshisky

TL;DR
This paper investigates the lottery ticket hypothesis in BERT. It shows that small, well-chosen subnetworks can match full-model performance, that most pre-trained weights are potentially useful, and that the success of the "good" subnetworks is not explained by superior linguistic knowledge.
Contribution
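It shows that fine-tuned BERT contains subnetworks with performance comparable to the full model and that most pre-trained weights are useful. It also finds that the "good" subnetworks are unstable and not explained by meaningful self-attention patterns, which challenges the assumption that their success comes from specific linguistic patterns.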
Findings
Well-chosen subnetworks of fine-tuned BERT can match full-model performance.
Most pre-trained weights are potentially useful.
With structured pruning, even the worst-performing subnetworks remain highly trainable.
Abstract
Large Transformer-based models were shown to be reducible to a smaller number of self-attention heads and layers. We consider this phenomenon from the perspective of the lottery ticket hypothesis, using both structured and magnitude pruning. For fine-tuned BERT, we show that (a) it is possible to find subnetworks achieving performance that is comparable with that of the full model, and (b) similarly-sized subnetworks sampled from the rest of the model perform worse. Strikingly, with structured pruning even the worst possible subnetworks remain highly trainable, indicating that most pre-trained BERT weights are potentially useful. We also study the "good" subnetworks to see if their success can be attributed to superior linguistic knowledge, but find them unstable, and not explained by meaningful self-attention patterns.
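The abstract contrasts two ways of carving subnetworks out of BERT: magnitude pruning (removing individual weights) and structured pruning (removing whole self-attention heads or layers). The sketch below illustrates the two mask types under stated assumptions. It is not the authors' code: it assumes one-shot global magnitude pruning over a flat list of weights, and it assumes head-importance scores are already given as plain numbers (how they are computed, and whether pruning is iterative, is not shown here). The helper names `magnitude_mask`, `head_mask`, and `apply_mask` are hypothetical.

```python
from typing import List, Sequence


def magnitude_mask(weights: Sequence[float], sparsity: float) -> List[int]:
    """Hypothetical helper: return a 0/1 mask that prunes the `sparsity`
    fraction of weights with the smallest absolute value (one-shot,
    unstructured magnitude pruning; an assumption, not the paper's exact setup)."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(round(sparsity * len(weights)))
    # Indices sorted by |w|, smallest first; ties broken by index for determinism.
    order = sorted(range(len(weights)), key=lambda i: (abs(weights[i]), i))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def head_mask(importance: Sequence[float], n_keep: int, worst: bool = False) -> List[int]:
    """Hypothetical helper for structured pruning: keep whole attention heads.
    worst=False keeps the n_keep most important heads (a "good" subnetwork);
    worst=True keeps the n_keep least important heads (a "bad" subnetwork
    of the same size). Importance scores are assumed to be given."""
    order = sorted(range(len(importance)), key=lambda i: importance[i], reverse=not worst)
    keep = set(order[:n_keep])
    return [1 if i in keep else 0 for i in range(len(importance))]


def apply_mask(weights: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Zero out pruned entries; surviving weights are unchanged."""
    return [w * m for w, m in zip(weights, mask)]


if __name__ == "__main__":
    w = [0.5, -0.01, 0.3, -0.8, 0.02, 0.1]
    print(magnitude_mask(w, 0.5))  # → [1, 0, 1, 1, 0, 0]
    scores = [0.9, 0.1, 0.5, 0.7]  # made-up importance scores for 4 heads
    print(head_mask(scores, 2))  # → [1, 0, 0, 1]
    print(head_mask(scores, 2, worst=True))  # → [0, 1, 1, 0]
```

The `worst=True` option mirrors the abstract's comparison between the best subnetworks and equally sized subnetworks drawn from the rest of the model. In the paper's finding, such "bad" structured subnetworks still recover well after fine-tuning; this toy sketch only builds the masks and does not reproduce that training.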
Videos
When BERT Plays the Lottery, All Tickets Are Winning (Paper Explained) · YouTube
Taxonomy
Methods: Pruning · Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece
