Aligned Weight Regularizers for Pruning Pretrained Neural Networks
James O'Neill, Sourav Dutta, Haytham Assem

TL;DR
This paper investigates how pruning affects zero-shot performance in cross-lingual models and introduces weight regularizers to preserve alignment between pruned and unpruned networks, improving zero-shot and non-zero-shot results.
Contribution
It presents the first study on cross-lingual language model pruning and proposes alignment-maximizing regularizers to mitigate pruning-induced performance degradation.
Findings
Pruning causes a greater performance loss in zero-shot settings than in standard supervised learning.
Regularizers improve alignment and performance in pruned cross-lingual models.
Pruning impacts different languages to varying degrees, affecting representational quality.
Abstract
While various avenues of research have been explored for iterative pruning, little is known about what effect pruning has on zero-shot test performance and its potential implications for the choice of pruning criteria. This pruning setup is particularly important for cross-lingual models that implicitly learn alignment between language representations during pretraining, which, if distorted via pruning, leads to poorer performance not only on the language data used for retraining but also on the zero-shot languages that are evaluated. In this work, we show that there is a clear performance discrepancy in magnitude-based pruning when comparing standard supervised learning to the zero-shot setting. From this finding, we propose two weight regularizers that aim to maximize the alignment between units of pruned and unpruned networks to mitigate alignment distortion in pruned cross-lingual models and…
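To make the two ideas in the abstract concrete, here is a minimal NumPy sketch of magnitude-based pruning on a single weight matrix, together with one possible alignment penalty between the units of a pruned and an unpruned layer. The truncated abstract does not define the paper's two regularizers, so the cosine-similarity penalty below is an illustrative assumption rather than the authors' formulation, and the function names `magnitude_prune` and `alignment_penalty` are hypothetical.

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of entries with the smallest magnitude.

    Returns the pruned weights and the boolean keep-mask.
    """
    k = int(round(sparsity * weights.size))  # number of entries to remove
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    # The k-th smallest absolute value acts as the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask


def alignment_penalty(x, w_dense, w_pruned, eps=1e-8):
    """Assumed penalty: 1 - mean cosine similarity between matching units.

    x has shape (batch, in_features); both weight matrices have shape
    (in_features, units). Each unit's activations are compared across the batch.
    """
    h_dense = x @ w_dense
    h_pruned = x @ w_pruned
    num = (h_dense * h_pruned).sum(axis=0)
    den = np.linalg.norm(h_dense, axis=0) * np.linalg.norm(h_pruned, axis=0) + eps
    return 1.0 - float(np.mean(num / den))


# Hypothetical usage: prune half of a random layer and measure the misalignment.
rng = np.random.default_rng(0)
w_dense = rng.normal(size=(16, 8))
w_pruned, keep_mask = magnitude_prune(w_dense, sparsity=0.5)
penalty = alignment_penalty(rng.normal(size=(32, 16)), w_dense, w_pruned)
```

In a training loop, a term like `penalty` would be added to the task loss with some weighting coefficient while the pruned network is retrained; that coefficient and the choice of which layers to compare are also assumptions not taken from the source.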
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Pruning · Network On Network
