Aligned Weight Regularizers for Pruning Pretrained Neural Networks

James O' Neill; Sourav Dutta; Haytham Assem

arXiv:2204.01385·cs.CL·April 6, 2022

Open Access

TL;DR

This paper investigates how pruning affects zero-shot performance in cross-lingual models and introduces weight regularizers to preserve alignment between pruned and unpruned networks, improving zero-shot and non-zero-shot results.

Contribution

It presents the first study on cross-lingual language model pruning and proposes alignment-maximizing regularizers to mitigate pruning-induced performance degradation.

Findings

01. Pruning causes a greater performance loss in zero-shot settings than in supervised learning.

02. Regularizers improve alignment and performance in pruned cross-lingual models.

03. Pruning impacts different languages to varying degrees, affecting representational quality.

Abstract

While various avenues of research have been explored for iterative pruning, little is known about the effect pruning has on zero-shot test performance and its potential implications for the choice of pruning criteria. This pruning setup is particularly important for cross-lingual models that implicitly learn alignment between language representations during pretraining; if this alignment is distorted by pruning, performance degrades not only on the language data used for retraining but also on the zero-shot languages that are evaluated. In this work, we show that there is a clear performance discrepancy in magnitude-based pruning when comparing standard supervised learning to the zero-shot setting. From this finding, we propose two weight regularizers that aim to maximize the alignment between units of pruned and unpruned networks to mitigate alignment distortion in pruned cross-lingual models and…
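To make the two ideas named in the abstract concrete, here is a minimal sketch in plain Python of magnitude-based pruning on a single unit's weight vector, followed by an illustrative alignment penalty between the pruned and unpruned unit. The abstract as shown here does not specify the form of the paper's two regularizers, so the cosine-based penalty, the function names (`magnitude_prune`, `alignment_penalty`), and the example weights are all assumptions made for illustration, not the authors' method.

```python
import math


def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest |w|."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for an all-zero vector
    return dot / (norm_a * norm_b)


def alignment_penalty(pruned, unpruned):
    """Hypothetical penalty (an assumption, not the paper's regularizer):
    1 - cosine similarity, which is 0 when the two units point the same way."""
    return 1.0 - cosine_similarity(pruned, unpruned)


# Illustrative weights for one unit (made-up values).
unit = [0.9, -0.05, 0.4, 0.01, -0.7]
pruned_unit = magnitude_prune(unit, sparsity=0.4)
penalty = alignment_penalty(pruned_unit, unit)
```

In a training loop, a term like `penalty` would be added to the task loss with some weighting coefficient, so that retraining after pruning is discouraged from drifting away from the unpruned representation. How the paper actually defines and weights its regularizers is not stated in the truncated abstract.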

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Pruning · Network On Network