Structural Pruning of Pre-trained Language Models via Neural Architecture Search

Aaron Klein; Jacek Golebiowski; Xingchen Ma; Valerio Perrone; Cedric Archambeau

arXiv:2405.02267·cs.LG·August 27, 2024

TL;DR

This paper introduces a neural architecture search (NAS) method for structurally pruning pre-trained language models. It uses multi-objective optimization to find Pareto optimal sub-networks that trade off efficiency against performance, which automates model compression.

Contribution

It applies neural architecture search to structural pruning of PLMs, utilizing two-stage weight-sharing NAS and multi-objective optimization for flexible, efficient model compression.
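
To make the "model size" objective concrete, here is a minimal sketch of how the parameter count of a structurally pruned encoder could be computed. It assumes a BERT-style encoder in which pruning removes whole layers, attention heads, and feed-forward units; that search space, the function name `encoder_params`, and the default dimensions are illustrative assumptions, not details taken from this summary.

```python
def encoder_params(layers: int, heads: int, ffn_units: int,
                   d_model: int = 768, d_head: int = 64) -> int:
    """Count weights and biases in a BERT-style encoder stack.

    Assumption: each layer has Q/K/V projections, an attention output
    projection, a two-layer feed-forward block, and two LayerNorms.
    Embeddings and task heads are not counted.
    """
    attn_width = heads * d_head
    # Query, key, and value projections (weights + biases).
    qkv = 3 * (d_model * attn_width + attn_width)
    # Attention output projection back to d_model.
    attn_out = attn_width * d_model + d_model
    # Feed-forward block: d_model -> ffn_units -> d_model.
    ffn = (d_model * ffn_units + ffn_units) + (ffn_units * d_model + d_model)
    # Two LayerNorms per layer, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)
    return layers * (qkv + attn_out + ffn + layer_norms)


# Full BERT-base-sized encoder versus a hypothetical pruned sub-network.
full = encoder_params(layers=12, heads=12, ffn_units=3072)
pruned = encoder_params(layers=6, heads=8, ffn_units=1536)
print(full)            # 85054464
print(pruned / full)   # fraction of encoder parameters retained
```

Because whole heads, units, and layers are dropped, the size of a sub-network follows directly from its three structural choices, which is what makes size a cheap objective to evaluate during a search.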

Findings

01. Effective identification of Pareto optimal sub-networks
02. Accelerated search using two-stage weight-sharing NAS
03. Improved trade-off between model size and accuracy

Abstract

Pre-trained language models (PLMs), such as BERT or RoBERTa, mark the state of the art for natural language understanding tasks when fine-tuned on labeled data. However, their large size poses challenges in deploying them for inference in real-world applications, due to significant GPU memory requirements and high inference latency. This paper explores neural architecture search (NAS) for structural pruning to find sub-parts of the fine-tuned network that optimally trade off efficiency, for example in terms of model size or latency, and generalization performance. We also show how we can utilize more recently developed two-stage weight-sharing NAS approaches in this setting to accelerate the search process. Unlike traditional pruning methods with fixed thresholds, we propose to adopt a multi-objective approach that identifies the Pareto optimal set of sub-networks, allowing for a more…
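
The idea of a Pareto optimal set can be illustrated with a short sketch. It assumes two objectives that are both minimized (model size and validation error); the function name `pareto_front` and the candidate numbers are made up for illustration and are not results from the paper, nor the authors' search procedure.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (model size, validation error), both minimized


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by increasing size.

    A point is dominated if another point is no worse in both objectives
    and strictly better in at least one.
    """
    front: List[Point] = []
    best_error = float("inf")
    # After sorting by (size, error), a point is non-dominated exactly when
    # its error is strictly lower than every error seen at smaller-or-equal size.
    for size, error in sorted(set(points)):
        if error < best_error:
            front.append((size, error))
            best_error = error
    return front


# Hypothetical sub-networks: (parameters in millions, validation error).
candidates = [
    (110.0, 0.08), (67.0, 0.10), (67.0, 0.12),
    (40.0, 0.15), (85.0, 0.11), (30.0, 0.22),
]
for size, error in pareto_front(candidates):
    print(f"{size:6.1f}M params  error={error:.2f}")
```

In this toy set, the 85M candidate is dropped because the 67M candidate is both smaller and more accurate. A fixed pruning threshold would return a single model, whereas the front exposes every remaining trade-off so that a size or latency budget can be chosen afterwards.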

Code & Models

Repositories

whittle-org/plm_pruning (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Attention Is All You Need · Sparse Evolutionary Training · Dropout · Weight Decay · Attention Dropout · Residual Connection · Softmax · WordPiece · RoBERTa