Sensi-BERT: Towards Sensitivity Driven Fine-Tuning for Parameter-Efficient BERT

Souvik Kundu; Sharath Nittur Sridhar; Maciej Szankin; Sairam Sundaresan

arXiv:2307.11764 · cs.CL · September 1, 2023

Open Access

TL;DR

Sensi-BERT introduces a sensitivity-driven fine-tuning method that efficiently reduces BERT model size for resource-limited devices while maintaining or improving task performance.

Contribution

It proposes a novel sensitivity analysis approach to selectively trim BERT parameters during fine-tuning, enhancing parameter efficiency without heavy additional compute.

Findings

01. Outperforms existing methods on multiple NLP tasks

02. Achieves higher accuracy with fewer parameters

03. Maintains performance with significant model size reduction

Abstract

Large pre-trained language models have recently gained significant traction due to their improved performance on various downstream tasks such as text classification and question answering, requiring only a few epochs of fine-tuning. However, their large model sizes often prohibit their application on resource-constrained edge devices. Existing solutions for yielding parameter-efficient BERT models largely rely on compute-exhaustive training and fine-tuning. Moreover, they often rely on additional compute-heavy models to mitigate the performance gap. In this paper, we present Sensi-BERT, a sensitivity-driven efficient fine-tuning of BERT models that can take an off-the-shelf pre-trained BERT model and yield highly parameter-efficient models for downstream tasks. In particular, we perform sensitivity analysis to rank each individual parameter tensor, which is then used to trim them…
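
The rank-then-trim idea in the abstract can be illustrated with a small sketch. The abstract is cut off before it names the sensitivity metric or the trimming rule, so everything concrete here is an assumption rather than the paper's method: the mean absolute weight is used as a stand-in sensitivity score, keep ratios are allocated in proportion to that score under a global parameter budget, and each tensor is trimmed by magnitude. The function names (`tensor_sensitivity`, `rank_tensors`, `allocate_keep_ratios`, `trim_tensor`), the `floor` parameter, and the toy tensor names are hypothetical.

```python
from typing import Dict, List


def tensor_sensitivity(weights: List[float]) -> float:
    """Stand-in sensitivity score: mean absolute weight (assumption, not the paper's metric)."""
    return sum(abs(w) for w in weights) / len(weights)


def rank_tensors(params: Dict[str, List[float]]) -> List[str]:
    """Return tensor names ordered from most to least sensitive."""
    return sorted(params, key=lambda n: tensor_sensitivity(params[n]), reverse=True)


def allocate_keep_ratios(
    params: Dict[str, List[float]], budget: float, floor: float = 0.05
) -> Dict[str, float]:
    """Assign each tensor a keep ratio proportional to its score.

    Ratios are scaled so the size-weighted average is roughly `budget`,
    then clipped to [floor, 1.0]; clipping makes the budget approximate.
    """
    scores = {n: tensor_sensitivity(w) for n, w in params.items()}
    total = sum(len(w) for w in params.values())
    weighted = sum(scores[n] * len(params[n]) for n in params)
    return {
        n: min(1.0, max(floor, budget * scores[n] * total / weighted))
        for n in params
    }


def trim_tensor(weights: List[float], keep_ratio: float) -> List[float]:
    """Keep the largest-magnitude entries, zero the rest (at least one entry is kept)."""
    k = max(1, round(keep_ratio * len(weights)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


# Toy stand-ins for two BERT parameter tensors, flattened to lists.
params = {
    "attn.query": [0.9, -0.8, 0.7, -0.6],
    "ffn.intermediate": [0.1, -0.05, 0.02, 0.01],
}
ratios = allocate_keep_ratios(params, budget=0.5)
trimmed = {n: trim_tensor(w, ratios[n]) for n, w in params.items()}
print(rank_tensors(params))
print(trimmed)
```

In this toy setup the higher-scoring tensor keeps nearly all of its entries while the lower-scoring one is trimmed heavily, which is the qualitative behavior a sensitivity ranking is meant to produce. A real implementation would operate on framework tensors and would follow the metric defined in the full paper.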

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Linear Warmup With Linear Decay · Linear Layer · Softmax · Dense Connections · Weight Decay · Dropout · Attention Is All You Need · WordPiece