Information Guided Regularization for Fine-tuning Language Models

Mandar Sharma; Nikhil Muralidhar; Shengzhe Xu; Raquib Bin Yousuf; Naren Ramakrishnan

arXiv:2406.14005 · cs.CL · June 24, 2024

TL;DR

This paper takes an information-theoretic approach to regularization when fine-tuning language models and proposes guided dropout, which improves downstream generalization without adding computational overhead to fine-tuning.

Contribution

The paper presents guided dropout, a task- and architecture-agnostic dropout method derived from an information-theoretic analysis of the pretraining loss landscape, aimed at smoother transfer learning in language models.

Findings

01. Guided dropout improves downstream performance across tasks.
02. The method is effective even with limited data.
03. No additional computational overhead is introduced.

Abstract

The pretraining-fine-tuning paradigm has been the de facto strategy for transfer learning in modern language modeling. With the understanding that task adaptation in LMs is often a function of parameters shared across tasks, we argue that a more surgical approach to regularization needs to exist for smoother transfer learning. Towards this end, we investigate how the pretraining loss landscape is affected by these task-sensitive parameters through an information-theoretic lens. We then leverage the findings from our investigations to devise a novel approach to dropout for improved model regularization and better downstream generalization. This approach, named guided dropout, is both task- and architecture-agnostic and adds no computational overhead to the fine-tuning process. Through empirical evaluations, we showcase that our approach to regularization yields consistently better…
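
The abstract does not spell out how the dropout is "guided", so the following is only an illustrative sketch of one possible reading, not the authors' procedure. It assumes that each layer has already been assigned an information score (the measure is not specified in the text above; a Fisher-information-style estimate would be one candidate) and that more informative layers should be dropped less aggressively. The function names, the linear mapping, and the `p_min`/`p_max` bounds are all hypothetical.

```python
import random


def guided_dropout_rates(info_scores, p_min=0.05, p_max=0.3):
    """Map per-layer information scores to dropout probabilities.

    Assumption (not taken from the paper): layers with higher scores
    get lower dropout, linearly interpolated between p_max and p_min.
    """
    lo, hi = min(info_scores), max(info_scores)
    span = hi - lo
    if span == 0:
        # All layers score the same: fall back to the midpoint rate.
        return [(p_min + p_max) / 2 for _ in info_scores]
    return [p_max - (s - lo) / span * (p_max - p_min) for s in info_scores]


def inverted_dropout(values, p, rng):
    """Standard inverted dropout: zero each value with probability p and
    scale survivors by 1 / (1 - p) so the expected value is unchanged."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in values]


if __name__ == "__main__":
    # Hypothetical per-layer scores; in practice these would be estimated
    # once, before fine-tuning, from the pretrained model.
    scores = [4.0, 1.0, 0.25, 2.0]
    rates = guided_dropout_rates(scores)
    rng = random.Random(0)
    for layer, p in enumerate(rates):
        activations = [1.0] * 8
        print(layer, round(p, 3), inverted_dropout(activations, p, rng))
```

Because the rates in this sketch are computed once and then applied like ordinary dropout, the per-step cost during fine-tuning is the same as that of standard dropout, which is at least consistent with the abstract's claim of no added overhead.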

Code & Models

Repositories

mandar-sharma/guided-dropout (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Dropout