Make Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning

Baohao Liao; Shaomu Tan; Christof Monz

arXiv:2306.00477 · cs.CL · October 20, 2023 · 2 citations

TL;DR

This paper introduces MEFT, a memory-efficient fine-tuning method that makes pre-trained language models reversible by inserting adapters, reducing activation memory by up to 84% compared to full fine-tuning while maintaining comparable performance.

Contribution

The paper proposes a novel approach to make PLMs reversible during fine-tuning without additional pre-training, greatly reducing activation memory.

Findings

01 Reduces activation memory by up to 84% compared to full fine-tuning

02 Maintains comparable performance on GLUE and QA tasks

03 Applicable to various backbones such as BERT, RoBERTa, BART, and OPT

Abstract

Parameter-efficient fine-tuning (PEFT) of pre-trained language models (PLMs) has emerged as a highly successful approach, with training only a small number of parameters without sacrificing performance and becoming the de-facto learning paradigm with the increasing size of PLMs. However, existing PEFT methods are not memory-efficient, because they still require caching most of the intermediate activations for the gradient calculation, akin to fine-tuning. One effective way to reduce the activation memory is to apply a reversible model, so the intermediate activations are not necessary to be cached and can be recomputed. Nevertheless, modifying a PLM to its reversible variant is not straightforward, since the reversible model has a distinct architecture from the currently released PLMs. In this paper, we first investigate what is a key factor for the success of existing PEFT methods, and…
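
To illustrate why a reversible model does not need to cache intermediate activations, here is a minimal sketch of a generic additive-coupling reversible block (in the style of reversible residual networks). It is not the MEFT architecture from the paper; the sub-functions `f` and `g`, the helper names `forward` and `inverse`, and the sample inputs are all hypothetical and chosen only for illustration. The point it shows is that the block's inputs can be reconstructed exactly from its outputs, so they can be recomputed during the backward pass instead of being stored.

```python
import math

# Hypothetical sub-functions standing in for arbitrary layers
# (e.g. attention or an MLP). They do not need to be invertible themselves.
def f(x):
    return [math.tanh(v) for v in x]

def g(x):
    return [0.5 * v * v for v in x]

def forward(x1, x2):
    """Additive coupling: y1 = x1 + f(x2), y2 = x2 + g(y1)."""
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2

def inverse(y1, y2):
    """Recover the inputs from the outputs alone, undoing the two steps in reverse order."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2

# Illustrative inputs: the two halves of a split activation vector.
x1, x2 = [0.1, -0.4, 2.0], [1.5, 0.0, -0.7]
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)

# Largest reconstruction error across all components (floating-point noise only).
max_error = max(abs(a - b) for a, b in zip(x1 + x2, r1 + r2))
print(max_error < 1e-9)
```

In a real training setup, the backward pass would call something like `inverse` layer by layer to regenerate each layer's inputs before computing its gradients, trading extra computation for lower activation memory.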

Code & Models

Repositories

baohaoliao/mefts
PyTorch · Official

Videos

Make Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · WordPiece · Linear Warmup With Linear Decay · Attention Dropout · Dropout · Adam · Byte Pair Encoding