SPAFIT: Stratified Progressive Adaptation Fine-tuning for Pre-trained Large Language Models

Samir Arora; Liangliang Wang

arXiv:2405.00201 · cs.CL · May 2, 2024


TL;DR

SPAFIT is a new parameter-efficient fine-tuning (PEFT) method for large language models that exploits the localization of different types of linguistic knowledge to specific layers, outperforming existing PEFT methods on multiple tasks while fine-tuning fewer parameters.

Contribution

The paper introduces SPAFIT, a novel PEFT approach that leverages layer-specific linguistic knowledge for more efficient fine-tuning of large language models.

Findings

1. SPAFIT outperforms other PEFT methods on GLUE tasks.
2. SPAFIT fine-tunes fewer parameters than existing methods.
3. SPAFIT reduces computational and storage requirements.

Abstract

Full fine-tuning is a popular approach to adapt Transformer-based pre-trained large language models to a specific downstream task. However, its substantial computational and storage requirements have discouraged its widespread use. Moreover, growing evidence of catastrophic forgetting and overparameterization in the Transformer architecture has motivated researchers to seek parameter-efficient fine-tuning (PEFT) methods. Commonly used PEFT methods such as LoRA and BitFit are typically applied across all layers of the model. We propose a PEFT method, called Stratified Progressive Adaptation Fine-tuning (SPAFIT), based on the localization of different types of linguistic knowledge to specific layers of the model. Our experiments, conducted on nine tasks from the GLUE benchmark, show that our proposed SPAFIT method outperforms other PEFT methods while…
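To make the contrast with applying LoRA or BitFit uniformly across all layers concrete, the sketch below assigns a different adaptation strategy to contiguous strata of layers and counts the resulting trainable parameters. Everything specific here is an illustrative assumption rather than the paper's confirmed configuration: the three strata (frozen, bias-only BitFit, LoRA plus biases), the hypothetical boundaries `frozen_upto` and `bitfit_upto`, LoRA rank 8 applied only to the query and value projections, BERT-large-like widths, and counting only linear-layer biases for BitFit.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    FROZEN = "frozen"            # no parameters in the layer are updated
    BITFIT = "bitfit"            # only bias terms are updated
    LORA_BITFIT = "lora+bitfit"  # bias terms plus low-rank LoRA adapters


@dataclass(frozen=True)
class LayerShape:
    hidden: int  # model width d
    ffn: int     # inner width of the position-wise feed-forward layer


def assign_strategies(num_layers: int, frozen_upto: int, bitfit_upto: int) -> list:
    """Split layers into three contiguous strata (boundaries are hypothetical)."""
    if not 0 <= frozen_upto <= bitfit_upto <= num_layers:
        raise ValueError("need 0 <= frozen_upto <= bitfit_upto <= num_layers")
    return [
        Strategy.FROZEN if i < frozen_upto
        else Strategy.BITFIT if i < bitfit_upto
        else Strategy.LORA_BITFIT
        for i in range(num_layers)
    ]


def linear_bias_params(s: LayerShape) -> int:
    # Biases of the Q, K, V and attention-output projections (4*d),
    # plus the two feed-forward projections (ffn + d). LayerNorm shifts are
    # ignored here as a simplifying assumption.
    return 5 * s.hidden + s.ffn


def full_layer_params(s: LayerShape) -> int:
    d, f = s.hidden, s.ffn
    attention = 4 * d * d     # Q, K, V and output weight matrices
    feed_forward = 2 * d * f  # up- and down-projection weight matrices
    layer_norms = 2 * 2 * d   # two LayerNorms, each with a scale and a shift
    return attention + feed_forward + linear_bias_params(s) + layer_norms


def trainable_params(strategy: Strategy, s: LayerShape, rank: int) -> int:
    if strategy is Strategy.FROZEN:
        return 0
    if strategy is Strategy.BITFIT:
        return linear_bias_params(s)
    # Assumed LoRA placement: query and value projections only. Each d x d
    # matrix gets factors A (r x d) and B (d x r), i.e. 2*d*r extra parameters.
    return linear_bias_params(s) + 2 * (2 * s.hidden * rank)


if __name__ == "__main__":
    shape = LayerShape(hidden=1024, ffn=4096)  # BERT-large-like widths (assumed)
    plan = assign_strategies(num_layers=24, frozen_upto=8, bitfit_upto=16)
    trainable = sum(trainable_params(st, shape, rank=8) for st in plan)
    full = len(plan) * full_layer_params(shape)
    print(f"trainable: {trainable:,} of {full:,} ({100 * trainable / full:.2f}%)")
```

Under these assumptions the script reports 409,600 trainable parameters out of 302,309,376 encoder-layer parameters (about 0.14%). Moving the strata boundaries changes that budget directly, which is the knob a layer-stratified scheme adds over uniform LoRA or BitFit.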


Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Dropout · Softmax · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Linear Layer · Dense Connections · Label Smoothing · Residual Connection