Layer-wise Importance Matters: Less Memory for Better Performance in Parameter-efficient Fine-tuning of Large Language Models

Kai Yao, Penglei Gao, Lichun Li, Yuan Zhao, Xiaofeng Wang, Wei Wang, and Jianke Zhu

arXiv:2410.11772 · cs.CL · November 6, 2024

TL;DR

This paper introduces Importance-aware Sparse Tuning (IST), a novel method that improves parameter-efficient fine-tuning of large language models by selecting and updating only the most important layers, reducing memory use and enhancing performance.

Contribution

The paper proposes a new importance-aware sparse tuning approach that dynamically selects and updates key layers, outperforming uniform fine-tuning strategies in PEFT for LLMs.
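
This page does not describe IST's exact selection rule. The sketch below illustrates one way to "dynamically select" layers: at each training step, sample a subset of layers, favoring layers with higher per-layer importance scores. The sampling is softmax-weighted and without replacement. The function name `select_layers`, the `temperature` parameter, and the sampling scheme are hypothetical choices for this example. They are not taken from the paper or the kaiseem/ist repository.

```python
import math
import random


def select_layers(scores, k, temperature=1.0, rng=None):
    """Sample k distinct layer indices, favoring higher importance scores.

    Hypothetical helper (not IST's published rule): softmax-weighted
    sampling without replacement over per-layer importance scores.
    """
    if not 0 < k <= len(scores):
        raise ValueError("k must be in [1, number of layers]")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    # Softmax weights; subtracting the max keeps exp() numerically stable.
    m = max(scores)
    weights = [math.exp((s - m) / temperature) for s in scores]
    remaining = list(range(len(scores)))
    chosen = []
    for _ in range(k):
        # Roulette-wheel draw over the layers not yet chosen.
        total = sum(weights[i] for i in remaining)
        r = rng.random() * total
        acc = 0.0
        for i in remaining:
            acc += weights[i]
            if r < acc:
                break
        # If rounding prevents a break, i is the last remaining layer.
        chosen.append(i)
        remaining.remove(i)
    return sorted(chosen)


if __name__ == "__main__":
    # Hypothetical scores for an 8-layer model.
    scores = [0.1, 2.0, 0.3, 1.5, 0.2, 1.8, 0.05, 0.4]
    active = select_layers(scores, k=3, rng=random.Random(0))
    print("layers to update this step:", active)
```

In a training loop built on this sketch, only the PEFT modules (for example, LoRA adapters) in the returned layers would receive gradients for that step, and all other layers would stay frozen. Raising `temperature` flattens the distribution toward uniform sampling. Lowering it concentrates updates on the top-scoring layers.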

Findings

01. IST reduces memory demands during fine-tuning.

02. IST outperforms uniform strategies that apply the same trainable modules to every layer.

03. A theoretical convergence proof supports the method's reliability.

Abstract

Parameter-Efficient Fine-Tuning (PEFT) methods have gained significant popularity for adapting pre-trained Large Language Models (LLMs) to downstream tasks, primarily due to their potential to significantly reduce memory and computational overheads. However, a common limitation in most PEFT approaches is their application of a uniform architectural design across all layers. This uniformity involves identical trainable modules and ignores the varying importance of each layer, leading to sub-optimal fine-tuning results. To overcome the above limitation and obtain better performance, we develop a novel approach, Importance-aware Sparse Tuning (IST), to fully utilize the inherent sparsity and select the most important subset of full layers with effective layer-wise importance scoring. The proposed IST is a versatile and plug-and-play technique compatible with various PEFT methods that…
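
The abstract does not say how the layer-wise importance scores are computed. One simple possibility, shown here purely as an illustration, credits the layers updated in a step with the observed drop in training loss and smooths that signal with an exponential moving average. The name `update_scores`, the `beta` parameter, and this reward definition are hypothetical. They are not the paper's scoring rule.

```python
def update_scores(scores, active, loss_before, loss_after, beta=0.9):
    """Return new per-layer importance scores after one sparse update step.

    Hypothetical rule (not IST's published method): layers updated this
    step move toward the observed loss reduction via an exponential moving
    average. Inactive layers keep their previous score.
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must be in [0, 1)")
    reward = loss_before - loss_after  # positive when the step lowered the loss
    active = set(active)
    return [
        beta * s + (1.0 - beta) * reward if i in active else s
        for i, s in enumerate(scores)
    ]


if __name__ == "__main__":
    # Layers 1 and 3 were updated, and the loss fell from 2.5 to 2.25.
    print(update_scores([0.0, 0.0, 0.0, 0.0], [1, 3], 2.5, 2.25, beta=0.5))  # → [0.0, 0.125, 0.0, 0.125]
```

When paired with a sampler that favors high scores, this rule forms a feedback loop: layers whose updates coincide with larger loss reductions are chosen more often. A single-step loss difference is noisy, so the sketch smooths it with `beta` rather than using the raw value.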

Code & Models

Repositories

kaiseem/ist (official)

Taxonomy

Topics: Topic Modeling