RankAdaptor: Hierarchical Rank Allocation for Efficient Fine-Tuning Pruned LLMs via Performance Model

Changhai Zhou; Shijie Han; Lining Yang; Yuhua Zhou; Xu Cheng; Yibin Wang; Hongguang Li

arXiv:2406.15734·cs.CL·December 17, 2024

Open Access

TL;DR

RankAdaptor introduces a hierarchical rank allocation method that improves fine-tuning of pruned large language models by customizing layer-specific recovery, leading to significant performance gains over existing methods.

Contribution

The paper proposes a novel hierarchical rank allocation approach with a performance model for efficient fine-tuning of pruned LLMs, addressing limitations of fixed configurations in current methods.

Findings

01. Outperforms state-of-the-art methods across benchmarks

02. Achieves 0.7% to 5.5% performance improvements

03. Effective in various pruning settings and architectures

Abstract

The efficient compression of large language models (LLMs) has become increasingly popular. However, recovering the performance of compressed LLMs remains a major challenge. The current practice in LLM compression entails the implementation of structural pruning, complemented by a recovery phase that leverages the Low-Rank Adaptation (LoRA) algorithm. Structural pruning's uneven modification of model architecture, coupled with standard LoRA's fixed configuration allocation across layers in an online pipeline, leads to suboptimal performance in various downstream tasks for pruned models. To address this challenge, we introduce RankAdaptor, a hierarchical rank allocation method that enables efficient fine-tuning of pruned LLMs according to layerwise specific recovery requirements. We employ a performance model that conducts offline meta-learning and online incremental learning to explore…
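The contrast the abstract draws between standard LoRA's fixed configuration and layer-specific rank allocation can be illustrated with a small sketch. This is not the paper's implementation: the layer shapes, the rank values, and the helper names (`LayerShape`, `lora_param_count`, `total_params`) are hypothetical, and the performance model that selects ranks is not represented here. The sketch only shows that the same adapter parameter budget can be spread unevenly across layers that pruning has left with different widths.

```python
from dataclasses import dataclass


@dataclass
class LayerShape:
    """Input/output width of one linear layer after pruning (hypothetical values)."""
    d_in: int
    d_out: int


def lora_param_count(shape: LayerShape, rank: int) -> int:
    # A LoRA adapter adds two matrices: A (rank x d_in) and B (d_out x rank).
    return rank * (shape.d_in + shape.d_out)


def total_params(shapes, ranks) -> int:
    # Sum adapter parameters over all layers, one rank per layer.
    if len(shapes) != len(ranks):
        raise ValueError("need exactly one rank per layer")
    return sum(lora_param_count(s, r) for s, r in zip(shapes, ranks))


# Assumed example: structural pruning left three layers with uneven widths.
shapes = [LayerShape(4096, 4096), LayerShape(3072, 3072), LayerShape(2048, 2048)]

# Standard LoRA: one fixed rank for every layer.
uniform_ranks = [8, 8, 8]

# Layer-specific allocation (illustrative values only): a smaller rank for the
# lightly pruned layer and a larger rank for the heavily pruned one.
layerwise_ranks = [4, 8, 16]

uniform_total = total_params(shapes, uniform_ranks)
layerwise_total = total_params(shapes, layerwise_ranks)
print(uniform_total, layerwise_total)
```

In this assumed configuration both allocations use the same number of trainable adapter parameters, so the only difference is where the capacity is placed. Which layers actually deserve a higher rank is the question the paper's performance model is meant to answer; the values above are placeholders.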

Taxonomy

Topics: Subtitles and Audiovisual Media

Methods: Pruning