Dynamic Low-Rank Sparse Adaptation for Large Language Models

Weizhong Huang; Yuxin Zhang; Xiawu Zheng; Yang Liu; Jing Lin; Yiwu Yao; Rongrong Ji

arXiv:2502.14816·cs.LG·February 21, 2025

TL;DR

This paper introduces LoSA, a dynamic low-rank sparse adaptation method that improves sparse LLM performance during fine-tuning without increasing inference latency, by adaptively sparsifying and adjusting the LoRA modules.

Contribution

LoSA seamlessly integrates low-rank adaptation into sparse LLMs, dynamically determining layer importance and rank adjustments to enhance performance efficiently.

Findings

01

Reduces perplexity of sparse LLaMA-2-7B by 68.73

02

Increases zero-shot accuracy by 16.32%

03

Achieves 2.60× speedup on CPU and 2.23× on GPU

Abstract

Although network sparsity eases the deployment burden of Large Language Models (LLMs), it causes significant performance degradation. Fine-tuning the sparse LLMs with Low-Rank Adaptation (LoRA) is an intuitive remedy, but it has two shortcomings: 1) the LoRA weights cannot be integrated into the sparse LLM after training, and 2) performance recovery is insufficient at high sparsity ratios. In this paper, we introduce dynamic Low-rank Sparse Adaptation (LoSA), a novel method that integrates low-rank adaptation into LLM sparsity within a unified framework, thereby enhancing the performance of sparse LLMs without increasing inference latency. In particular, LoSA dynamically sparsifies the LoRA outcomes based on the corresponding sparse weights during fine-tuning, thus guaranteeing that the LoRA module can be…
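The merge problem described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the magnitude-based mask, the matrix sizes, and the helper names `magnitude_mask` and `merge_masked_lora` are assumptions chosen for illustration. It only shows why a dense LoRA update fills in pruned entries when merged, and how applying the sparse weight's mask to the update keeps the merged matrix at the same sparsity.

```python
import numpy as np


def magnitude_mask(weight, sparsity):
    """Return a 0/1 mask keeping the largest-magnitude entries of `weight`.

    Magnitude pruning is an assumption made for this sketch; any pruning
    criterion that yields a binary mask would serve the same purpose.
    """
    k = int(round(sparsity * weight.size))  # number of entries to prune
    if k == 0:
        return np.ones_like(weight)
    threshold = np.sort(np.abs(weight), axis=None)[k - 1]
    return (np.abs(weight) > threshold).astype(weight.dtype)


def merge_masked_lora(weight, mask, lora_b, lora_a):
    """Merge a low-rank update into a sparse weight without adding non-zeros."""
    sparse_weight = weight * mask
    masked_update = (lora_b @ lora_a) * mask  # update shares the weight's mask
    return sparse_weight + masked_update


rng = np.random.default_rng(0)
out_dim, in_dim, rank = 8, 16, 2  # hypothetical sizes
weight = rng.standard_normal((out_dim, in_dim))
mask = magnitude_mask(weight, sparsity=0.5)
lora_b = rng.standard_normal((out_dim, rank))
lora_a = rng.standard_normal((rank, in_dim))

merged = merge_masked_lora(weight, mask, lora_b, lora_a)
unmasked = weight * mask + lora_b @ lora_a  # naive merge, for comparison

print("non-zeros after masked merge:", np.count_nonzero(merged))
print("non-zeros after naive merge: ", np.count_nonzero(unmasked))
```

In this sketch the naive merge is dense, while the masked merge leaves every pruned position at zero, which is the property needed for the adapter to be folded into the sparse weights without extra inference cost.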

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 8 · Confidence 3

Strengths

* Addresses a relevant problem: Performance degradation in sparse LLMs is a known issue, and LoSA offers a practical solution.
* Novelty: Integrating sparsification into the low-rank adaptation process and dynamically adjusting sparsity/rank based on RMI and reconstruction errors are novel ideas.
* Strong empirical results: The experimental results show consistent improvements across various LLMs and sparsity levels.
* Inference efficiency: LoSA preserves the inference speed advantages of spa

Weaknesses

na

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. Reasonable motivation: studying the sparsification of LLMs while applying LoRA.
2. Clear problem formulation and introduction of related work for each subproblem.
3. Detailed, summarized pseudocode that connects each step and explains the overall algorithm.
4. Strong and promising experimental results on LLMs regarding both model performance and speedup.

Weaknesses

1. The algorithm consists of many heuristics and lacks step-by-step derivations, e.g., for Eq. 7 and Eq. 9.
2. Some experimental details are missing or unclear.

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. LoSA introduces a combined dynamic sparsity and rank adjustment mechanism for fine-tuning sparse LLMs. Using RMI for layer-wise sparsity rate determination and reconstruction errors for rank allocation seems a reasonable approach for preserving model performance under sparse conditions. Moreover, trying to match the sparsity pattern of the adaptation path BA and the pre-trained weight W is novel.
2. Comprehensive Empirical Evaluation: The paper’s experiments cover multiple architectures

Weaknesses

1. The paper lacks comparisons with adaptive LoRA methods like AdaLoRA[1] and SoRA[2], which are critical for evaluating LoSA’s performance among recent dynamic rank approaches. Without these comparisons, LoSA’s relative advantage remains unclear.
2. The optimization setup in Eq. 5 (Section 2.2) assigns higher sparsity rates to layers with higher importance, which contradicts standard practices that seek to preserve the most important layers. This questionable logic may weaken the model’s repr

Code & Models

Repositories

wzhuang-xmu/losa
PyTorch · Official

Taxonomy

Topics · Speech Recognition and Synthesis