KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models

Fan Wang; Juyong Jiang; Chansung Park; Sunghun Kim; Jing Tang

arXiv:2412.06071 · cs.CL · March 3, 2025

TL;DR

KaSA is a knowledge-aware singular-value adaptation method for large language models. It dynamically activates task-relevant knowledge during fine-tuning, which improves task-specific performance and lets it outperform existing PEFT methods across multiple benchmarks.

Contribution

We propose KaSA, a novel PEFT approach using knowledge-aware SVD to selectively activate relevant knowledge, enhancing LLM adaptation efficiency and effectiveness.

Findings

01. KaSA outperforms full fine-tuning (FFT) and 14 PEFT baselines on 16 benchmarks.
02. KaSA shows consistent improvements across natural language understanding (NLU), natural language generation (NLG), and reasoning tasks.
03. The method effectively activates task-relevant knowledge, improving model performance.

Abstract

The increasing sizes of large language models (LLMs) result in significant computational overhead and memory usage when adapting these models to specific tasks or domains. Various parameter-efficient fine-tuning (PEFT) methods have been devised to mitigate these challenges by training a small set of parameters for the task-specific updates of the model weights. Among PEFT methods, LoRA stands out for its simplicity and efficiency, inspiring the development of a series of variants. However, LoRA and its successors disregard the knowledge that is noisy or irrelevant to the targeted task, detrimentally impacting model performance and leading to suboptimality. To address this limitation, we introduce Knowledge-aware Singular-value Adaptation (KaSA), a PEFT method that leverages singular value decomposition (SVD) with knowledge-aware singular values to dynamically activate knowledge based on…
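
The abstract stops short of the mechanics. As a rough illustration only, here is a NumPy sketch under one reading of the description above and of the reviews below: the frozen base weight loses its smallest singular components (the "SVD truncation" the reviewers mention), and the task update is written in SVD form, ΔW = ΔU diag(Δσ) ΔV^T, with learnable singular values Δσ. The names `truncate_minor_components`, `SVDFormAdapter`, and `adapted_forward`, as well as the initialization choices, are hypothetical; this is not the code in the juyongjiang/kasa repository.

```python
import numpy as np


def truncate_minor_components(W, r):
    """Return W with its r smallest singular components removed (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)  # s is sorted in descending order
    k = len(s) - r  # number of principal components that are kept
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


class SVDFormAdapter:
    """Hypothetical task update delta_W = dU @ diag(dsigma) @ dV.T for a weight of shape (m, n)."""

    def __init__(self, m, n, r, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: dU and dV start with orthonormal columns (reduced QR of random matrices).
        self.dU, _ = np.linalg.qr(rng.standard_normal((m, r)))
        self.dV, _ = np.linalg.qr(rng.standard_normal((n, r)))
        # Assumption: the learnable singular values start at zero, so the adapted layer
        # initially computes exactly what the truncated base weight computes.
        self.dsigma = np.zeros(r)

    def delta(self):
        # Scaling the columns of dU by dsigma is the same as dU @ np.diag(dsigma).
        return (self.dU * self.dsigma) @ self.dV.T


def adapted_forward(x, W_trunc, adapter):
    """Linear layer with weight (W_trunc + delta_W); x has shape (batch, n)."""
    return x @ (W_trunc + adapter.delta()).T
```

In a training loop built on this sketch, only `dU`, `dsigma`, and `dV` would be updated while the truncated base weight stays frozen. A singular value pushed toward zero effectively switches its direction off, which is one possible reading of the "dynamic activation" the abstract describes.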

Peer Reviews

Decision · ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

* The evaluated tasks are comprehensive, including NLU, NLG and instruction-following.
* The ablation study provides evidence to demonstrate the effectiveness of all four design choices on selected NLP tasks.

Weaknesses

* The improvement of the proposed method over the baselines is marginal. Reporting variances and the results of significance tests may help justify the improvement.
* It is unclear why GPT-2 is used for NLG tasks, while Gemma, Mistral, and LLaMA 3 are used for instruction-following tasks. Hence, the experimental design is not systematic.
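
Reviewer 01 asks for variances and significance tests. One standard way to provide them, sketched here under the assumption that every method is run on the same set of random seeds, is to report the mean and sample standard deviation per method and then apply an exact paired permutation (sign-flip) test to the per-seed differences. The function names are hypothetical, and the test is a generic statistical technique, not something the paper is stated to use.

```python
import itertools
import statistics


def mean_std(scores):
    """Mean and sample standard deviation of one method's scores across seeds."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_sign_flip_pvalue(a, b):
    """Exact two-sided paired permutation (sign-flip) test.

    a[i] and b[i] are the scores of two methods on the same seed or split i.
    Under the null hypothesis each per-seed difference is equally likely to have
    either sign, so all 2**n sign patterns are enumerated (cheap for a few seeds).
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    patterns = list(itertools.product((1, -1), repeat=len(diffs)))
    hits = 0
    for signs in patterns:
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:  # small tolerance for floating-point ties
            hits += 1
    return hits / len(patterns)
```

Because the two all-same-sign patterns always tie with the observed statistic, the smallest attainable p-value is 2/2^n. With five seeds that is 2/32 = 0.0625, so at least six seeds are needed before a difference can reach p < 0.05.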

Reviewer 02 · Rating 8 · Confidence 5

Strengths

KaSA seems like a promising technique and is competitive with LoRA and SVD-style techniques like PiSSA and MiLoRA. The authors do extensive experiments and benchmarking, and the paper is very well written. Minor notes:
* The paper has extensive citations that are very helpful to the reader.
* Figure 2 with “ablations” is very informative; the trends are quite clear.
* The “no robots” dataset is a good choice for seeding synthetic data, since it was generated by human annotators.
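
For readers unfamiliar with the SVD-style baselines named above: as PiSSA and MiLoRA are usually described, both initialize the LoRA factors from an SVD of the pretrained weight and freeze the remaining components as a residual; PiSSA starts the adapter from the largest singular components, MiLoRA from the smallest. The NumPy sketch below (hypothetical function `svd_split_init`) illustrates only that contrast; it is a simplified illustration, not either project's code.

```python
import numpy as np


def svd_split_init(W, r, principal=True):
    """Split W into a frozen residual plus trainable rank-r factors A @ B.

    principal=True  -> adapter initialized from the r largest singular components (PiSSA-style)
    principal=False -> adapter initialized from the r smallest singular components (MiLoRA-style)
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)  # s sorted in descending order
    n_sv = len(s)
    chosen = np.arange(r) if principal else np.arange(n_sv - r, n_sv)
    rest = np.setdiff1d(np.arange(n_sv), chosen)
    root = np.sqrt(s[chosen])  # split each singular value evenly between A and B
    A = U[:, chosen] * root            # shape (m, r): left factor
    B = root[:, None] * Vt[chosen, :]  # shape (r, n): right factor
    W_res = (U[:, rest] * s[rest]) @ Vt[rest, :]  # frozen remainder
    return W_res, A, B
```

In both cases `W_res + A @ B` reproduces the original weight at initialization; the choice only decides which part of the spectrum the adapter starts from and therefore which directions training perturbs first.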

Weaknesses

The experimental results are overall quite strong. The main weakness of the manuscript is that it lacks a straightforward explanation of the method. I’m not sure I fully understand the KaSA method based on the description, and the diagram is slightly confusing as well. It would be very helpful to have a pseudocode block or section.
1. My current understanding is that the method requires (1) LoRA fine-tuning, followed by (2) SVD truncation, followed by (3) “knowledge aware singular-value adaptati…
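
Reviewer 02 asks for pseudocode. One ingredient common to SVD-form adapters (AdaLoRA uses it, for instance) is an orthogonality penalty that keeps ΔU and ΔV close to orthonormal, so that the learned diagonal entries behave like genuine singular values. Whether KaSA uses exactly this term, and with what weight it would be added to the task loss, is an assumption here; the function name `orthogonality_penalty` is hypothetical.

```python
import numpy as np


def orthogonality_penalty(dU, dV):
    """||dU^T dU - I||_F^2 + ||dV^T dV - I||_F^2; zero iff both have orthonormal columns."""
    eye = np.eye(dU.shape[1])  # dU and dV are assumed to share the rank r (column count)
    return float(
        np.linalg.norm(dU.T @ dU - eye, "fro") ** 2
        + np.linalg.norm(dV.T @ dV - eye, "fro") ** 2
    )
```

When the penalty is zero, ||ΔU diag(Δσ) ΔV^T||_F^2 equals the sum of the squared Δσ entries, so any norm penalty on the update reduces to a penalty on the singular values alone.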

Reviewer 03 · Rating 8 · Confidence 4

Strengths

KaSA combines SVD truncation with a knowledge-aware adaptation process, enabling it to achieve impressive performance on various NLP tasks. KaSA often outperforms other PEFT methods on benchmarks, showing its effectiveness in balancing performance and efficiency. The experiments are thorough, covering natural language understanding, generation, and instruction-following. The ablation studies break down how each component (like the knowledge-aware adaptation) contributes to the overall performanc…
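
On the efficiency side, and assuming KaSA trains only an SVD-form update ΔU diag(Δσ) ΔV^T for each adapted m × n matrix, its trainable-parameter cost differs from LoRA at the same rank r only by the r singular values. The counting helpers below are hypothetical and ignore which layers are adapted and the one-time cost of computing the SVD.

```python
def lora_params(m, n, r):
    """Trainable parameters of a rank-r LoRA update B @ A (B: m x r, A: r x n)."""
    return r * (m + n)


def svd_form_params(m, n, r):
    """Trainable parameters of dU (m x r), the diagonal dsigma (r entries) and dV (n x r)."""
    return r * (m + n) + r


def trainable_share(m, n, r):
    """SVD-form trainable parameters as a fraction of the full m x n weight."""
    return svd_form_params(m, n, r) / (m * n)
```

For a hypothetical 4096 × 4096 projection at rank 8, `lora_params` gives 65,536 and `svd_form_params` gives 65,544, about 0.39% of the 16,777,216 entries in the full matrix.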

Weaknesses

The method adds some complexity, especially with the dynamic singular value adaptation, which might make it harder to implement compared to simpler PEFT methods. A potential limitation is the risk that SVD truncation might discard useful knowledge, especially if it’s not fully adapted to a task. It would also help to see this method applied to a wider range of LLM architectures to confirm its generalizability. While KaSA does well on the tested models, applying it to newer transformer architectu…
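
One quick diagnostic for the truncation concern, not taken from the paper, is to measure how much of a weight's squared Frobenius norm the dropped components carry. By the Eckart–Young–Mirsky theorem, removing the r smallest singular components leaves an error with ||W − W_k||_F^2 = σ_{k+1}^2 + … + σ_p^2, so the share below is exactly the relative squared reconstruction error. The function name is hypothetical.

```python
import numpy as np


def discarded_energy_fraction(W, r):
    """Share of ||W||_F^2 carried by the r smallest singular components of W."""
    s = np.linalg.svd(W, compute_uv=False)  # singular values in descending order
    return float(np.sum(s[len(s) - r:] ** 2) / np.sum(s ** 2))
```

A small fraction shows that the truncated weight is numerically close to the original, but it does not by itself show that the removed directions are irrelevant to a particular task, which is the reviewer's actual concern.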

Code & Models

Repositories

juyongjiang/kasa
PyTorch · Official

Videos

KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Sparse Evolutionary Training