RoCoFT: Efficient Finetuning of Large Language Models with Row-Column Updates

Md Kowsher; Tara Esmaeilbeig; Chun-Nam Yu; Chen Chen; Mojtaba Soltanalian; Niloofar Yousefi

arXiv:2410.10075·cs.CL·June 3, 2025

TL;DR

RoCoFT introduces a parameter-efficient fine-tuning approach for large language models that updates only specific rows and columns of weight matrices, achieving comparable or better accuracy with less memory and computation.

Contribution

The paper presents RoCoFT, a novel fine-tuning method that selectively updates row and column parameters, backed by neural tangent kernel analysis and extensive empirical validation.

Findings

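01 · Achieves accuracy comparable to or better than state-of-the-art PEFT methods.

02 · Uses less memory and computation during fine-tuning than those PEFT methods.

03 · Kernels built from the restricted set of row and column parameters are numerically close to the full-parameter kernel and give comparable classification performance.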
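As a rough illustration of what "close to the full-parameter kernel" means, the block below writes out the standard empirical neural tangent kernel and its restriction to a parameter subset. The symbols $f$, $\theta$, and $S$ are assumptions introduced here for illustration; this page does not show the paper's exact kernel definition, so this is a sketch rather than the authors' formulation.

```latex
% Empirical NTK of a scalar-output model f(x; \theta) over all parameters:
K_{\mathrm{full}}(x, x') = \nabla_{\theta} f(x;\theta)^{\top} \, \nabla_{\theta} f(x';\theta)

% Kernel restricted to a subset S of parameters (e.g. a few rows or columns):
K_{S}(x, x') = \nabla_{\theta_S} f(x;\theta)^{\top} \, \nabla_{\theta_S} f(x';\theta)

% The gradient inner product splits over disjoint parameter sets, so
K_{\mathrm{full}}(x, x') = K_{S}(x, x') + K_{S^{c}}(x, x')
```

Under this reading, the finding says that the restricted kernel $K_S$ is empirically close to $K_{\mathrm{full}}$ and supports similar classification performance.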

Abstract

We propose RoCoFT, a parameter-efficient fine-tuning method for large-scale language models (LMs) based on updating only a few rows and columns of the weight matrices in transformers. Through extensive experiments with medium-size LMs like BERT and RoBERTa, and larger LMs like Bloom-7B, Llama2-7B, and Llama2-13B, we show that our method gives comparable or better accuracies than state-of-the-art PEFT methods while also being more memory- and computation-efficient. We also study the reason behind the effectiveness of our method with tools from neural tangent kernel theory. We empirically demonstrate that our kernels, constructed using a restricted set of row and column parameters, are numerically close to the full-parameter kernel and give comparable classification performance. Ablation studies are conducted to investigate the impact of different algorithmic choices, including the selection…
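The core idea of updating only a few rows and columns can be sketched on a single weight matrix. The example below is a minimal pure-Python illustration, not the official implementation: the function `finetune_rows_cols`, the squared-error objective, and the toy data are all assumptions made here, and the page does not say whether rows and columns are updated together or as separate variants, so the sketch accepts either or both.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def finetune_rows_cols(W0, data, rows=(), cols=(), lr=0.05, epochs=100):
    """Hypothetical helper: SGD on 0.5 * ||W x - y||^2 with a frozen backbone.

    Entry W[i][j] is trainable only if i is in `rows` or j is in `cols`;
    every other entry keeps its "pretrained" value from W0.
    """
    rows, cols = set(rows), set(cols)
    W = [row[:] for row in W0]  # copy so W0 stays untouched
    for _ in range(epochs):
        for x, y in data:
            pred = matvec(W, x)
            for i in range(len(W)):
                err = pred[i] - y[i]
                for j in range(len(x)):
                    if i in rows or j in cols:
                        # dL/dW[i][j] = err_i * x_j for the squared-error loss
                        W[i][j] -= lr * err * x[j]
    return W

# Toy setup (assumed): a 4x3 "pretrained" matrix and a different target map.
random.seed(0)
d_out, d_in = 4, 3
W0 = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
W_target = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
xs = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(20)]
data = [(x, matvec(W_target, x)) for x in xs]

# Train only row 0 and column 2; everything else stays frozen.
W = finetune_rows_cols(W0, data, rows=[0], cols=[2])
n_trainable = sum(1 for i in range(d_out) for j in range(d_in) if i == 0 or j == 2)
print("trainable entries:", n_trainable, "of", d_out * d_in)
```

Here row 0 contributes 3 entries and column 2 contributes 4, with one shared entry, so 6 of the 12 weights are trainable. In a real transformer the same masking would apply to each selected weight matrix, and an autograd framework would replace the hand-written gradient.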

Peer Reviews

Decision·ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 5 · Confidence 5

Strengths

- The presentation is clear, and the paper is easy to follow, with only a few minor typos.
- The proposed method, RoCoFT, is straightforward and demonstrates strong empirical performance.
- The results are reported across multiple tasks and base models, evaluated using various metrics, including memory usage, computation time, and accuracy. This is a good plus to the paper.

Weaknesses

- **Lack of Related Work Discussion**: One weakness of this paper is the limited scope of its related work discussion, focusing primarily on low-rank methods (e.g., LoRA). However, RoCoFT has a closer methodological resemblance to pruning and sparse fine-tuning methods, which are underrepresented in this review. In the parameter-efficient fine-tuning (PEFT) field, methods generally fall into either low-rank or subset of trainable parameter categories, so a more comprehensive comparison should in

Reviewer 02 · Rating 3 · Confidence 4

Strengths

The authors study an important problem in LLMs. The method is relatively efficient and lightweight. The evaluation covers multiple transfer tasks and several base models. They provide an NTK based empirical evaluation that aims to explain the observed phenomenon.

Weaknesses

The paper is not well written: several parts are unclear, and there are many typos. In essence, the method presented in this paper was already presented in another paper [1]. In fact, in [1] the authors wrote: “We randomly sampled the same amount of parameters as in BitFit from the entire model, and fine-tuned only them (“rand uniform” line in Table 3). The results are substantially worse across all tasks; similar patterns are observed when the random parameters are sampled as complete row

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- The method is very simple but shows prominent results for some datasets.
- The method was evaluated on a large and diverse set of datasets.
- Applying NTK regression to explain why the method works looks interesting.

Weaknesses

- Limited novelty of the proposed method: the authors propose to update a few columns/rows in the base model and exploit the existing NTK regression method to explain it.
- I don’t understand how the results in Table 5 are consistent with Table 1 so we can explain why the method works with NTK regression. In Table 5 the proposed method performs worse than FT while in Table 1 it is not the case.
- In-place updates disable the behavior of the model as an adaptor. This is a trade-off that should

Reviewer 04 · Rating 6 · Confidence 3

Strengths

1. The method is simple, straightforward, yet effective. The presentation is clear and easy to follow.
2. The performance comparison with baselines is extensive. In addition, RoCoFT has far fewer learnable parameters than existing methods, which is very useful.
3. As shown in the ablation studies, the strategy for choosing rows and columns is robust and does not need much tuning.

Weaknesses

1. The NTK analysis in Section 5 is not complete. The results in Tables 5 and 6 only include comparisons between RoCoFT, FT, and the pre-trained weights. However, if other methods, such as LoRA, also have a kernel that is empirically close to the full-parameter kernel, it becomes unclear why RoCoFT can achieve performance improvements over them. Similar experiments on other baselines should also be included.
2. Further explanation should be provided on why the few-shot learning performance is us

Code & Models

Repositories

Kowsher/RoCoFT
PyTorch · Official

Videos

RoCoFT: Efficient Finetuning of Large Language Models with Row-Column Updates· underline

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention · Dense Connections · WordPiece · Residual Connection · Linear Warmup With Linear Decay · Dropout