RoCoFT: Efficient Finetuning of Large Language Models with Row-Column Updates
Md Kowsher, Tara Esmaeilbeig, Chun-Nam Yu, Chen Chen, Mojtaba Soltanalian, Niloofar Yousefi

TL;DR
RoCoFT introduces a parameter-efficient fine-tuning approach for large language models that updates only specific rows and columns of weight matrices, achieving comparable or better accuracy with less memory and computation.
Contribution
The paper presents RoCoFT, a novel fine-tuning method that selectively updates row and column parameters, backed by neural tangent kernel analysis and extensive empirical validation.
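The row-column idea can be sketched as gradient masking on a single weight matrix. This is a minimal sketch with several assumptions: a toy linear layer in NumPy stands in for a transformer, and it is trained with plain gradient descent on a squared-error loss. The helpers `row_mask`, `col_mask` and `finetune_masked` are hypothetical names. Masking gradients is only one way to realize the idea and is not necessarily how the authors implement RoCoFT.

```python
import numpy as np

def row_mask(shape, rows):
    """Boolean mask selecting whole rows of a weight matrix (hypothetical helper)."""
    mask = np.zeros(shape, dtype=bool)
    mask[rows, :] = True
    return mask

def col_mask(shape, cols):
    """Boolean mask selecting whole columns of a weight matrix (hypothetical helper)."""
    mask = np.zeros(shape, dtype=bool)
    mask[:, cols] = True
    return mask

def mse(W, X, Y):
    """Mean over examples of the squared error of the linear map x -> W x."""
    return float(np.mean(np.sum((X @ W.T - Y) ** 2, axis=1)))

def finetune_masked(W0, X, Y, mask, lr=0.1, steps=200):
    """Plain gradient descent on mse(), applying updates only where mask is True."""
    W = W0.copy()                        # work on a copy so W0 (pre-trained weights) is preserved
    n = X.shape[0]
    for _ in range(steps):
        residual = X @ W.T - Y           # shape (n, d_out)
        grad = 2.0 * residual.T @ X / n  # d mse / d W, shape (d_out, d_in)
        W -= lr * grad * mask            # entries outside the selected rows/columns never change
    return W

rng = np.random.default_rng(0)
d_in, d_out, n = 8, 6, 64
W0 = rng.normal(size=(d_out, d_in))                   # stand-in for pre-trained weights
W_target = W0 + rng.normal(scale=0.5, size=W0.shape)  # synthetic "task" weights
X = rng.normal(size=(n, d_in))
Y = X @ W_target.T

mask = row_mask(W0.shape, rows=[0, 1])  # train 2 of 6 rows: 16 of 48 entries
W = finetune_masked(W0, X, Y, mask)
print(int(mask.sum()), mse(W0, X, Y), mse(W, X, Y))
```

Only rows 0 and 1 move. The other four rows keep their pre-trained values bit for bit, and the loss still drops because the squared error decomposes over output rows, so the two trainable rows can fit their share of the task on their own. Swapping in `col_mask` gives the column variant.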
Findings
Achieves comparable or better accuracy than state-of-the-art PEFT methods.
Reduces memory and computational requirements during fine-tuning.
Kernel analysis shows that a kernel built only from the selected row and column parameters closely approximates the full-parameter kernel.
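The efficiency finding can be made concrete by counting trainable entries per weight matrix. This worked example rests on several assumptions. It uses a square 4096 × 4096 projection chosen only for illustration. Rank-r LoRA is counted as its two factors, B (d_out × r) and A (r × d_in). The labels "row", "col" and "both" are hypothetical names for updating r full rows, r full columns, or both.

```python
def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Rank-r LoRA trains B (d_out x r) and A (r x d_in)."""
    return r * (d_out + d_in)

def rowcol_params(d_out: int, d_in: int, r: int, mode: str = "row") -> int:
    """Trainable entries when r full rows and/or r full columns of W (d_out x d_in) are updated."""
    if mode == "row":
        return r * d_in
    if mode == "col":
        return r * d_out
    if mode == "both":
        # r rows and r columns overlap in r * r entries, which are counted once
        return r * d_in + r * d_out - r * r
    raise ValueError(f"unknown mode: {mode}")

d = 4096  # illustrative square projection size (an assumption)
for r in (1, 4):
    # columns: r, rows only, rows + columns, LoRA, full matrix
    print(r, rowcol_params(d, d, r), rowcol_params(d, d, r, "both"), lora_params(d, d, r), d * d)
# 1 4096 8191 8192 16777216
# 4 16384 32752 32768 16777216
```

Under these assumptions, r rows train half as many entries as rank-r LoRA on a square matrix, and r rows plus r columns roughly match it. The trainable-parameter count drives gradient and optimizer-state memory (Adam keeps two extra values per trainable parameter), but frozen weights and activations also occupy memory, so this arithmetic alone does not reproduce the paper's measurements.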
Abstract
We propose RoCoFT, a parameter-efficient fine-tuning method for large-scale language models (LMs) based on updating only a few rows and columns of the weight matrices in transformers. Through extensive experiments with medium-size LMs like BERT and RoBERTa, and larger LMs like Bloom-7B, Llama2-7B, and Llama2-13B, we show that our method gives comparable or better accuracies than state-of-the-art PEFT methods while also being more memory- and computation-efficient. We also study the reason behind the effectiveness of our method with tools from neural tangent kernel theory. We empirically demonstrate that our kernel, constructed using a restricted set of row and column parameters, is numerically close to the full-parameter kernel and gives comparable classification performance. Ablation studies are conducted to investigate the impact of different algorithmic choices, including the selection…
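The kernel comparison in the abstract can be illustrated with a minimal sketch. It replaces a transformer with a toy one-hidden-layer network, f(x) = v · tanh(W x). It measures closeness with kernel alignment, the cosine similarity of two Gram matrices. It restricts the kernel to the first r rows or columns of W. The model, the metric and the selection are all assumptions, not the paper's setup. The empirical NTK is the Gram matrix of per-example parameter gradients, so restricting it to some rows or columns of W simply drops the other gradient coordinates.

```python
import numpy as np

def jacobian_W(W, v, X):
    """Per-example gradients of f(x) = v . tanh(W x) with respect to W, shape (n, h, d)."""
    pre = X @ W.T                      # pre-activations, shape (n, h)
    s = 1.0 - np.tanh(pre) ** 2        # tanh'(pre), shape (n, h)
    # df/dW[j, k] = v[j] * tanh'(w_j . x) * x[k]
    return (v * s)[:, :, None] * X[:, None, :]

def empirical_ntk(J):
    """Gram matrix of flattened per-example gradients: K[a, b] = <grad f(x_a), grad f(x_b)>."""
    J = J.reshape(J.shape[0], -1)
    return J @ J.T

def alignment(K1, K2):
    """Kernel alignment: cosine similarity of two Gram matrices (Frobenius inner product)."""
    return float(np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2)))

rng = np.random.default_rng(0)
n, d, h, r = 20, 10, 64, 4
X = rng.normal(size=(n, d))
W = rng.normal(size=(h, d)) / np.sqrt(d)
v = rng.normal(size=h) / np.sqrt(h)

J = jacobian_W(W, v, X)                # (n, h, d)
K_full = empirical_ntk(J)              # all h * d entries of W
K_rows = empirical_ntk(J[:, :r, :])    # only the first r rows of W
K_cols = empirical_ntk(J[:, :, :r])    # only the first r columns of W
print(alignment(K_full, K_rows), alignment(K_full, K_cols))
```

Because the empirical NTK sums over parameters, `K_full` is exactly `K_rows` plus the kernel of the remaining rows. The restricted kernel is smaller in magnitude, which is why a scale-invariant measure such as alignment is used here. The paper's own comparison metric and its kernel-regression classification experiments may differ.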
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
- The presentation is clear, and the paper is easy to follow, with only a few minor typos.
- The proposed method, RoCoFT, is straightforward and demonstrates strong empirical performance.
- The results are reported across multiple tasks and base models and evaluated using various metrics, including memory usage, computation time, and accuracy. This is a strong point of the paper.
- **Lack of Related Work Discussion**: One weakness of this paper is the limited scope of its related work discussion, focusing primarily on low-rank methods (e.g., LoRA). However, RoCoFT has a closer methodological resemblance to pruning and sparse fine-tuning methods, which are underrepresented in this review. In the parameter-efficient fine-tuning (PEFT) field, methods generally fall into either low-rank or subset of trainable parameter categories, so a more comprehensive comparison should in
The authors study an important problem in LLMs. The method is relatively efficient and lightweight. The evaluation covers multiple transfer tasks and several base models. They provide an NTK-based empirical evaluation that aims to explain the observed phenomenon.
The paper is not well written; multiple parts are unclear, and there are many typos. In essence, the method proposed in this paper was already presented in another paper [1]. In fact, in [1] the authors wrote: “We randomly sampled the same amount of parameters as in BitFit from the entire model, and fine-tuned only them (“rand uniform” line in Table 3). The results are substantially worse across all tasks; similar patterns are observed when the random parameters are sampled as complete row
- The method is very simple but shows strong results on some datasets.
- The method was evaluated on a large and diverse set of datasets.
- Applying NTK regression to explain why the method works looks interesting.
- Limited novelty of the proposed method: the authors propose to update a few columns/rows in the base model and exploit the existing NTK regression method to explain it.
- I don’t understand how the results in Table 5 are consistent with those in Table 1, which is necessary for NTK regression to explain why the method works. In Table 5 the proposed method performs worse than FT, whereas in Table 1 it does not.
- In-place updates prevent the method from acting as an adapter. This is a trade-off that should
1. The method is simple and straightforward, yet effective. The presentation is clear and easy to follow.
2. The performance comparison with baselines is extensive. Moreover, RoCoFT has far fewer learnable parameters than existing methods, which is very useful.
3. As shown in the ablation studies, the strategy for choosing rows and columns is robust and does not need much tuning.
1. The NTK analysis in Section 5 is not complete. The results in Tables 5 and 6 only include comparisons between RoCoFT, FT, and the pre-trained weights. However, if other methods, such as LoRA, also have a kernel that is empirically close to the full-parameter kernel, it becomes unclear why RoCoFT can achieve performance improvements over them. Similar experiments on other baselines should also be included.
2. Further explanation should be provided on why the few-shot learning performance is us
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention · Dense Connections · WordPiece · Residual Connection · Linear Warmup With Linear Decay · Dropout
