LoRTA: Low Rank Tensor Adaptation of Large Language Models
Ignacio Hounie, Charilaos Kanatsoulis, Arnuv Tandon, Alejandro Ribeiro

TL;DR
LoRTA introduces a higher-order tensor decomposition method for efficient fine-tuning of large language models, reducing parameters while maintaining performance across various benchmarks.
Contribution
This work proposes a higher-order Candecomp/Parafac (CP) tensor decomposition for parameter-efficient fine-tuning, giving a more compact and flexible parameterization of model updates than existing low-rank matrix and tensor methods.
Findings
Reduces trainable parameters significantly.
Maintains performance comparable to full fine-tuning.
Effective across NLP and protein folding tasks.
Abstract
Low Rank Adaptation (LoRA) is a popular Parameter Efficient Fine Tuning (PEFT) method that effectively adapts large pre-trained models for downstream tasks. LoRA parameterizes model updates using low-rank matrices at each layer, significantly reducing the number of trainable parameters and, consequently, resource requirements during fine-tuning. However, the lower bound on the number of trainable parameters remains high due to the use of the low-rank matrix model. Recent works have addressed this limitation by proposing low rank tensor parameterizations for model updates. However, they only exploit redundancy across layers, or tensorize individual matrices using ad-hoc schemes that introduce additional hyperparameters. In this work, we propose a higher-order Candecomp/Parafac (CP) decomposition, enabling a more compact and flexible representation compared to existing matrix and tensor…
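To make the parameter-count argument concrete, here is a minimal, dependency-free sketch of a CP-parameterized update. It assumes, purely for illustration, that the updates are stacked into a 4th-order tensor with modes (output dim, input dim, layer, matrix type); the excerpt above does not specify the exact tensor modes LoRTA uses, so this layout and the names `lora_param_count`, `cp_param_count`, and `cp_update_slice` are hypothetical and not taken from the paper.

```python
# Illustrative sketch only: the 4-mode layout (d_out, d_in, layer, matrix type)
# is an assumption, not the tensorization used in the LoRTA paper.

def lora_param_count(d_out, d_in, rank, n_layers, n_matrices):
    """Trainable parameters when every adapted matrix gets its own
    rank-`rank` pair of factors (d_out x rank and d_in x rank)."""
    return n_layers * n_matrices * rank * (d_out + d_in)


def cp_param_count(d_out, d_in, rank, n_layers, n_matrices):
    """Trainable parameters for one rank-`rank` CP model shared by all
    layers and matrix types: one factor matrix per tensor mode."""
    return rank * (d_out + d_in + n_layers + n_matrices)


def cp_update_slice(A, B, C, D, layer, matrix):
    """Weight update for one (layer, matrix) pair from CP factors.

    A: d_out x rank, B: d_in x rank, C: n_layers x rank, D: n_matrices x rank
    (lists of rows). Entry (i, j) is sum_r A[i][r] * B[j][r] * C[layer][r] * D[matrix][r].
    """
    rank = len(C[layer])
    # Per-component scaling contributed by the layer and matrix-type modes.
    scale = [C[layer][r] * D[matrix][r] for r in range(rank)]
    return [
        [sum(a_row[r] * scale[r] * b_row[r] for r in range(rank)) for b_row in B]
        for a_row in A
    ]


if __name__ == "__main__":
    # Hypothetical sizes: 32 layers, 4 adapted matrices per layer, d = 4096, rank 8.
    print(lora_param_count(4096, 4096, 8, 32, 4))  # 8388608
    print(cp_param_count(4096, 4096, 8, 32, 4))    # 65824
```

Under these assumed sizes the per-matrix low-rank count grows with the product of layers and matrix types, while the CP count only adds one small factor row per layer and per matrix type. Each slice produced by `cp_update_slice` still has rank at most `rank`, since it is the product of a `d_out x rank` factor, a diagonal scaling, and a `rank x d_in` factor.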
Peer Reviews
Decision · Submitted to ICLR 2025
Experimental results are very good considering the small rank (r=4 or r=8) of the update and, as a result, a very small number of parameters.
My major concern is the novelty of this idea. A very similar concept was previously implemented in "LoTR: Low Tensor Rank Weight Adaptation" by Daniel Bershatsky, Daria Cherniuk, Talgat Daulbaev, Aleksandr Mikhalev, and Ivan Oseledets (https://arxiv.org/abs/2402.01376). In LoTR, the same CP decomposition is applied to the tensor of stacked attention weights. The only difference is that, in LoTR, the tensor of all weights is consolidated into a three-dimensional tensor without separating the d
The method is presented clearly with comprehensive preliminaries and detailed methodology. The approach shows potential for significant parameter reduction in model fine-tuning.
While the idea is interesting, the evaluation is not comprehensive enough to demonstrate its practical utility. Moreover, the claims about its performance are exaggerated:
- Line 021: "[...] achieving a substantial reduction in the number of parameters while maintaining comparable performance"
- Line 076: "[...] compared to state-of-the-art PEFT methods, with minimal performance trade-offs"
- Line 478: "LoRTA achieves comparable and sometimes superior performance than baselines at a reduced
1. The writing is clear, and the method is simple and easy to reproduce.
2. The paper studies a valuable problem and addresses the limitations of existing methods well, achieving good results on multiple benchmarks.
1. As the authors state in the introduction, the main purpose of LoRTA is efficient fine-tuning of LLMs. I therefore expected to see more LLM-related results in the experiments. The authors only ran experiments on llama2-7B and mt-bench, and I would like to see results on more LLMs, such as llama-3-70B, Mistral-7B, etc.
2. The authors ran experiments on multiple benchmarks, but there are few results on the mainstream LLM evaluation benchmarks. The GLUE benchm
Taxonomy
Topics: Computational Physics and Python Applications · Speech Recognition and Synthesis · Topic Modeling
Methods: Adapter
