LoRTA: Low Rank Tensor Adaptation of Large Language Models

Ignacio Hounie; Charilaos Kanatsoulis; Arnuv Tandon; Alejandro Ribeiro

arXiv:2410.04060·cs.CL·February 4, 2025

Open Access 3 Reviews

TL;DR

LoRTA introduces a higher-order tensor decomposition method for efficient fine-tuning of large language models, reducing the number of trainable parameters while maintaining performance across various benchmarks.

Contribution

This work proposes a novel CP tensor decomposition approach for parameter-efficient fine-tuning, surpassing existing low-rank matrix and tensor methods in flexibility and compactness.

Findings

1. Reduces trainable parameters significantly.
2. Maintains performance comparable to full fine-tuning.
3. Effective across NLP and protein folding tasks.

Abstract

Low Rank Adaptation (LoRA) is a popular Parameter Efficient Fine Tuning (PEFT) method that effectively adapts large pre-trained models for downstream tasks. LoRA parameterizes model updates using low-rank matrices at each layer, significantly reducing the number of trainable parameters and, consequently, resource requirements during fine-tuning. However, the lower bound on the number of trainable parameters remains high due to the use of the low-rank matrix model. Recent works have addressed this limitation by proposing low rank tensor parameterizations for model updates. However, they only exploit redundancy across layers, or tensorize individual matrices using ad-hoc schemes that introduce additional hyperparameters. In this work, we propose a higher-order Candecomp/Parafac (CP) decomposition, enabling a more compact and flexible representation compared to existing matrix and tensor…
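The parameter-count argument in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names (`lora_param_count`, `cp_param_count`, `cp_reconstruct`), the toy sizes, and the choice of tensor modes (output dimension, input dimension, layer, matrix type) are assumptions made for illustration only. The sketch stacks all weight updates into one tensor, represents that tensor as a sum of rank-one terms (a CP decomposition), and compares the number of trainable parameters with per-matrix LoRA at the same rank.

```python
import numpy as np


def lora_param_count(d_out, d_in, rank, num_matrices):
    """Trainable parameters when every matrix gets its own pair of LoRA factors."""
    return num_matrices * rank * (d_out + d_in)


def cp_param_count(mode_sizes, rank):
    """Trainable parameters of a CP model: one (mode_size x rank) factor per mode."""
    return rank * sum(mode_sizes)


def cp_reconstruct(factors):
    """Build the full tensor as a sum of `rank` rank-one outer products.

    factors[k] has shape (mode_sizes[k], rank).
    """
    letters = "abcdefgh"[: len(factors)]
    # e.g. "az,bz,cz,dz->abcd": the shared index z runs over the CP rank.
    spec = ",".join(f"{letter}z" for letter in letters) + "->" + letters
    return np.einsum(spec, *factors)


# Hypothetical toy sizes, chosen only to keep the example small.
d, num_layers, num_types, rank = 8, 3, 4, 2
mode_sizes = (d, d, num_layers, num_types)

rng = np.random.default_rng(0)
factors = [rng.standard_normal((size, rank)) for size in mode_sizes]

# One tensor holding the update for every (layer, matrix type) pair.
delta = cp_reconstruct(factors)

# The update for a single matrix is a slice; its matrix rank is at most `rank`.
delta_layer0_type0 = delta[:, :, 0, 0]
slice_rank = np.linalg.matrix_rank(delta_layer0_type0)

lora_params = lora_param_count(d, d, rank, num_layers * num_types)
cp_params = cp_param_count(mode_sizes, rank)
print(delta.shape, slice_rank, lora_params, cp_params)
```

In this toy setting the per-matrix LoRA count grows with the number of adapted matrices, while the CP count grows only with the sum of the mode sizes, because the factors are shared across layers and matrix types. Each slice of the CP tensor is still a low-rank matrix, so every individual weight update keeps the low-rank structure that LoRA relies on.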

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 3 · Confidence 3

Strengths

Experimental results are very good considering the small rank (r=4 or r=8) of the update and, as a result, a very small number of parameters.

Weaknesses

My major concern is the novelty of this idea. A very similar concept was previously implemented in "LoTR: Low Tensor Rank Weight Adaptation" by Daniel Bershatsky, Daria Cherniuk, Talgat Daulbaev, Aleksandr Mikhalev, and Ivan Oseledets (https://arxiv.org/abs/2402.01376). In LoTR, the same CP decomposition is applied to the tensor of stacked attention weights. The only difference is that, in LoTR, the tensor of all weights is consolidated into a three-dimensional tensor without separating the d

Reviewer 02 · Rating 6 · Confidence 4

Strengths

The method is presented clearly with comprehensive preliminaries and detailed methodology. The approach shows potential for significant parameter reduction in model fine-tuning.

Weaknesses

While the idea is interesting, the evaluation is not comprehensive enough to demonstrate its practical utility. Moreover, the claims about its performance are exaggerated:
- Line 021: "_[...] achieving a substantial reduction in the number of parameters while maintaining comparable performance_"
- Line 076: "_[...] compared to state-of-the-art PEFT methods, with minimal performance trade-offs_"
- Line 478: "_LoRTA achieves comparable and sometimes superior performance than baselines at a reduced

Reviewer 03 · Rating 5 · Confidence 3

Strengths

1. The writing is clear, and the method is simple and easy to reproduce.
2. The paper studies a valuable problem and addresses the limitations of existing methods well, achieving good results on multiple benchmarks.

Weaknesses

1. As the authors state in the introduction, the main purpose of the LoRTA method is efficient fine-tuning of LLMs. Therefore, I expect to see more LLM-related results in the experiments. The authors only conducted experiments on Llama2-7B and MT-Bench, and I expect to see results on more LLMs such as Llama-3-70B, Mistral-7B, etc.
2. The authors conducted experiments on multiple benchmarks, but there are few results on the mainstream LLM evaluation benchmarks. The GLUE benchm

Taxonomy

Topics: Computational Physics and Python Applications · Speech Recognition and Synthesis · Topic Modeling

Methods: Adapter