DoTA: Weight-Decomposed Tensor Adaptation for Large Language Models

Xiaolin Hu; Xiang Cheng; Peiyu Liu; Wei Liu; Jian Luan; Bin Wang; Yong Liu

arXiv:2412.20891·cs.CL·December 31, 2024

TL;DR

This paper introduces DoTA, a tensor-decomposition-based method for efficient fine-tuning of large language models. DoTA initializes its adapters from a decomposition of the pre-trained weights, outperforms randomly initialized tensor adaptation, and supports quantization for reduced memory usage.

Contribution

The paper proposes Weight-Decomposed Tensor Adaptation (DoTA), which uses the Matrix Product Operator (MPO) decomposition of pre-trained weights to initialize tensor adapters for LLM fine-tuning. It also introduces QDoTA, a quantized, memory-efficient variant.
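To make the MPO decomposition mentioned above concrete, here is a minimal NumPy sketch of the standard sequential-SVD factorization of a weight matrix into MPO cores. It is an illustration only, not the authors' implementation: this summary does not specify how DoTA factorizes dimensions, chooses ranks, or decides which cores are trained. The function names `mpo_decompose` and `mpo_reconstruct` and the parameters `in_dims`, `out_dims`, and `max_rank` are hypothetical.

```python
import numpy as np

def mpo_decompose(W, in_dims, out_dims, max_rank=None):
    """Factor W (prod(in_dims) x prod(out_dims)) into MPO cores.

    Core k has shape (r_{k-1}, in_dims[k], out_dims[k], r_k), with r_0 = r_n = 1.
    max_rank (hypothetical knob) truncates every bond dimension.
    """
    n = len(in_dims)
    assert len(out_dims) == n
    assert W.shape == (int(np.prod(in_dims)), int(np.prod(out_dims)))
    # Reshape to (i1..in, j1..jn), then interleave to (i1, j1, ..., in, jn).
    T = W.reshape(*in_dims, *out_dims)
    T = T.transpose([ax for k in range(n) for ax in (k, n + k)])
    cores, r_prev = [], 1
    M = T.reshape(r_prev * in_dims[0] * out_dims[0], -1)
    for k in range(n - 1):
        U, S, Vt = np.linalg.svd(M, full_matrices=False)
        r = len(S) if max_rank is None else min(max_rank, len(S))
        cores.append(U[:, :r].reshape(r_prev, in_dims[k], out_dims[k], r))
        # Carry the singular values forward into the remaining factor.
        M = S[:r, None] * Vt[:r]
        r_prev = r
        M = M.reshape(r_prev * in_dims[k + 1] * out_dims[k + 1], -1)
    cores.append(M.reshape(r_prev, in_dims[-1], out_dims[-1], 1))
    return cores

def mpo_reconstruct(cores):
    """Contract the cores back into a 2-D matrix."""
    n = len(cores)
    T = cores[0]
    for core in cores[1:]:
        T = np.tensordot(T, core, axes=1)  # contract shared bond index
    T = T.reshape(T.shape[1:-1])           # drop the size-1 boundary bonds
    # Undo the interleaving: (i1, j1, ..., in, jn) -> (i1..in, j1..jn).
    T = T.transpose(list(range(0, 2 * n, 2)) + list(range(1, 2 * n, 2)))
    rows = int(np.prod(T.shape[:n]))
    return T.reshape(rows, -1)

# Toy stand-in for a pre-trained weight matrix (random here, for illustration).
rng = np.random.default_rng(0)
W = rng.standard_normal((12, 8))
exact = mpo_decompose(W, in_dims=(3, 4), out_dims=(2, 4))
small = mpo_decompose(W, in_dims=(3, 4), out_dims=(2, 4), max_rank=2)
print("exact reconstruction:", np.allclose(mpo_reconstruct(exact), W))
print("parameters: dense", W.size, "truncated MPO", sum(c.size for c in small))
```

Without truncation the cores reproduce the matrix exactly; capping the bond dimension trades reconstruction error for fewer parameters, which is the general reason tensor formats are attractive for adaptation.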

Findings

01. Tensor adapters initialized with DoTA outperform randomly initialized ones in fine-tuning accuracy.

02. QDoTA, the quantized variant, achieves comparable performance with lower memory consumption.

03. Experiments on reasoning tasks demonstrate the method's effectiveness.

Abstract

Low-rank adaptation (LoRA) reduces the computational and memory demands of fine-tuning large language models (LLMs) by approximating updates with low-rank matrices. However, low-rank approximation in two-dimensional space fails to capture high-dimensional structures within the target matrix. Recently, tensor decomposition methods have been explored for fine-tuning LLMs, leveraging their ability to extract structured information. Yet, these approaches primarily rely on random initialization, and the impact of initialization on tensor adaptation remains underexplored. In this paper, we reveal that tensor adaptation with random initialization diverges significantly from full fine-tuning in validation loss. To address this, we propose Weight-Decomposed Tensor Adaptation (DoTA), which leverages the Matrix Product Operator (MPO) decomposition of pre-trained weights for effective initialization in…

Taxonomy

Topics: Tensor decomposition and applications · Topic Modeling