GaLore$+$: Boosting Low-Rank Adaptation for LLMs with Cross-Head Projection

Xutao Liao; Shaohui Li; Yuhui Xu; Zhi Li; Yu Liu; You He

arXiv:2412.19820·cs.CL·December 31, 2024

TL;DR

GaLore$+$ speeds up low-rank adaptation of large language models by combining cross-head low-rank projection with fast randomized SVD, substantially reducing training time while retaining strong task performance.

Contribution

The paper introduces GaLore$+$, a novel method that reduces low-rank projection time in LLM fine-tuning through cross-head projection and randomized SVD techniques.
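The fast SVD mentioned here is described in the abstract as randomized subspace iteration. The following is a minimal NumPy sketch of that general technique, not the authors' implementation. The function name `randomized_left_subspace` and the parameters `n_iter`, `oversample`, and `seed` are illustrative assumptions.

```python
import numpy as np

def randomized_left_subspace(G, rank, n_iter=2, oversample=4, seed=0):
    """Approximate the top-`rank` left singular vectors of G.

    Hypothetical helper for illustration; the name and defaults are
    assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    m, n = G.shape
    k = min(rank + oversample, m, n)

    # Range finder: multiply G by a random test matrix and orthonormalise.
    Q, _ = np.linalg.qr(G @ rng.standard_normal((n, k)))

    # Subspace (power) iteration: alternate between G^T and G,
    # re-orthonormalising each time for numerical stability.
    for _ in range(n_iter):
        Z, _ = np.linalg.qr(G.T @ Q)
        Q, _ = np.linalg.qr(G @ Z)

    # Exact SVD of the small (k x n) projected matrix only.
    U_small, s, _ = np.linalg.svd(Q.T @ G, full_matrices=False)
    return (Q @ U_small)[:, :rank], s[:rank]
```

In a GaLore-style optimizer, the returned matrix `P` would play the role of the projection, with the optimizer state kept for the smaller matrix `P.T @ G` rather than for `G` itself. The saving comes from running the exact SVD on a `k x n` matrix instead of the full gradient.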

Findings

01. Achieves approximately 4× faster fine-tuning.

02. Delivers superior performance on arithmetic reasoning and natural language generation tasks.

03. Reduces the time spent estimating low-rank projections, which can exceed 80% of GaLore's total training time.

Abstract

Recent low-rank training methods, such as GaLore, have significantly reduced the memory required to optimize large language models (LLMs). However, these methods often suffer from time-consuming low-rank projection estimations. In particular, the singular value decomposition (SVD) in GaLore can consume more than 80% of the total training time. To address this issue, we propose GaLore$+$, which uses cross-head low-rank projection to reduce the substantial time consumption in estimating low-rank projections for multi-head attention. In addition, we employ randomized subspace iteration to achieve fast SVD. To further enhance performance, we propose sparsely coded residuals to reduce the errors caused by low-rank approximation on the first- and second-order moments of the optimizers and weight updates. We evaluate GaLore$+$ on arithmetic reasoning and natural language generation datasets.…
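The abstract does not spell out how cross-head projection works. One plausible reading, sketched below, is that a projection is estimated from the gradient block of a single attention head and then reused for every head in the same layer, so only one small SVD is needed. Everything in the sketch is an assumption made for illustration: the function names, the choice of reference head, the column-wise head layout of the gradient, and the use of an exact SVD for brevity.

```python
import numpy as np

def cross_head_project(G, n_heads, rank, ref_head=0):
    """Project all head blocks of G with one shared projection.

    G is assumed to be the gradient of an attention weight of shape
    (d_model, n_heads * d_head), with heads laid out along the columns.
    Hypothetical helper, not the paper's code.
    """
    d_model, d_out = G.shape
    assert d_out % n_heads == 0, "columns must split evenly into heads"
    d_head = d_out // n_heads

    # View the gradient as per-head blocks: (n_heads, d_model, d_head).
    heads = G.reshape(d_model, n_heads, d_head).transpose(1, 0, 2)

    # One SVD on a single small head block instead of the full matrix.
    U, _, _ = np.linalg.svd(heads[ref_head], full_matrices=False)
    P = U[:, :rank]  # shared projection, shape (d_model, rank)

    # Apply the same projection to every head: (n_heads, rank, d_head).
    low_rank = np.einsum("dr,hdk->hrk", P, heads)
    return P, low_rank

def project_back(P, low_rank):
    """Map the per-head low-rank blocks back to the full gradient shape."""
    n_heads, _, d_head = low_rank.shape
    full = np.einsum("dr,hrk->hdk", P, low_rank)
    return full.transpose(1, 0, 2).reshape(P.shape[0], n_heads * d_head)
```

The round trip `project_back(P, low_rank)` equals `P @ P.T @ G`, so it reproduces `G` only to the extent that the other heads' gradients lie in the subspace found from the reference head. The sketch takes that premise for granted rather than demonstrating it.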

Taxonomy

Topics: Advanced Data Storage Technologies · Machine Learning and Data Classification · Privacy-Preserving Technologies in Data

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings