Gradient Weight-normalized Low-rank Projection for Efficient LLM Training
Jia-Hong Huang, Yixian Shen, Hongyi Zhu, Stevan Rudinac, Evangelos Kanoulas

TL;DR
GradNormLoRP is a novel method that improves the efficiency of training large language models by normalizing weights and applying low-rank approximations, reducing memory usage while maintaining performance.
Contribution
The paper introduces GradNormLoRP, a technique that improves parameter and memory efficiency in LLM training by combining weight normalization with low-rank projections of the weight and gradient matrices, and that outperforms existing low-rank methods on fine-tuning tasks.
Findings
Reduces optimizer memory usage by up to 89.5%.
Enables pre-training of large LLMs on consumer GPUs.
Outperforms existing low-rank methods in fine-tuning tasks.
Abstract
Large Language Models (LLMs) have shown remarkable performance across various tasks, but the escalating demands on computational resources pose significant challenges, particularly in the extensive utilization of full fine-tuning for downstream tasks. To address this, parameter-efficient fine-tuning (PEFT) methods have been developed, but they often underperform compared to full fine-tuning and struggle with memory efficiency. In this work, we introduce Gradient Weight-Normalized Low-Rank Projection (GradNormLoRP), a novel approach that enhances both parameter and memory efficiency while maintaining comparable performance to full fine-tuning. GradNormLoRP normalizes the weight matrix to improve gradient conditioning, facilitating better convergence during optimization. Additionally, it applies low-rank approximations to the weight and gradient matrices, significantly reducing memory…
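The two ingredients named in the abstract, weight normalization and low-rank projection of gradients, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the column-wise normalization, the SVD-based projector, and the toy shapes are all assumptions chosen for illustration, and the memory count assumes an Adam-style optimizer that keeps two moment estimates per tracked entry.

```python
import numpy as np

def weight_normalize(v, g):
    """Standard weight normalization: W = g * v / ||v||, applied per column.
    (Hypothetical helper; the paper's exact normalization may differ.)"""
    norms = np.linalg.norm(v, axis=0, keepdims=True)
    return g * v / norms

def low_rank_projector(grad, rank):
    """Return the top-`rank` left singular vectors of `grad` (shape m x rank).
    (Assumed SVD-based projector, used here only as an illustration.)"""
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    return u[:, :rank]

def optimizer_state_entries(m, n, rank=None):
    """Count stored entries for two Adam-style moments of an m x n matrix.
    With a rank, moments live in the rank x n projected space and the
    m x rank projector is stored as well."""
    if rank is None:
        return 2 * m * n
    return 2 * rank * n + m * rank

rng = np.random.default_rng(0)
m, n, rank = 64, 32, 4  # toy sizes, not taken from the paper

# Weight normalization: direction v, per-column scale g.
v = rng.standard_normal((m, n))
g = np.ones((1, n))
w = weight_normalize(v, g)

# Low-rank projection of a (random stand-in) gradient.
grad = rng.standard_normal((m, n))
p = low_rank_projector(grad, rank)
grad_low = p.T @ grad      # (rank, n): what the optimizer would track
grad_back = p @ grad_low   # (m, n): projected back to update the weights

full_entries = optimizer_state_entries(m, n)
low_entries = optimizer_state_entries(m, n, rank)
```

With these toy sizes, the full optimizer state holds 4096 entries and the projected one 512. The saving depends entirely on the chosen rank and matrix shapes, so this count should not be read as a reproduction of the figures reported in the paper.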
Taxonomy
Topics: Optical Systems and Laser Technology · Sparse and Compressive Sensing Techniques · Robotics and Sensor-Based Localization
Methods: Attention Is All You Need · Attention Dropout · Linear Layer · Softmax · Dense Connections · Linear Warmup With Linear Decay · Dropout · WordPiece · Residual Connection
