LoQT: Low-Rank Adapters for Quantized Pretraining
Sebastian Loeschcke, Mads Toftrup, Michael J. Kastoryano, Serge Belongie, Vésteinn Snæbjarnarson

TL;DR
LoQT introduces a novel method combining low-rank adapters and quantization to enable efficient pretraining and fine-tuning of large language models on consumer hardware, reducing the need for sharding or offloading.
Contribution
The paper presents LoQT, a new approach that uses gradient-based tensor factorization for training quantized models, allowing large models to be trained on limited hardware.
Findings
Enables training of models up to 7B parameters on a 24GB GPU.
Demonstrates the feasibility of training a 13B model with per-layer gradient updates on the same 24GB GPU.
Shows effectiveness for both pretraining and downstream task adaptation.
Abstract
Despite advances using low-rank adapters and quantization, pretraining of large models on consumer hardware has not been possible without model sharding, offloading during training, or per-layer gradient updates. To address these limitations, we propose Low-Rank Adapters for Quantized Training (LoQT), a method for efficiently training quantized models. LoQT uses gradient-based tensor factorization to initialize low-rank trainable weight matrices that are periodically merged into quantized full-rank weight matrices. Our approach is suitable for both pretraining and fine-tuning models. We demonstrate this for language modeling and downstream task adaptation, finding that LoQT enables efficient training of models up to 7B parameters on a 24GB GPU. We also demonstrate the feasibility of training a 13B model using per-layer gradient updates on the same hardware.
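The loop described in the abstract (factorize a gradient to initialize low-rank matrices, train them, then periodically merge them into the quantized full-rank weights) can be illustrated with a toy NumPy sketch. This is not the authors' implementation. The uniform 4-bit quantizer, the use of an SVD as the gradient factorization, the zero initialization of B, plain gradient descent, and the least-squares objective are all assumptions chosen to keep the example small, and every function name is hypothetical.

```python
import numpy as np

def quantize(w, bits=4):
    """Toy symmetric uniform quantizer (assumption, not the paper's scheme)."""
    levels = 2 ** (bits - 1) - 1
    max_abs = np.abs(w).max()
    scale = max_abs / levels if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -levels, levels).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

def init_projection(grad, rank):
    """Assumed factorization: top-`rank` left singular vectors of the gradient."""
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    return u[:, :rank]

def merge(q, scale, p, b, bits=4):
    """Fold the low-rank update P @ B into the full-rank weights, then re-quantize."""
    return quantize(dequantize(q, scale) + p @ b, bits)

def loss(w, target):
    """Toy objective: 0.5 * ||W - target||^2."""
    return 0.5 * float(np.sum((w - target) ** 2))

rng = np.random.default_rng(0)
m, n, rank = 16, 12, 4
target = rng.standard_normal((m, n)).astype(np.float32)
w_q, w_scale = quantize(rng.standard_normal((m, n)).astype(np.float32))
initial_loss = loss(dequantize(w_q, w_scale), target)

for interval in range(3):                      # three merge intervals
    w = dequantize(w_q, w_scale)               # frozen quantized weights
    grad = w - target                          # dL/dW for the toy objective
    p = init_projection(grad, rank)            # fixed within the interval
    b = np.zeros((rank, n), dtype=np.float32)  # the only trainable matrix
    for _ in range(20):
        grad_full = (w + p @ b) - target       # gradient w.r.t. effective weights
        b -= 0.1 * (p.T @ grad_full)           # chain rule: dL/dB = P^T dL/dW
    w_q, w_scale = merge(w_q, w_scale, p, b)   # periodic merge

final_loss = loss(dequantize(w_q, w_scale), target)
```

In this sketch only B (rank × n) receives gradient updates between merges, while the full-rank matrix is stored as int8 codes plus one scale. That mirrors the memory argument in the abstract, though a real setup would differ in quantization format, optimizer, and initialization.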
Taxonomy
Topics: Neural Networks and Applications · Fault Detection and Control Systems
