LoQT: Low-Rank Adapters for Quantized Pretraining

Sebastian Loeschcke; Mads Toftrup; Michael J. Kastoryano; Serge Belongie; Vésteinn Snæbjarnarson

arXiv:2405.16528 · cs.LG · November 5, 2024

TL;DR

LoQT combines low-rank adapters with quantization to enable efficient pretraining and fine-tuning of large language models on consumer hardware, without model sharding or offloading during training.

Contribution

The paper presents LoQT, which uses gradient-based tensor factorization to initialize low-rank trainable weight matrices and periodically merges them into quantized full-rank weight matrices, allowing large models to be trained on limited hardware.

Findings

1. Enables training of models up to 7B parameters on a 24GB GPU.
2. Demonstrates the feasibility of training a 13B model with per-layer gradient updates on the same 24GB GPU.
3. Shows that the approach works for both pretraining (language modeling) and downstream task adaptation.

Abstract

Despite advances using low-rank adapters and quantization, pretraining of large models on consumer hardware has not been possible without model sharding, offloading during training, or per-layer gradient updates. To address these limitations, we propose Low-Rank Adapters for Quantized Training (LoQT), a method for efficiently training quantized models. LoQT uses gradient-based tensor factorization to initialize low-rank trainable weight matrices that are periodically merged into quantized full-rank weight matrices. Our approach is suitable for both pretraining and fine-tuning models. We demonstrate this for language modeling and downstream task adaptation, finding that LoQT enables efficient training of models up to 7B parameters on a 24GB GPU. We also demonstrate the feasibility of training a 13B model using per-layer gradient updates on the same hardware.
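The loop described in the abstract (factorize the gradient to initialize a low-rank adapter, train the adapter, periodically merge it into the quantized weight) can be sketched on a toy least-squares problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the uniform 4-bit `quantize` function, the zero initialization of `b`, the plain gradient-descent update, and all names and hyperparameters (`train`, `rank`, `interval`, `lr`) are hypothetical choices made here for clarity.

```python
import numpy as np

def quantize(w, bits=4):
    """Fake-quantize: round to a symmetric uniform grid, return floats on that grid.
    (Assumption: a simple stand-in for a real quantization scheme.)"""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.abs(w).max()
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def loss_and_grad(w, x, y):
    """Least-squares loss 0.5 * mean ||x w^T - y||^2 and its gradient w.r.t. w."""
    err = x @ w.T - y                      # shape (n, out)
    loss = 0.5 * np.sum(err ** 2) / len(x)
    return loss, err.T @ x / len(x)        # gradient has shape (out, in)

def train(x, y, rank=4, interval=20, n_intervals=5, lr=0.1, bits=4):
    out_dim, in_dim = y.shape[1], x.shape[1]
    w_q = quantize(np.zeros((out_dim, in_dim)), bits)   # frozen quantized weight
    losses = [loss_and_grad(w_q, x, y)[0]]
    for _ in range(n_intervals):
        # Factorize the current full-rank gradient to pick the adapter subspace.
        _, grad = loss_and_grad(w_q, x, y)
        u, _, _ = np.linalg.svd(grad, full_matrices=False)
        p = u[:, :rank]                    # fixed projection, orthonormal columns
        b = np.zeros((rank, in_dim))       # trainable factor (zero init is an assumption)
        for _ in range(interval):
            _, grad = loss_and_grad(w_q + p @ b, x, y)
            b -= lr * (p.T @ grad)         # chain rule: dL/dB = P^T dL/dW
        # Periodic merge: fold the adapter into the weight, then re-quantize.
        w_q = quantize(w_q + p @ b, bits)
        losses.append(loss_and_grad(w_q, x, y)[0])
    return w_q, losses

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 16))
w_true = rng.standard_normal((8, 16))
y = x @ w_true.T
w_q, losses = train(x, y)
print([round(l, 3) for l in losses])       # loss at the start and after each merge
```

In this sketch only the small factor `b` receives gradient updates between merges, while the full-rank weight stays on the quantization grid. Re-quantizing at each merge introduces rounding error, so the loss settles at a floor set by the grid resolution rather than reaching zero.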

Code & Models

Repositories

sebulo/LoQT (PyTorch, official)

Videos

LoQT: Low-Rank Adapters for Quantized Pretraining · SlidesLive

Taxonomy

Topics: Neural Networks and Applications · Fault Detection and Control Systems