Exploring Quantization for Efficient Pre-Training of Transformer Language Models

Kamran Chitsaz; Quentin Fournier; Gonçalo Mordido; Sarath Chandar

arXiv:2407.11722 · cs.LG · October 14, 2024 · 1 citation

Open Access · 1 Repo

TL;DR

This paper investigates the use of quantization techniques during the pre-training phase of Transformer language models to improve training efficiency without sacrificing performance.

Contribution

It systematically applies linear quantization to various components during pre-training and provides a comprehensive strategy for efficient Transformer pre-training.

Findings

01. Quantization can be effectively applied during pre-training without significant performance loss.

02. The proposed strategies improve training efficiency and stability.

03. Code implementation is publicly available for reproducibility.

Abstract

The increasing scale of Transformer models has led to an increase in their pre-training computational requirements. While quantization has proven to be effective after pre-training and during fine-tuning, applying quantization in Transformers during pre-training has remained largely unexplored at scale for language modeling. This study aims to explore the impact of quantization for efficient pre-training of Transformers, with a focus on linear layer components. By systematically applying straightforward linear quantization to weights, activations, gradients, and optimizer states, we assess its effects on model efficiency, stability, and performance during training. By offering a comprehensive recipe of effective quantization strategies to be applied during the pre-training of Transformers, we promote high training efficiency from scratch while retaining language modeling ability. Code…
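
To make the "straightforward linear quantization" mentioned in the abstract concrete, the following is a minimal sketch of symmetric, per-tensor linear quantization in plain Python. The choice of a symmetric scheme, per-tensor scaling, and an 8-bit default are assumptions made for illustration; this page does not state the exact bit-widths or granularity used in the paper. The function names `linear_quantize` and `dequantize` are hypothetical and are not taken from the authors' repository.

```python
def linear_quantize(values, num_bits=8):
    """Symmetric per-tensor linear quantization of a flat list of floats.

    Assumption: one scale is shared by the whole tensor and the integer
    grid is [-qmax, qmax]. The paper may use a different configuration.
    """
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    # Avoid division by zero when the tensor is all zeros.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest grid point and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point values."""
    return [c * scale for c in codes]
```

Calling `linear_quantize` on a tensor and then `dequantize` on the result gives values whose rounding error is at most half of `scale` per element. In a training setting, the same quantize-then-dequantize step could in principle be applied to weights, activations, gradients, or optimizer states, which are the four components the abstract lists.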

Code & Models

Repositories

chandar-lab/efficientllms (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Attention Is All You Need · Residual Connection · Byte Pair Encoding · Layer Normalization · Focus · Label Smoothing · Linear Layer · Adam · Dropout · Multi-Head Attention