Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization

Haocheng Xi, Yuxiang Chen, Kang Zhao, Kai Jun Teh, Jianfei Chen, Jun Zhu

arXiv:2403.12422 · cs.LG · July 23, 2024 · 1 citation

TL;DR

Jetfire is an INT8 training method for transformers that accelerates pretraining while keeping accuracy comparable to FP16 training. It uses an INT8 data flow to reduce memory-access overhead and per-block quantization to preserve accuracy.

Contribution

It proposes an INT8 data flow and a per-block quantization approach tailored to transformers, and it outperforms existing INT8 training methods for transformers.

Findings

01 Achieves accuracy comparable to the FP16 training baseline.

02 Provides a 1.42x training speedup for transformer blocks.

03 Reduces memory usage by 1.49x compared to FP16.

Abstract

Pretraining transformers is generally time-consuming. Fully quantized training (FQT) is a promising approach to speed up pretraining. However, most FQT methods adopt a quantize-compute-dequantize procedure, which often leads to suboptimal speedup and significant performance degradation when used in transformers, due to high memory access overheads and low-precision computations. In this work, we propose Jetfire, an efficient and accurate INT8 training method specific to transformers. Our method features an INT8 data flow to optimize memory access and a per-block quantization method to maintain the accuracy of pretrained transformers. Extensive experiments demonstrate that our INT8 FQT method achieves comparable accuracy to the FP16 training baseline and outperforms the existing INT8 training works for transformers. Moreover, for a standard transformer block, our method offers an…
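
The per-block quantization idea mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `quantize_per_block` and `dequantize_per_block`, the tile size of 32, and the symmetric absmax scaling rule are all assumptions made for illustration. The sketch only shows the general technique of giving each square tile of a matrix its own INT8 scale, so that an outlier degrades the precision of its own tile rather than of the whole tensor.

```python
import numpy as np

def quantize_per_block(x, block=32):
    """Quantize a 2-D float array to INT8 with one scale per (block x block) tile.

    Hypothetical helper for illustration; block=32 is an assumed tile size.
    """
    rows, cols = x.shape
    assert rows % block == 0 and cols % block == 0, "pad input to a multiple of block"
    # View the matrix as a grid of tiles: (row_tiles, block, col_tiles, block).
    tiles = x.reshape(rows // block, block, cols // block, block)
    # One scale per tile, chosen so the tile's largest magnitude maps to 127.
    absmax = np.abs(tiles).max(axis=(1, 3), keepdims=True)
    scale = np.where(absmax > 0, absmax / 127.0, 1.0)
    q = np.clip(np.rint(tiles / scale), -127, 127).astype(np.int8)
    return q.reshape(rows, cols), scale.reshape(rows // block, cols // block)

def dequantize_per_block(q, scale, block=32):
    """Map INT8 values back to float using the per-tile scales."""
    rows, cols = q.shape
    tiles = q.reshape(rows // block, block, cols // block, block).astype(np.float32)
    return (tiles * scale[:, None, :, None]).reshape(rows, cols)

# Usage: a single outlier inflates the scale of its own tile only.
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64)).astype(np.float32)
x[0, 0] = 100.0
q, s = quantize_per_block(x)
x_hat = dequantize_per_block(q, s)
err = np.abs(x_hat - x)
```

In this example the tile containing the outlier gets a coarse scale, while the other three tiles keep scales set by their own value ranges, so their rounding error stays small. A per-tensor scheme would instead apply the outlier's coarse scale to every element.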

Code & Models

Repositories

thu-ml/Jetfire-INT8Training
PyTorch · Official

Videos

Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization · SlidesLive

Taxonomy

Topics

Non-Destructive Testing Techniques

Methods

SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings