GACT: Activation Compressed Training for Generic Network Architectures

Xiaoxuan Liu; Lianmin Zheng; Dequan Wang; Yukuo Cen; Weize Chen; Xu Han; Jianfei Chen; Zhiyuan Liu; Jie Tang; Joey Gonzalez; Michael Mahoney; Alvin Cheung

arXiv:2206.11357 · cs.LG · September 7, 2022 · 5 cites

TL;DR

GACT is a versatile activation compression framework that enables memory-efficient training of various neural network architectures with theoretical convergence guarantees and minimal accuracy loss.

Contribution

GACT introduces a general activation compression method with proven convergence for diverse neural networks, supporting stable training with adaptive compression ratios.

Findings

01 Reduces activation memory by up to 8.1x across convolutional NNs, transformers, and graph NNs

02 Enables training with 4.2x to 24.7x larger batch sizes

03 Incurs negligible accuracy loss from compression

Abstract

Training large neural network (NN) models requires extensive memory resources, and Activation Compressed Training (ACT) is a promising approach to reduce training memory footprint. This paper presents GACT, an ACT framework to support a broad range of machine learning tasks for generic NN architectures with limited domain knowledge. By analyzing a linearized version of ACT's approximate gradient, we prove the convergence of GACT without prior knowledge on operator type or model architecture. To make training stable, we propose an algorithm that decides the compression ratio for each tensor by estimating its impact on the gradient at run time. We implement GACT as a PyTorch library that readily applies to any NN architecture. GACT reduces the activation memory for convolutional NNs, transformers, and graph NNs by up to 8.1x, enabling training with a 4.2x to 24.7x larger batch size, with…
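The core idea behind ACT, storing activations in a low-bit form for the backward pass, can be illustrated with a small sketch. This is not GACT's implementation or API: the function names `quantize` and `dequantize`, the 2-bit setting, and the per-tensor min-max scaling are all assumptions chosen for illustration. The sketch uses stochastic rounding so that the reconstructed activation is unbiased in expectation, which is the kind of property a convergence argument for compressed activations typically relies on.

```python
import random

def quantize(values, bits, rng):
    """Compress floats to `bits`-bit integer codes using stochastic rounding.

    Hypothetical helper for illustration; not part of the GACT library.
    """
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1                    # largest integer code
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = []
    for v in values:
        x = (v - lo) / scale                    # position in [0, levels]
        floor = int(x)                          # x >= 0, so int() is floor
        # Round up with probability equal to the fractional part,
        # so the expected code equals x (unbiased rounding).
        code = floor + (1 if rng.random() < x - floor else 0)
        codes.append(min(code, levels))         # guard against float overshoot
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Reconstruct approximate floats from integer codes."""
    return [lo + c * scale for c in codes]

rng = random.Random(0)
activation = [rng.gauss(0.0, 1.0) for _ in range(8)]   # stand-in for a saved activation

# One compress/decompress round trip at 2 bits per value.
codes, lo, scale = quantize(activation, bits=2, rng=rng)
restored = dequantize(codes, lo, scale)

# Averaging many independent reconstructions approaches the original values,
# which shows empirically that the rounding is unbiased.
trials = 20000
mean = [0.0] * len(activation)
for _ in range(trials):
    c, l, s = quantize(activation, 2, rng)
    for i, r in enumerate(dequantize(c, l, s)):
        mean[i] += r / trials
max_bias = max(abs(m - a) for m, a in zip(mean, activation))
print("codes:", codes)
print("max bias over", trials, "trials:", max_bias)
```

A single reconstruction can be off by up to one quantization step, but the error averages out across samples. Choosing how many bits each tensor gets is the part the abstract attributes to GACT's run-time algorithm; the fixed 2-bit choice here is only a placeholder.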

Code & Models

Repositories

LiuXiaoxuanPKU/GACT-ICML (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Tensor decomposition and applications · Domain Adaptation and Few-Shot Learning

Methods: Lib