CoMERA: Computing- and Memory-Efficient Training via Rank-Adaptive Tensor Optimization

Zi Yang; Ziyue Liu; Samridhi Choudhary; Xinfeng Xie; Cao Gao; Siegfried Kunzmann; Zheng Zhang

arXiv:2405.14377·cs.LG·December 3, 2024·1 cites

TL;DR

CoMERA introduces a rank-adaptive tensor optimization method that significantly accelerates training and reduces memory usage for large AI models, making training more efficient and environmentally friendly.

Contribution

It presents a novel tensor optimization approach that improves training speed and memory efficiency for large models, outperforming existing methods like GaLore.

Findings

01

Achieves 2-3x speedup per epoch over standard training.

02

Outperforms GaLore in memory and computational efficiency.

03

Provides 4.23x compression ratio in pre-training.

Abstract

Training large AI models such as LLMs and DLRMs costs massive GPUs and computing time. The high training cost has become only affordable to big tech companies, meanwhile also causing increasing concerns about the environmental impact. This paper presents CoMERA, a Computing- and Memory-Efficient training method via Rank-Adaptive tensor optimization. CoMERA achieves rank-adaptive tensor-compressed (pre)-training via a multi-objective optimization formulation and improves the training to provide both a high compression ratio and excellent accuracy in the training process. Our optimized numerical computation (e.g., optimized tensorized embedding and tensor-network contractions) and GPU implementation eliminate part of the run-time overhead in the tensorized training on GPU. This leads to, for the first time, $2$-$3\times$ speedup per training epoch compared with standard training. CoMERA…
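
To make the idea of tensor-compressed weights and a compression ratio concrete, here is a minimal sketch in NumPy. It assumes a generic tensor-train (TT) matrix format, which the abstract does not specify, and it is not taken from the CoMERA code. The function names, dimensions, and ranks are all hypothetical choices for illustration. The point it shows is that the parameter count depends on the ranks, so a scheme that lowers ranks during training directly raises the compression ratio.

```python
import numpy as np

def tt_matrix_param_count(in_dims, out_dims, ranks):
    """Count parameters in TT-matrix cores of shape (r_{k-1}, m_k, n_k, r_k)."""
    assert len(in_dims) == len(out_dims) == len(ranks) - 1
    assert ranks[0] == ranks[-1] == 1  # boundary ranks are 1 by convention
    return sum(ranks[k] * in_dims[k] * out_dims[k] * ranks[k + 1]
               for k in range(len(in_dims)))

def compression_ratio(in_dims, out_dims, ranks):
    """Dense parameter count divided by TT parameter count."""
    dense = int(np.prod(in_dims)) * int(np.prod(out_dims))
    return dense / tt_matrix_param_count(in_dims, out_dims, ranks)

def tt_matrix_to_dense(cores):
    """Contract TT-matrix cores into the dense (prod m_k) x (prod n_k) matrix."""
    result = cores[0]  # shape (1, m_1, n_1, r_1)
    for core in cores[1:]:
        r0, M, N, _ = result.shape
        _, m, n, r_next = core.shape
        # Sum over the shared rank index, keep row and column modes separate.
        result = np.einsum('aMNr,rmns->aMmNns', result, core)
        result = result.reshape(r0, M * m, N * n, r_next)
    return result[0, :, :, 0]

# Hypothetical example: a 768 x 768 weight factored as (8*8*12) x (8*8*12).
in_dims, out_dims, ranks = (8, 8, 12), (8, 8, 12), (1, 4, 4, 1)
rng = np.random.default_rng(0)
cores = [rng.standard_normal((ranks[k], in_dims[k], out_dims[k], ranks[k + 1]))
         for k in range(len(in_dims))]

dense = tt_matrix_to_dense(cores)
n_params = tt_matrix_param_count(in_dims, out_dims, ranks)
ratio = compression_ratio(in_dims, out_dims, ranks)
print(dense.shape, n_params, round(ratio, 2))
```

In this toy setting, reducing any internal rank shrinks the two cores that share it, which is the lever a rank-adaptive method would adjust. How CoMERA chooses ranks through its multi-objective formulation is not covered by this sketch.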

Code & Models

Repositories

ziyangjoy/comera · PyTorch · Official

Videos

CoMERA: Computing- and Memory-Efficient Training via Rank-Adaptive Tensor Optimization · SlidesLive

Taxonomy

Topics: Tensor decomposition and applications · Parallel Computing and Optimization Techniques