CoMERA: Computing- and Memory-Efficient Training via Rank-Adaptive Tensor Optimization
Zi Yang, Ziyue Liu, Samridhi Choudhary, Xinfeng Xie, Cao Gao, Siegfried Kunzmann, and Zheng Zhang

TL;DR
CoMERA introduces a rank-adaptive tensor optimization method that speeds up training by 2-3x per epoch and reduces memory usage for large AI models, lowering both training cost and environmental impact.
Contribution
It presents a novel rank-adaptive tensor optimization approach that improves training speed and memory efficiency for large models, outperforming existing methods such as GaLore.
Findings
Achieves a 2-3x speedup per training epoch over standard training.
Outperforms GaLore in both memory and computational efficiency.
Provides a 4.23x compression ratio in pre-training.
Abstract
Training large AI models such as LLMs and DLRMs costs massive GPUs and computing time. The high training cost has become only affordable to big tech companies, meanwhile also causing increasing concerns about the environmental impact. This paper presents CoMERA, a Computing- and Memory-Efficient training method via Rank-Adaptive tensor optimization. CoMERA achieves rank-adaptive tensor-compressed (pre)-training via a multi-objective optimization formulation and improves the training to provide both a high compression ratio and excellent accuracy in the training process. Our optimized numerical computation (e.g., optimized tensorized embedding and tensor-network contractions) and GPU implementation eliminate part of the run-time overhead in the tensorized training on GPU. This leads to, for the first time, speedup per training epoch compared with standard training. CoMERA…
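To make the notion of a compression ratio concrete, the sketch below counts the parameters of a dense weight matrix against those of a tensor-train-matrix factorization and shows how lowering the ranks raises the ratio. This is only an illustration under stated assumptions: the abstract does not specify the tensor format, and the layer shape, the factor dimensions, the ranks, and the function names `tt_matrix_params` and `compression_ratio` are all hypothetical choices made here, not values or code from the paper.

```python
from math import prod


def tt_matrix_params(in_dims, out_dims, ranks):
    """Parameter count of a tensor-train-matrix factorization (assumed format).

    Core k has shape (ranks[k], in_dims[k], out_dims[k], ranks[k + 1]);
    the boundary ranks ranks[0] and ranks[-1] must both be 1.
    """
    assert len(in_dims) == len(out_dims) == len(ranks) - 1
    assert ranks[0] == 1 and ranks[-1] == 1
    return sum(
        ranks[k] * in_dims[k] * out_dims[k] * ranks[k + 1]
        for k in range(len(in_dims))
    )


def compression_ratio(in_dims, out_dims, ranks):
    """Dense parameter count divided by the factorized parameter count."""
    dense = prod(in_dims) * prod(out_dims)
    return dense / tt_matrix_params(in_dims, out_dims, ranks)


# Hypothetical 768 x 768 layer, with each side factorized as 8 * 8 * 12.
in_dims = (8, 8, 12)
out_dims = (8, 8, 12)

# Two hypothetical rank settings; a rank-adaptive method would choose
# such ranks during training instead of fixing them by hand.
high_ranks = (1, 16, 16, 1)
low_ranks = (1, 8, 8, 1)

print(tt_matrix_params(in_dims, out_dims, high_ranks))
print(compression_ratio(in_dims, out_dims, high_ranks))
print(compression_ratio(in_dims, out_dims, low_ranks))
```

The ranks are the only knobs that change the ratio once the layer shape is fixed, which is why adapting them during training trades off compression against accuracy.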
Taxonomy
Topics: Tensor decomposition and applications · Parallel Computing and Optimization Techniques
